
Work Through Interview Questions With Me: Linux Thread Programming (7)

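Multithreaded programs have always been hard for me to debug, so I searched online and found the following method for debugging them.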

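I have always found debugging multithreaded programs on Linux troublesome. A program of any real size usually has many threads, and gdb's info thread command shows every one of them sitting in a system call, with no sign of which function it is actually in; at least, that is what I see... Following each thread by its thread id and running bt on it is quite painful, especially when info thread lists close to a hundred threads T_T. On top of that, waiting for a problem to reproduce, or restarting the program, is usually very expensive, so being able to inspect the thread stacks while the program is still running is very important.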

With the pstack command, though, the problem becomes simple. For details on its usage, see man pstack.

gdb+pstack:

Get the program's pid with ps.

Run gdb processname pid to attach to the running program.

The thread list printed by info thread looks like this:

(gdb) info thread
79 Thread 0x2547b90 (LWP 22267)  0x00795410 in __kernel_vsyscall ()
78 Thread 0x5c6eb90 (LWP 22268)  0x00795410 in __kernel_vsyscall ()
77 Thread 0x2f48b90 (LWP 22270)  0x00795410 in __kernel_vsyscall ()
76 Thread 0x94e3b90 (LWP 22272)  0x00795410 in __kernel_vsyscall ()
75 Thread 0x3949b90 (LWP 22287)  0x00795410 in __kernel_vsyscall ()
74 Thread 0x3d4ab90 (LWP 22393)  0x00795410 in __kernel_vsyscall ()
73 Thread 0x414bb90 (LWP 22394)  0x00795410 in __kernel_vsyscall ()

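From this output you can only see system calls, which makes it hard to tell what each thread is doing. If you know another way to see directly in gdb what every thread is running, please let me know : )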
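To rehearse the gdb + pstack steps without touching a production process, it helps to have a small target whose threads, like the ones above, spend nearly all their time parked in a system call. Below is a minimal sketch in C with POSIX threads (an assumption; the original program's code is not shown). The names remove_log_like, start_sleepers and join_sleepers are hypothetical; remove_log_like only imitates the kind of sleeping background thread that shows up as RemoveLog(void*) in the pstack output.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a background thread such as RemoveLog(void*):
 * it loops, sleeping between rounds, so a stack dump taken at almost any
 * moment finds it inside sleep() and, below that, a system call. */
struct sleeper_args {
    unsigned rounds;   /* how many sleeps before the thread exits */
    unsigned seconds;  /* length of each sleep */
};

static void *remove_log_like(void *p) {
    const struct sleeper_args *a = p;
    unsigned done;
    for (done = 0; done < a->rounds; ++done)
        sleep(a->seconds);
    return (void *)(unsigned long)done;  /* report completed rounds */
}

/* Start n sleeper threads that share one argument struct.
 * Returns 0 on success, -1 if any pthread_create call fails. */
static int start_sleepers(pthread_t *tids, int n, const struct sleeper_args *a) {
    for (int i = 0; i < n; ++i)
        if (pthread_create(&tids[i], NULL, remove_log_like, (void *)a) != 0)
            return -1;
    return 0;
}

/* Join n threads and return the total number of rounds they completed. */
static unsigned long join_sleepers(const pthread_t *tids, int n) {
    unsigned long total = 0;
    for (int i = 0; i < n; ++i) {
        void *ret = NULL;
        if (pthread_join(tids[i], &ret) == 0)
            total += (unsigned long)ret;
    }
    return total;
}
```

Called from a small main with, say, many rounds of one second each, the process stays up long enough to find its pid with ps, attach gdb, and run pstack. The exact top frames depend on the architecture and glibc version; the __kernel_vsyscall frame seen here is typical of 32-bit x86.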

My approach is to look at the threads with pstack. This assumes the tool is installed on your Linux system, though most Linux systems should have it.

Run pstack pid and you get a lot of output:

pstack 22266
Thread 78 (Thread 0x2547b90 (LWP 22267)):
#0  0x00795410 in __kernel_vsyscall ()
#1  0x00b5eaa6 in nanosleep () from /lib/libc.so.6
#2  0x00b5e8cf in sleep () from /lib/libc.so.6
#3  0x08134a3a in RemoveLog(void*) ()
#4  0x0085e832 in start_thread () from /lib/libpthread.so.0
#5  0x00b9ee0e in clone () from /lib/libc.so.6
Thread 77 (Thread 0x5c6eb90 (LWP 22268)):
#0  0x00795410 in __kernel_vsyscall ()
#1  0x00b5eaa6 in nanosleep () from /lib/libc.so.6
#2  0x00b5e8cf in sleep () from /lib/libc.so.6
#3  0x08134a3a in RemoveLog(void*) ()
#4  0x0085e832 in start_thread () from /lib/libpthread.so.0
#5  0x00b9ee0e in clone () from /lib/libc.so.6
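With close to a hundred threads, even pstack output is long to read by eye. A minimal sketch of a filter (an assumption, not part of the original workflow) that reads pstack-style text and counts the thread sections whose frames mention a given function name. It relies only on each section starting with a line that begins "Thread ", as in the output above; count_threads_matching is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count how many thread sections in pstack-style output contain a frame
 * line mentioning `needle`. A section starts at a line beginning "Thread ";
 * header lines themselves are not searched. */
static int count_threads_matching(FILE *in, const char *needle) {
    char *line = NULL;
    size_t cap = 0;
    int count = 0, in_thread = 0, matched = 0;

    while (getline(&line, &cap, in) != -1) {
        if (strncmp(line, "Thread ", 7) == 0) {
            count += matched;      /* close the previous section */
            in_thread = 1;
            matched = 0;
        } else if (in_thread && strstr(line, needle) != NULL) {
            matched = 1;           /* one hit is enough per thread */
        }
    }
    count += matched;              /* close the last section */
    free(line);
    return count;
}
```

To feed it live output, open the stream with popen("pstack 22266", "r"), pass it together with a needle such as "rdLock", and close it with pclose. A large count of threads parked in the same lock call is the pattern described next.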

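For example, look at the thread below. Every other thread's call to CProcess::readDevices was blocked: for thread safety, CProcess::readDevices takes a lock, and many other threads were stuck in CProcess::wrLock() () or CProcess::rdLock() (). The thread below is clearly the culprit, since it is inside CProcess::readDevices (so it holds the lock) yet is sleeping in Sleep() on select() instead of finishing and releasing it. Don't ask me why that is obvious; I can't explain why 1+1=2 either...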

Thread 23 (Thread 0x631ffb90 (LWP 1161)):
#0  0x00795410 in __kernel_vsyscall ()
#1  0x00b979d1 in select () from /lib/libc.so.6
#2  0x0812f1a2 in Sleep(int) ()
#3  0x080598e6 in CProcess::readDevices(FluxControl::DeviceArray const&) ()
#4  0x081281a1 in FluxControlImpl::readDevices(FluxControl::DeviceArray const&) ()
#5  0x0813b2f2 in _0RL_lcfn_28D700E3085A57CA_30000000(omniCallDescriptor*, omniServant*) ()
#6  0x0093b14d in omniCallHandle::upcall(omniServant*, omniCallDescriptor&) () from /export/home/tools/omniORB-4.0.7/lib/libomniORB4.so.0
#7  0x0813c48e in FluxControl::_impl_FluxDeviceConf::_dispatch(omniCallHandle&) ()
#8  0x009269bf in omni::omniOrbPOA::dispatch(omniCallHandle&, omniLocalIdentity*) () from /export/home/tools/omniORB-4.0.7/lib/libomniORB4.so.0
#9  0x0090a963 in omniLocalIdentity::dispatch(omniCallHandle&) () from /export/home/tools/omniORB-4.0.7/lib/libomniORB4.so.0
#10 0x0095aa92 in omni::GIOP_S::handleRequest() () from /export/home/tools/omniORB-4.0.7/lib/libomniORB4.so.0
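The pattern behind this stack, a thread that takes a lock and then sleeps while still holding it, can be reproduced in a few lines. A minimal sketch, assuming CProcess::rdLock()/wrLock() wrap something like a POSIX read-write lock (the real class is not shown). devices_lock, holder_inside, sleep_ms, read_devices_holding_lock and readers_blocked are hypothetical names; sleep_ms waits on select() with no descriptors, mirroring the Sleep(int) frame calling select() above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical stand-in for the lock behind CProcess::rdLock()/wrLock(),
 * plus a flag the demo uses to see that the holder is inside. */
static pthread_rwlock_t devices_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int holder_inside = 0;

/* Sleep with select() and no descriptors, only a timeout. */
static void sleep_ms(int ms) {
    struct timeval tv = { ms / 1000, (ms % 1000) * 1000 };
    select(0, NULL, NULL, NULL, &tv);
}

/* The anti-pattern: take the write lock, then sleep while holding it.
 * Any thread calling rdlock/wrlock meanwhile blocks, and a stack dump
 * shows those threads in the lock call while this one is in select(). */
static void *read_devices_holding_lock(void *arg) {
    int ms = *(const int *)arg;
    pthread_rwlock_wrlock(&devices_lock);
    atomic_store(&holder_inside, 1);
    sleep_ms(ms);                       /* lock is still held here */
    atomic_store(&holder_inside, 0);
    pthread_rwlock_unlock(&devices_lock);
    return NULL;
}

/* Non-blocking probe: returns 1 if a reader would block right now. */
static int readers_blocked(void) {
    int rc = pthread_rwlock_tryrdlock(&devices_lock);
    if (rc == 0) {
        pthread_rwlock_unlock(&devices_lock);
        return 0;
    }
    return rc == EBUSY;
}
```

While the holder sleeps, every pthread_rwlock_rdlock or pthread_rwlock_wrlock call on devices_lock waits, which is the picture pstack gave above. The usual remedy is to release the lock before sleeping, or to keep the sleep out of the critical section entirely.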

At this point pstack cannot map the frames to specific lines of source code, so go back to gdb, which can show them : ).

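The problem thread that pstack reported is Thread 23 (Thread 0x631ffb90 (LWP 1161)). When you then look at the process in gdb, however, its thread number will not necessarily still be 23: the threads that are not blocked keep running and competing, so the numbering can change. Even the two tools' numbering differs: in the outputs above, thread 0x2547b90 (LWP 22267) is number 79 in info thread but Thread 78 in pstack. The address 0x631ffb90 does not change, though, and neither does LWP 1161 while the thread is alive. So run gdb processname pid, run info thread, and search its output for 0x631ffb90 to find the matching thread; then run thread threadid in gdb to switch to that thread and inspect it.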
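Matching threads by address or LWP is easier when the program logs each thread's identity as it starts. A minimal sketch for Linux/glibc, under the assumption that pthread_t there is an unsigned long holding the address gdb prints as "Thread 0x...", and that syscall(SYS_gettid) returns the number shown as "LWP n". current_lwp, describe_thread and logged_worker are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel thread id, the number gdb and pstack print as "LWP n".
 * syscall(SYS_gettid) also works on older glibc without gettid(). */
static long current_lwp(void) {
    return syscall(SYS_gettid);
}

/* Format the calling thread the way gdb labels it:
 * "Thread 0x<pthread_t> (LWP <tid>)". Assumes Linux/glibc, where
 * pthread_t is an unsigned long. Returns snprintf's result. */
static int describe_thread(char *buf, size_t len) {
    return snprintf(buf, len, "Thread 0x%lx (LWP %ld)",
                    (unsigned long)pthread_self(), current_lwp());
}

/* Hypothetical worker prologue: log the identity once at start-up so a
 * hung thread found later with pstack can be matched to the log.
 * Also stores the LWP through out_lwp for the caller. */
static void *logged_worker(void *out_lwp) {
    char id[64];
    describe_thread(id, sizeof id);
    fprintf(stderr, "worker started: %s\n", id);
    *(long *)out_lwp = current_lwp();
    return NULL;
}
```

Searching the log for the address or LWP that pstack reports then tells you which worker the hung thread is, regardless of how gdb has numbered it.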
