[Linux-cluster] problem with deadlocked processes (D)

Peter Sopko Peter_Sopko at tempest.sk
Wed Apr 4 13:18:18 UTC 2007


Hi, 

thanks for your reply, Bryn.

The output of the ps command you suggested (I've omitted the standard system
processes):

[root@mail1 subsys]# ps ax -ocomm,pid,state,wchan |more
COMMAND            PID S WCHAN
ccsd              2258 S -
cman_comms        2310 S cluster_kthread
cman_serviced     2312 S serviced
cman_memb         2311 S membership_kthread
cman_hbeat        2315 S hello_kthread
fenced            2336 S rt_sigsuspend
dlm_astd          2354 S dlm_astd
dlm_recvd         2355 S dlm_recvd
dlm_sendd         2356 S dlm_sendd
lock_dlm1         2358 S dlm_async
lock_dlm2         2359 S dlm_async
gfs_scand         2360 S -
gfs_glockd        2361 S gfs_glockd
gfs_recoverd      2362 S -
gfs_logd          2363 S -
gfs_quotad        2364 D glock_wait_internal
gfs_inoded        2365 D dlm_lock_sync
syslogd           2374 S -
klogd             2394 S syslog
heartbeat         2503 S -
courierlogger     2526 S pipe_wait
authdaemond       2527 S -
authdaemond       2551 S -
authdaemond       2552 S -
authdaemond       2553 S -
authdaemond       2554 S -
authdaemond       2555 S -
heartbeat         2586 S pipe_wait
heartbeat         2587 S -
heartbeat         2588 S -
heartbeat         2589 S -
heartbeat         2590 S -
acpid             2595 S -
ipfail            2608 S -
nod32d            2609 S -
nod32smtp         2618 S -
sshd              2627 S -
ntpd              2642 S -
courierlogger     2654 S pipe_wait
couriertcpd       2655 S -
courierlogger     2661 S pipe_wait
couriertcpd       2662 S -
courierlogger     2667 S pipe_wait
couriertcpd       2668 S wait
courierlogger     2673 S pipe_wait
couriertcpd       2674 S -
master            2815 S -
master            3024 S -
httpd             3039 S -
crond             3048 S -
rhnsd             3067 S -
mingetty          3074 S -
mingetty          3075 S -
mingetty          3076 S -
mingetty          3077 S -
mingetty          3078 S -
mingetty          3079 S -
ntpd              3888 S rt_sigsuspend
tlsmgr            4544 S -
tlsmgr            1585 S -
anvil             1699 S -
spamd            29941 S -
httpd            15674 D glock_wait_internal
httpd            15675 D glock_wait_internal
httpd            15676 D glock_wait_internal
httpd            15677 D glock_wait_internal
httpd            15678 D glock_wait_internal
httpd            15679 D glock_wait_internal
httpd            15680 D glock_wait_internal
httpd            15681 D glock_wait_internal
httpd            30808 D glock_wait_internal
httpd            30809 D glock_wait_internal
httpd            30810 D glock_wait_internal
httpd            30825 D glock_wait_internal
httpd            30827 D glock_wait_internal
httpd            30828 D glock_wait_internal
httpd            30829 D glock_wait_internal
httpd            30830 D glock_wait_internal
httpd            30831 D glock_wait_internal
httpd            30832 D glock_wait_internal
httpd            30835 D glock_wait_internal
httpd            30840 D glock_wait_internal
spamd            17341 S -
proxymap         24868 S -
proxymap         27542 S -
mysqld_safe      30617 S wait
mysqld           30650 S -
trivial-rewrite  30735 S -
proxymap         30742 S -
sshd               517 S -
sshd               519 S -
bash               520 S wait
su                 740 S wait
bash               741 S -
imapd            15018 D lock_on_glock
virtual          15699 D lock_on_glock
trivial-rewrite  15918 S -
proxymap         15922 S -
virtual          15943 D lock_on_glock
virtual          15952 D lock_on_glock
virtual          15965 D lock_on_glock
pop3d            15966 D lock_on_glock
pop3d            15967 D lock_on_glock
virtual          15968 D lock_on_glock
pop3d            15971 D lock_on_glock
pop3d            15983 D lock_on_glock
virtual          16046 D lock_on_glock
pop3d            16049 D lock_on_glock
pop3d            16053 D lock_on_glock
pop3d            16068 D glock_wait_internal
pop3d            16074 D glock_wait_internal
virtual          16077 D lock_on_glock
spamd            16112 S -
virtual          16129 D lock_on_glock
virtual          16133 D lock_on_glock
pop3d            16143 D glock_wait_internal
virtual          16153 D lock_on_glock
virtual          16160 D glock_wait_internal
virtual          16163 D lock_on_glock
pop3d            16164 D glock_wait_internal
virtual          16179 D lock_on_glock
pop3d            16183 D glock_wait_internal
pop3d            16186 D glock_wait_internal
pop3d            16187 D glock_wait_internal
virtual          16191 D lock_on_glock
pop3d            16192 D lock_on_glock
virtual          16194 D lock_on_glock
pop3d            16202 D glock_wait_internal
virtual          16207 D lock_on_glock
virtual          16217 D lock_on_glock
virtual          16222 D lock_on_glock
.... 
smtp             21150 S -
smtp             21162 S flock_lock_file_wait
cleanup          21181 S flock_lock_file_wait
smtpd            21213 S -
spamfilter.sh    21224 S wait
cat              21225 S pipe_wait
spamfilter.sh    21226 D -
spamfilter.sh    21229 S wait
pipe             21230 S -
cat              21231 S pipe_wait
spamfilter.sh    21232 D -
spamfilter.sh    21235 S wait
cat              21236 S pipe_wait
spamfilter.sh    21237 D -
spamfilter.sh    21239 S wait
spamfilter.sh    21240 S wait
cat              21242 S pipe_wait
spamfilter.sh    21243 D -
virtual          21244 D lock_on_glock
cat              21245 S pipe_wait
spamfilter.sh    21246 D -
spamfilter.sh    21249 S wait
cat              21250 S pipe_wait
spamfilter.sh    21251 D -
spamfilter.sh    21252 S wait
cat              21253 S pipe_wait
spamfilter.sh    21254 D -
spamfilter.sh    21257 S wait
cat              21258 S pipe_wait
spamfilter.sh    21259 D -
spamfilter.sh    21261 S wait
spamfilter.sh    21262 S wait
spamfilter.sh    21263 S wait
cat              21264 S pipe_wait
spamfilter.sh    21265 D -
spamfilter.sh    21267 D -
cat              21268 S pipe_wait
spamfilter.sh    21269 D -
spamfilter.sh    21273 S wait
...
etc....
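
A listing this long is easier to read as a tally. The following is a minimal
sketch, not part of the original exchange, that counts D-state processes per
wait channel. The function name count_d_by_wchan is hypothetical, and it
assumes input in the four-column format printed by the ps invocation above.

```shell
# Count processes in uninterruptible sleep (state D), grouped by wait channel.
# Reads "COMMAND PID S WCHAN" lines on stdin, as printed by
# `ps ax -ocomm,pid,state,wchan`. The last two fields are used, so command
# names containing spaces do not shift the columns.
count_d_by_wchan() {
    awk 'NF >= 2 && $(NF-1) == "D" { n[$NF]++ }
         END { for (w in n) printf "%d %s\n", n[w], w }' |
        sort -k1,1nr -k2,2
}

# Usage on a live system:
#   ps ax -ocomm,pid,state,wchan | count_d_by_wchan
```

Each output line is a count followed by a wait channel, with the largest
counts first, which shows at a glance which wait channels most of the blocked
processes share.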


The sysrq-t output is available at http://www.backbone.sk/sysrq.tar. It is
400k in size, so I have chosen not to attach it here. The .tar contains two
files: one taken at 15:04 and the other at 15:08.

Again, I would be very thankful for any help.

Peter Sopko, IT Security Consultant
Tempest a.s.


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bryn M. Reeves
Sent: Wednesday, April 04, 2007 2:45 PM
To: linux clustering
Subject: Re: [Linux-cluster] problem with deadlocked processes (D)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Peter Sopko wrote:
> Hi,
> 
> today a strange thing occurred - on both of our cluster nodes a lot of
> processes suddenly started to become locked in the D state (i/o lock). This
> thing has already happened once before (six months ago), but a simple reboot
> helped to solve this issue. But as it appeared again, I don't want to solve
> it this way again, I would like to find the reason why this is happening,
> but have no idea where to start. In /var/log/messages there is nothing
> unusual, the only thing is that some directories are unremoveable and a lot
> of processes locked.

For problems where processes are getting stuck in D state it's usually
helpful to get sysrq-t data to see where the threads are stuck. Grab two
sets of data a few seconds apart so that you can see if things are
really stuck or just making slow progress.
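
One way to take the two dumps is sketched below, assuming a root shell and
the kernel's /proc/sysrq-trigger interface. The function name
trigger_task_dumps and the SYSRQ_TRIGGER override are hypothetical and exist
only for illustration and dry runs.

```shell
# Trigger two sysrq-t task dumps a few seconds apart (run as root).
# SYSRQ_TRIGGER is a hypothetical override for dry runs; by default the
# function writes to the kernel's /proc/sysrq-trigger file.
trigger_task_dumps() {
    delay=${1:-10}                                  # seconds between dumps
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    echo t > "$trigger"                             # first dump of all task states
    sleep "$delay"
    echo t > "$trigger"                             # second dump, for comparison
}

# Usage (as root):
#   trigger_task_dumps 10
```

The dumps go to the kernel log, which syslog usually also writes to a file
such as /var/log/messages. Comparing the stack of the same PID in both dumps
shows whether it has moved.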

You can also get some information from the wchan data exposed in /proc -
it's easiest to view with ps:

$ ps ax -ocomm,pid,state,wchan
COMMAND           PID S WCHAN
vim             22322 S -
bash            22471 S -
man             22817 S wait
sh              22820 S wait
sh              22821 S wait
less            22826 S -
bash            22839 S wait
screen          23435 S pause
[...]

Regards,
Bryn.



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFGE5226YSQoMYUY94RAgm0AKDdPg/mcTHilSwMpd6+Meno2zBLtACgt+/j
TT3MsBrg6/gpdBdPDYMEp5Q=
=ADyt
-----END PGP SIGNATURE-----

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



