[Linux-cluster] problem with deadlocked processes (D)
Peter Sopko
Peter_Sopko at tempest.sk
Wed Apr 4 13:18:18 UTC 2007
Hi,
Thanks for your reply, Bryn.
Here is the output of the ps command you suggested (I've omitted the standard system
processes):
[root at mail1 subsys]# ps ax -ocomm,pid,state,wchan |more
COMMAND PID S WCHAN
ccsd 2258 S -
cman_comms 2310 S cluster_kthread
cman_serviced 2312 S serviced
cman_memb 2311 S membership_kthread
cman_hbeat 2315 S hello_kthread
fenced 2336 S rt_sigsuspend
dlm_astd 2354 S dlm_astd
dlm_recvd 2355 S dlm_recvd
dlm_sendd 2356 S dlm_sendd
lock_dlm1 2358 S dlm_async
lock_dlm2 2359 S dlm_async
gfs_scand 2360 S -
gfs_glockd 2361 S gfs_glockd
gfs_recoverd 2362 S -
gfs_logd 2363 S -
gfs_quotad 2364 D glock_wait_internal
gfs_inoded 2365 D dlm_lock_sync
syslogd 2374 S -
klogd 2394 S syslog
heartbeat 2503 S -
courierlogger 2526 S pipe_wait
authdaemond 2527 S -
authdaemond 2551 S -
authdaemond 2552 S -
authdaemond 2553 S -
authdaemond 2554 S -
authdaemond 2555 S -
heartbeat 2586 S pipe_wait
heartbeat 2587 S -
heartbeat 2588 S -
heartbeat 2589 S -
heartbeat 2590 S -
acpid 2595 S -
ipfail 2608 S -
nod32d 2609 S -
nod32smtp 2618 S -
sshd 2627 S -
ntpd 2642 S -
courierlogger 2654 S pipe_wait
couriertcpd 2655 S -
courierlogger 2661 S pipe_wait
couriertcpd 2662 S -
courierlogger 2667 S pipe_wait
couriertcpd 2668 S wait
courierlogger 2673 S pipe_wait
couriertcpd 2674 S -
master 2815 S -
master 3024 S -
httpd 3039 S -
crond 3048 S -
rhnsd 3067 S -
mingetty 3074 S -
mingetty 3075 S -
mingetty 3076 S -
mingetty 3077 S -
mingetty 3078 S -
mingetty 3079 S -
ntpd 3888 S rt_sigsuspend
tlsmgr 4544 S -
tlsmgr 1585 S -
anvil 1699 S -
spamd 29941 S -
httpd 15674 D glock_wait_internal
httpd 15675 D glock_wait_internal
httpd 15676 D glock_wait_internal
httpd 15677 D glock_wait_internal
httpd 15678 D glock_wait_internal
httpd 15679 D glock_wait_internal
httpd 15680 D glock_wait_internal
httpd 15681 D glock_wait_internal
httpd 30808 D glock_wait_internal
httpd 30809 D glock_wait_internal
httpd 30810 D glock_wait_internal
httpd 30825 D glock_wait_internal
httpd 30827 D glock_wait_internal
httpd 30828 D glock_wait_internal
httpd 30829 D glock_wait_internal
httpd 30830 D glock_wait_internal
httpd 30831 D glock_wait_internal
httpd 30832 D glock_wait_internal
httpd 30835 D glock_wait_internal
httpd 30840 D glock_wait_internal
spamd 17341 S -
proxymap 24868 S -
proxymap 27542 S -
mysqld_safe 30617 S wait
mysqld 30650 S -
trivial-rewrite 30735 S -
proxymap 30742 S -
sshd 517 S -
sshd 519 S -
bash 520 S wait
su 740 S wait
bash 741 S -
imapd 15018 D lock_on_glock
virtual 15699 D lock_on_glock
trivial-rewrite 15918 S -
proxymap 15922 S -
virtual 15943 D lock_on_glock
virtual 15952 D lock_on_glock
virtual 15965 D lock_on_glock
pop3d 15966 D lock_on_glock
pop3d 15967 D lock_on_glock
virtual 15968 D lock_on_glock
pop3d 15971 D lock_on_glock
pop3d 15983 D lock_on_glock
virtual 16046 D lock_on_glock
pop3d 16049 D lock_on_glock
pop3d 16053 D lock_on_glock
pop3d 16068 D glock_wait_internal
pop3d 16074 D glock_wait_internal
virtual 16077 D lock_on_glock
spamd 16112 S -
virtual 16129 D lock_on_glock
virtual 16133 D lock_on_glock
pop3d 16143 D glock_wait_internal
virtual 16153 D lock_on_glock
virtual 16160 D glock_wait_internal
virtual 16163 D lock_on_glock
pop3d 16164 D glock_wait_internal
virtual 16179 D lock_on_glock
pop3d 16183 D glock_wait_internal
pop3d 16186 D glock_wait_internal
pop3d 16187 D glock_wait_internal
virtual 16191 D lock_on_glock
pop3d 16192 D lock_on_glock
virtual 16194 D lock_on_glock
pop3d 16202 D glock_wait_internal
virtual 16207 D lock_on_glock
virtual 16217 D lock_on_glock
virtual 16222 D lock_on_glock
....
smtp 21150 S -
smtp 21162 S flock_lock_file_wait
cleanup 21181 S flock_lock_file_wait
smtpd 21213 S -
spamfilter.sh 21224 S wait
cat 21225 S pipe_wait
spamfilter.sh 21226 D -
spamfilter.sh 21229 S wait
pipe 21230 S -
cat 21231 S pipe_wait
spamfilter.sh 21232 D -
spamfilter.sh 21235 S wait
cat 21236 S pipe_wait
spamfilter.sh 21237 D -
spamfilter.sh 21239 S wait
spamfilter.sh 21240 S wait
cat 21242 S pipe_wait
spamfilter.sh 21243 D -
virtual 21244 D lock_on_glock
cat 21245 S pipe_wait
spamfilter.sh 21246 D -
spamfilter.sh 21249 S wait
cat 21250 S pipe_wait
spamfilter.sh 21251 D -
spamfilter.sh 21252 S wait
cat 21253 S pipe_wait
spamfilter.sh 21254 D -
spamfilter.sh 21257 S wait
cat 21258 S pipe_wait
spamfilter.sh 21259 D -
spamfilter.sh 21261 S wait
spamfilter.sh 21262 S wait
spamfilter.sh 21263 S wait
cat 21264 S pipe_wait
spamfilter.sh 21265 D -
spamfilter.sh 21267 D -
cat 21268 S pipe_wait
spamfilter.sh 21269 D -
spamfilter.sh 21273 S wait
...
etc....
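With a listing this long, it can help to tally the D-state entries by wait channel. The following is a minimal Python sketch, not something from this thread: `summarize_d_state` is a hypothetical helper name, and it assumes the exact column layout produced by `ps ax -ocomm,pid,state,wchan` (COMMAND PID S WCHAN). Applied to the listing above, it would show that most of the blocked processes are waiting in GFS glock code (`glock_wait_internal`, `lock_on_glock`), with `gfs_inoded` in `dlm_lock_sync` and the `spamfilter.sh` instances reporting no wait channel.

```python
from collections import Counter


def summarize_d_state(ps_output: str) -> Counter:
    """Count processes in state D, grouped by (command, wchan).

    Assumes the layout of `ps ax -ocomm,pid,state,wchan`:
    COMMAND PID S WCHAN (hypothetical helper, for illustration only).
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row, elision markers such as "...." and short lines.
        if len(fields) < 4 or not fields[-3].isdigit():
            continue
        # Read PID, state and wchan from the right so that a command name
        # containing spaces does not shift the columns.
        command = " ".join(fields[:-3])
        state, wchan = fields[-2], fields[-1]
        if state.startswith("D"):
            counts[(command, wchan)] += 1
    return counts
```

For example, save the listing with `ps ax -ocomm,pid,state,wchan > ps.txt` and print `summarize_d_state(open("ps.txt").read()).most_common()` to see the busiest wait channels first.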
The sysrq-t output can be found at
http://www.backbone.sk/sysrq.tar. It's about 400 KB, so I chose not to
attach it here. The .tar contains two files: one taken at 15:04
and the other at 15:08.
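The two dumps in the tarball were taken at 15:04 and 15:08. One way to script such a pair is sketched below in Python; it is an illustration under stated assumptions, not what was actually run here. It relies on the kernel's `/proc/sysrq-trigger` interface (writing `t` asks the kernel to dump the state of all tasks to the kernel log), it needs root, and `trigger_sysrq_t` and `take_two_dumps` are hypothetical names. The dumps themselves land in the kernel log (dmesg or /var/log/messages), not in the script's output.

```python
import time
from pathlib import Path

# Standard location of the magic SysRq trigger file on Linux.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def trigger_sysrq_t(trigger_path: Path = SYSRQ_TRIGGER) -> None:
    """Write 't' to the trigger file so the kernel dumps all task states.

    Requires root when pointed at the real /proc/sysrq-trigger.
    """
    with open(trigger_path, "w") as f:
        f.write("t")


def take_two_dumps(interval_s: float = 5.0,
                   trigger_path: Path = SYSRQ_TRIGGER) -> list:
    """Trigger two task dumps interval_s seconds apart; return their timestamps."""
    stamps = []
    for i in range(2):
        if i:
            time.sleep(interval_s)  # wait between the first and second dump
        trigger_sysrq_t(trigger_path)
        stamps.append(time.time())
    return stamps
```

With a very large number of tasks, a single dump may exceed the kernel log buffer, so it is worth checking that both dumps arrived complete before comparing them.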
Again, I would be very thankful for any help.
Peter Sopko, IT Security Consultant
Tempest a.s.
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bryn M. Reeves
Sent: Wednesday, April 04, 2007 2:45 PM
To: linux clustering
Subject: Re: [Linux-cluster] problem with deadlocked processes (D)
Peter Sopko wrote:
> Hi,
>
> Today a strange thing occurred: on both of our cluster nodes a lot of
> processes suddenly became stuck in the D state (I/O wait). This
> has already happened once before (six months ago), and a simple reboot
> solved it then. But since it has appeared again, I don't want to solve
> it that way this time; I would like to find out why this is happening,
> but I have no idea where to start. There is nothing unusual in
> /var/log/messages; the only symptoms are that some directories cannot be
> removed and a lot of processes are stuck.
For problems where processes are getting stuck in D state, it's usually
helpful to collect sysrq-t data to see where the threads are stuck. Grab two
sets of data a few seconds apart so that you can see whether things are
really stuck or just making slow progress.
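The same "two samples a few seconds apart" idea can be applied to ps listings. Below is a minimal Python sketch, assuming the `ps ax -ocomm,pid,state,wchan` column layout; `parse_ps` and `persistently_blocked` are hypothetical helper names. It reports processes that are in state D with an unchanged command and wait channel in both samples, which is a heuristic for "really stuck" rather than proof of a deadlock.

```python
def parse_ps(ps_output: str) -> dict:
    """Map PID -> (command, state, wchan) from `ps ax -ocomm,pid,state,wchan`."""
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[-3].isdigit():
            continue  # header row or elided line
        pid = int(fields[-3])
        procs[pid] = (" ".join(fields[:-3]), fields[-2], fields[-1])
    return procs


def persistently_blocked(first: str, second: str) -> dict:
    """Return PID -> (command, wchan) for tasks in state D with the same
    command, state and wchan in both samples."""
    a, b = parse_ps(first), parse_ps(second)
    stuck = {}
    for pid, (command, state, wchan) in a.items():
        # Identical tuple in the second sample means still D, same wchan.
        if state.startswith("D") and b.get(pid) == (command, state, wchan):
            stuck[pid] = (command, wchan)
    return stuck
```

Matching on the command name as well as the PID guards a little against a PID being reused between the two samples.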
You can also get some information from the wchan (wait channel) data
exposed in /proc; it's easiest to view with ps:
$ ps ax -ocomm,pid,state,wchan
COMMAND PID S WCHAN
vim 22322 S -
bash 22471 S -
man 22817 S wait
sh 22820 S wait
sh 22821 S wait
less 22826 S -
bash 22839 S wait
screen 23435 S pause
[...]
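The wait channel can also be read straight from /proc without going through ps. Below is a minimal Python sketch, assuming a Linux /proc with per-process `stat` and `wchan` files; `blocked_tasks` and `parse_stat_state` are hypothetical names. It lists tasks in state D together with their wait channel, reporting an empty or `0` wchan as `-`.

```python
import os


def parse_stat_state(stat_line: str) -> tuple:
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.partition("(")[2]
    state = tail.split()[0]  # first field after comm is the state letter
    return comm, state


def blocked_tasks(proc_root: str = "/proc") -> list:
    """List (pid, comm, wchan) for every process currently in state D."""
    result = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                comm, state = parse_stat_state(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc_root, name, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # e.g. the process exited between listdir() and open()
        result.append((int(name), comm, wchan if wchan not in ("", "0") else "-"))
    return result
```

Calling `blocked_tasks()` with no arguments scans the real /proc; the `proc_root` parameter only exists so the function can be pointed at a copy or a test directory.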
Regards,
Bryn.
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster