[Linux-cluster] Processes in D state
Emilio Arjona
emilio at ugr.es
Tue Jan 4 11:27:52 UTC 2011
Same problem here: in a web server cluster, httpd processes sometimes go
into D state. I have to restart the node, or even the whole cluster if more
than one node is locked.
I'm using Red Hat 5.4 on HP hardware.
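To record which tasks are stuck before rebooting, something along these lines may help. It is only a sketch, assuming Python 3 and the usual Linux /proc/<pid>/stat layout; the names parse_stat and d_state_tasks are made up for illustration. Unlike `ps axl | grep D`, it checks the state field itself, so the ps header and the grep process do not show up in the result.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if state == "D":
            yield pid, comm


# Hypothetical usage on a cluster node:
#     for pid, comm in d_state_tasks():
#         print(pid, comm)
```

This only lists the blocked tasks; it says nothing about why they are blocked, so the kernel call trace from dmesg is still needed for that.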
Regards,
2011/1/4 Paras pradhan <pradhanparas at gmail.com>
> I had the same problem. It locked the whole GFS cluster and I had to
> reboot the node. After the reboot all is fine, but I am still trying to
> find out what caused it.
>
> Paras
>
> On Monday, January 3, 2011, InterNetworX | Hostmaster
> <hostmaster at inwx.de> wrote:
> > Hello,
> >
> > We are using GFS2, but sometimes processes hang in D state:
> >
> > # ps axl | grep D
> > F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> > 0 0 14220 14219 20 0 19624 1916 - Ds ? 0:00 /usr/lib/postfix/master -t
> > 0 0 14555 14498 20 0 16608 1716 - D+ /mnt/storage/openvz/root/129/dev/pts/0 0:00 apt-get install less
> > 0 0 15068 15067 19 -1 36844 2156 - D<s ? 0:00 /usr/lib/postfix/master -t
> > 0 0 16603 16602 19 -1 36844 2156 - D<s ? 0:00 /usr/lib/postfix/master -t
> > 4 101 19534 13238 19 -1 33132 2984 - D< ? 0:00 smtpd -n smtp -t inet -u -c
> > 4 101 19542 13238 19 -1 33116 2976 - D< ? 0:00 smtpd -n smtp -t inet -u -c
> > 0 0 19735 13068 20 0 7548 880 - S+ pts/0 0:00 grep D
> >
> > dmesg shows this message many times:
> >
> > [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> > [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [11142.334310] master D ffff88032b644800 0 14220 14219 0x00000000
> > [11142.334315] ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> > [11142.334318] ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> > [11142.334322] 0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> > [11142.334325] Call Trace:
> > [11142.334340] [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> > [11142.334347] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> > [11142.334353] [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> > [11142.334358] [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> > [11142.334363] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> > [11142.334367] [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> > [11142.334370] [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> > [11142.334376] [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> > [11142.334383] [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> > [11142.334386] [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> > [11142.334389] [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> > [11142.334393] [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> > [11142.334396] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> >
> > Any idea what is going wrong? Do you need any more information?
> >
> > Mario
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
--
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tel.: 958-241000 ext. 20206
*******************************************