[Linux-cluster] Processes in D state

Emilio Arjona emilio at ugr.es
Tue Jan 4 11:27:52 UTC 2011


Same problem here.

In a web server cluster, httpd processes sometimes go into D state. I have
to restart the node, or even the whole cluster if more than one node is
locked. I'm using Red Hat 5.4 on HP hardware.
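
A side note for anyone comparing nodes: `ps axl | grep D` also matches the
header line and the grep process itself, as the listing quoted below shows.
A minimal sketch in Python that keeps only the rows whose STAT field starts
with D could look like this. The function name `d_state_rows` and the
`SAMPLE_PS` text are illustrative assumptions (the rows are copied from the
quoted output), and the parsing assumes the column layout of `ps axl` shown
in this thread.

```python
# Sample rows taken from the `ps axl` output quoted in this thread.
SAMPLE_PS = """\
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0     0 14220 14219  20   0  19624  1916 -      Ds   ?          0:00 /usr/lib/postfix/master -t
4   101 19534 13238  19  -1  33132  2984 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
0     0 19735 13068  20   0   7548   880 -      S+   pts/0      0:00 grep D
"""


def d_state_rows(ps_axl_text):
    """Return (pid, stat, command) for rows whose STAT starts with 'D'."""
    lines = ps_axl_text.strip().splitlines()
    header = lines[0].split()
    # Locate the columns by name instead of hard-coding their positions.
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = header.index("COMMAND")

    rows = []
    for line in lines[1:]:
        # Split only up to COMMAND so the command keeps its arguments.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # skip short or malformed lines
        if fields[stat_col].startswith("D"):
            rows.append((int(fields[pid_col]), fields[stat_col], fields[cmd_col]))
    return rows


if __name__ == "__main__":
    for pid, stat, command in d_state_rows(SAMPLE_PS):
        print(pid, stat, command)
```

On a live node the same function could be fed the text captured from
`ps axl`; the header row and the `grep D` row are then dropped because
neither has a STAT value beginning with D.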

Regards,

2011/1/4 Paras pradhan <pradhanparas at gmail.com>

> I had the same problem. It locked the whole GFS cluster and I had to
> reboot the node. After the reboot all is fine, but I am still trying to
> find out what caused it.
>
> Paras
>
> On Monday, January 3, 2011, InterNetworX | Hostmaster
> <hostmaster at inwx.de> wrote:
> > Hello,
> >
> > We are using GFS2, but sometimes processes hang in D state:
> >
> > # ps axl | grep D
> > F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> > 0     0 14220 14219  20   0  19624  1916 -      Ds   ?          0:00 /usr/lib/postfix/master -t
> > 0     0 14555 14498  20   0  16608  1716 -      D+   /mnt/storage/openvz/root/129/dev/pts/0   0:00 apt-get install less
> > 0     0 15068 15067  19  -1  36844  2156 -      D<s  ?          0:00 /usr/lib/postfix/master -t
> > 0     0 16603 16602  19  -1  36844  2156 -      D<s  ?          0:00 /usr/lib/postfix/master -t
> > 4   101 19534 13238  19  -1  33132  2984 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
> > 4   101 19542 13238  19  -1  33116  2976 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
> > 0     0 19735 13068  20   0   7548   880 -      S+   pts/0      0:00 grep D
> >
> > dmesg shows this message many times:
> >
> > [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> > [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [11142.334310] master        D ffff88032b644800     0 14220  14219 0x00000000
> > [11142.334315]  ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> > [11142.334318]  ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> > [11142.334322]  0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> > [11142.334325] Call Trace:
> > [11142.334340]  [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> > [11142.334347]  [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> > [11142.334353]  [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> > [11142.334358]  [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> > [11142.334363]  [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> > [11142.334367]  [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> > [11142.334370]  [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> > [11142.334376]  [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> > [11142.334383]  [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> > [11142.334386]  [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> > [11142.334389]  [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> > [11142.334393]  [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> > [11142.334396]  [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> >
> > Any idea what is going wrong? Do you need any more information?
> >
> > Mario
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
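
Since dmesg repeats the hung-task message many times, it may help to tally
which tasks are reported and which kernel modules appear in their call
traces. The following is only a rough sketch under stated assumptions: the
name `summarize_hung_tasks`, the two regular expressions, and the
`SAMPLE_DMESG` text are mine (the lines are copied from the trace quoted
above), and the patterns are written against the message format shown in
this thread, not against every kernel version.

```python
import re
from collections import Counter

# Lines copied from the dmesg output quoted in this thread.
SAMPLE_DMESG = """\
[11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
[11142.334340]  [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
[11142.334358]  [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
[11142.334376]  [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
[11142.334383]  [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
[11142.334393]  [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
"""

# "INFO: task <comm>:<pid> blocked for more than <n> seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)
# "[<address>] ? symbol+0xoff/0xsize [module]"; the module part is optional.
TRACE_FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?[A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def summarize_hung_tasks(dmesg_text):
    """Count hung-task reports per (comm, pid) and trace frames per module."""
    tasks = Counter()
    modules = Counter()
    for line in dmesg_text.splitlines():
        hung = HUNG_TASK.search(line)
        if hung:
            tasks[(hung.group("comm"), int(hung.group("pid")))] += 1
            continue
        frame = TRACE_FRAME.search(line)
        # Frames without a bracketed module belong to the core kernel.
        if frame and frame.group("mod"):
            modules[frame.group("mod")] += 1
    return tasks, modules


if __name__ == "__main__":
    tasks, modules = summarize_hung_tasks(SAMPLE_DMESG)
    print(dict(tasks))
    print(dict(modules))
```

The module counts are only a hint about where the tasks are waiting; they
do not by themselves identify the cause of the hang.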



-- 
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************