[Linux-cluster] Processes in D state

Paras pradhan pradhanparas at gmail.com
Tue Jan 4 01:23:25 UTC 2011


I had the same problem. It locked up the whole GFS cluster and I had to
reboot the node. After the reboot everything is fine, but I am still
trying to find out what caused it.

Paras

On Monday, January 3, 2011, InterNetworX | Hostmaster
<hostmaster at inwx.de> wrote:
> Hello,
>
> We are using GFS2, but sometimes processes hang in D state:
>
> # ps axl | grep D
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 0     0 14220 14219  20   0  19624  1916 -      Ds   ?          0:00 /usr/lib/postfix/master -t
> 0     0 14555 14498  20   0  16608  1716 -      D+   /mnt/storage/openvz/root/129/dev/pts/0   0:00 apt-get install less
> 0     0 15068 15067  19  -1  36844  2156 -      D<s  ?          0:00 /usr/lib/postfix/master -t
> 0     0 16603 16602  19  -1  36844  2156 -      D<s  ?          0:00 /usr/lib/postfix/master -t
> 4   101 19534 13238  19  -1  33132  2984 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
> 4   101 19542 13238  19  -1  33116  2976 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
> 0     0 19735 13068  20   0   7548   880 -      S+   pts/0      0:00 grep D
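
Note that `grep D` also matches the header line and the grep process
itself (the last row above is in state S+, not D). A minimal sketch of
a stricter filter, assuming procps `ps` and that the STAT field is the
third column of the chosen output format; the helper name d_state_only
is made up for illustration:

```shell
#!/bin/sh
# Keep the header line plus every row whose STAT field (column 3)
# starts with "D", i.e. tasks in uninterruptible sleep.
# Reads ps output on stdin, so it works on saved output as well.
d_state_only() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Live usage; wchan:32 widens the wait-channel column:
# ps -eo pid,ppid,stat,wchan:32,args | d_state_only
```

Matching on the STAT column rather than on the whole line avoids false
hits from commands or paths that happen to contain a capital D.
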
>
> dmesg shows this message many times:
>
> [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [11142.334310] master        D ffff88032b644800     0 14220  14219 0x00000000
> [11142.334315]  ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> [11142.334318]  ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> [11142.334322]  0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> [11142.334325] Call Trace:
> [11142.334340]  [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> [11142.334347]  [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> [11142.334353]  [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> [11142.334358]  [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> [11142.334363]  [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> [11142.334367]  [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> [11142.334370]  [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> [11142.334376]  [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> [11142.334383]  [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> [11142.334386]  [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> [11142.334389]  [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> [11142.334393]  [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> [11142.334396]  [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
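
The trace shows the task waiting in gfs2_glock_wait, reached from
sys_flock via gfs2_flock. Since the message repeats many times, it may
help to see which tasks are reported and how often. A rough sketch,
assuming the "INFO: task NAME:PID blocked for more than" wording shown
above; the helper name hung_tasks is made up for illustration:

```shell
#!/bin/sh
# Summarise hung-task warnings read from stdin: one output line per
# task, formatted as "COUNT PID NAME". Lines that are not hung-task
# warnings are ignored.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\2 \1/p' |
        sort | uniq -c
}

# Live usage:
# dmesg | hung_tasks
```

The count only reflects what is still in the kernel ring buffer, so
older warnings may already have been overwritten.
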
>
> Any idea what is going wrong? Do you need any more information?
>
> Mario
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



