[Linux-cluster] High I/O Wait Rates - RHEL 6.1 + GFS2 + NFS

Omer Faruk SEN omerfsen at gmail.com
Tue Jun 28 06:05:36 UTC 2011


Hi,

Open a ticket so that Red Hat technical support can look into this. I think
that is the fastest way to get this issue resolved.

Regards.

On Tue, Jun 28, 2011 at 8:55 AM, anderson souza <andersonlira at gmail.com> wrote:

> Hi everyone,
>
> I have an Active/Passive RHCS 6.1 cluster running 8TB of GFS2 with NFS on
> top, exporting 26 mount points to 250 NFS clients. The GFS2 mount
> points are mounted with the noatime, nodiratime, data=writeback and
> localflocks options, and the SAN and servers are fast (4Gbps and 8Gb, dual
> controllers working in LB, H.A... QuadCore, 48GB of memory...). The cluster
> has been doing its job (failover works fine...). Unfortunately, however,
> I have seen high I/O wait rates, sometimes around 60-70%
> (which is very bad), and a couple of glock_workqueue jobs, so I get a
> bunch of gfs2_quotad and nfsd errors and qdisk latency. The debugfs didn't
> show me "W", only "G" and "H".
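
If you do open a ticket, a short summary of the glock dump may be useful to
attach. Below is a minimal sketch, not a supported tool: it assumes the dump
(typically read from /sys/kernel/debug/gfs2/<fsname>/glocks once debugfs is
mounted) uses "G:" lines for glocks and "H:" lines for holders, with holder
flags in the "f:" field, where "H" means granted and "W" means waiting. The
function name summarize_glock_dump and the SAMPLE text are hypothetical and
only illustrate the idea; SAMPLE is not output from this cluster.

```python
from collections import Counter


def summarize_glock_dump(text):
    """Count glocks, granted holders (H flag) and waiting holders (W flag).

    Assumes the GFS2 glock dump layout: "G:" lines describe glocks and
    "H:" lines describe holders, with holder flags in the "f:" field.
    """
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("G:"):
            counts["glocks"] += 1
        elif line.startswith("H:"):
            # Pull the holder flags out of the first "f:" field, e.g. "f:EH".
            flags = ""
            for field in line.split():
                if field.startswith("f:"):
                    flags = field[2:]
                    break
            if "W" in flags:
                counts["waiting"] += 1
            elif "H" in flags:
                counts["granted"] += 1
    return counts


# Hypothetical sample in the assumed format (made up for illustration).
SAMPLE = """\
G:  s:SH n:2/805b f:Iqob t:SH d:EX/0 a:0 v:0 r:3 m:200
 H: s:SH f:EH e:0 p:4466 [postmark] gfs2_inode_lookup+0x14e/0x260 [gfs2]
G:  s:EX n:2/1234 f:lq t:EX d:EX/0 a:0 v:0 r:4 m:200
 H: s:EX f:H e:0 p:100 [nfsd] gfs2_write_begin+0x0/0x0 [gfs2]
 H: s:SH f:W e:0 p:101 [nfsd] gfs2_getattr+0x0/0x0 [gfs2]
"""

if __name__ == "__main__":
    print(summarize_glock_dump(SAMPLE))
```

A dump with no waiting holders at all, as you describe, would give a
"waiting" count of zero here. Comparing a few dumps taken some seconds apart
is probably more telling than a single snapshot.
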
>
> Have you guys seen this before?
> Does it look like glock contention?
> How can I get it fixed, and what does it mean?
>
> Thank you very much
>
>
> Jun 27 18:48:05  kernel: INFO: task gfs2_quotad:19066 blocked for more than
> 120 seconds.
> Jun 27 18:48:05  kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> Jun 27 18:48:05  kernel: gfs2_quotad   D 0000000000000004     0 19066
> 2 0x00000080
> Jun 27 18:48:05  kernel: ffff880bb01e1c20 0000000000000046 0000000000000000
> ffffffffa045ec6d
> Jun 27 18:48:05  kernel: 0000000000000000 ffff880be6e2b000 ffff880bb01e1c50
> 00000001051d8b46
> Jun 27 18:48:05  kernel: ffff880be4865af8 ffff880bb01e1fd8 000000000000f598
> ffff880be4865af8
> Jun 27 18:48:05  kernel: Call Trace:
> Jun 27 18:48:05  kernel: [<ffffffffa045ec6d>] ? dlm_put_lockspace+0x1d/0x40
> [dlm]
> Jun 27 18:48:05  kernel: [<ffffffffa0525c50>] ?
> gfs2_glock_holder_wait+0x0/0x20 [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffffa0525c5e>]
> gfs2_glock_holder_wait+0xe/0x20 [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffff814db87f>] __wait_on_bit+0x5f/0x90
> Jun 27 18:48:05  kernel: [<ffffffffa0525c50>] ?
> gfs2_glock_holder_wait+0x0/0x20 [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffff814db928>]
> out_of_line_wait_on_bit+0x78/0x90
> Jun 27 18:48:05  kernel: [<ffffffff8108e140>] ? wake_bit_function+0x0/0x50
> Jun 27 18:48:05  kernel: [<ffffffffa0526816>] gfs2_glock_wait+0x36/0x40
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffffa0529011>] gfs2_glock_nq+0x191/0x370
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffff8107a11b>] ?
> try_to_del_timer_sync+0x7b/0xe0
> Jun 27 18:48:05  kernel: [<ffffffffa05427f8>] gfs2_statfs_sync+0x58/0x1b0
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffff814db52a>] ?
> schedule_timeout+0x19a/0x2e0
> Jun 27 18:48:05  kernel: [<ffffffffa05427f0>] ? gfs2_statfs_sync+0x50/0x1b0
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffffa053a787>] quotad_check_timeo+0x57/0xb0
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffffa053aa14>] gfs2_quotad+0x234/0x2b0
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffff8108e100>] ?
> autoremove_wake_function+0x0/0x40
> Jun 27 18:48:05  kernel: [<ffffffffa053a7e0>] ? gfs2_quotad+0x0/0x2b0
> [gfs2]
> Jun 27 18:48:05  kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
> Jun 27 18:48:05  kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
> Jun 27 18:48:05  kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
> Jun 27 18:48:05  kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
>
> Jun 27 19:49:07  kernel: __ratelimit: 57 callbacks suppressed
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07  kernel: nfsd: peername failed (err 107)!
> Jun 27 20:00:58  kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140
> bytes - shutting down socket
> Jun 27 20:00:58  kernel: __ratelimit: 40 callbacks suppressed
> qdiskd[10078]: qdisk cycle took more than 1 second to complete (1.170000)
> qdisk cycle took more than 1 second to complete (1.120000)
>
> Thanks
> James S.
>