[Linux-cluster] GFS2 directory hangs on one node CentOS 5.3

Steven Whitehouse swhiteho at redhat.com
Mon Sep 28 08:47:09 UTC 2009


Hi,

On Sat, 2009-09-26 at 18:29 +0200, Libor Tomsik wrote:
> Hi all,
> 
> I'm having a strange issue with a two-node cluster based on Xen
> virtual hosts with a shared disk on CLVM. The servers are running Apache
> and one is considered a hot backup. On that node, awstats statistics are
> generated from the Apache custom logs stored on the shared device. Web data,
> logs, configs and awstats results are in different directories within
> the same GFS2 volume.
> 
> Everything works fine, but sometimes (in the production environment, damn)
> the directory with logs gets frozen on the spare node with awstats.
> All commands like ls, cd, mc on that directory go into state D. On the
> second node all works fine. Other directories seem unaffected too.
> 
> I can neither umount the fs nor remount it ro and back rw, since there are
> "running" processes in D state.
> 
> Can someone give me some advice on how to prevent this problem, and
> how to recover from it? It is a production system with an SLA :(  Next
> time, I'll try to take a lockdump on both nodes.
> 
> Kernel is 2.6.18-128.1.10.el5xen, gfs2-utils-0.1.53-1.el5_3.2,
> kmod-gfs2-xen-1.92-1.1.el5_2.2
> 
> Regards
> 
> Libor
> 
That sounds to me like there is a lot of activity from both nodes
relating to the same directory. Can you split the logs of the two nodes
into two different directories? That will probably solve the problem.
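
As a rough illustration of the split, here is a minimal sketch of how a
log-processing script could derive one directory per node. The base path
and the choice of the short host name as the directory name are
assumptions made for this example; they do not come from the setup
described above, and the web server's own log configuration would need
to point at the same per-node paths. It assumes a Python 3 interpreter
is available.

```python
import os
import socket


def per_node_log_dir(base_dir, node_name=None):
    """Return a log directory under base_dir that is unique to one node.

    node_name defaults to this machine's host name (an assumption; any
    identifier that differs between the two nodes would do).
    """
    if node_name is None:
        node_name = socket.gethostname()
    # Keep only the short host name, so "web1.example.com" becomes "web1".
    short_name = node_name.split(".")[0]
    return os.path.join(base_dir, short_name)


def ensure_log_dir(base_dir, node_name=None):
    """Create the per-node directory if it is missing and return its path."""
    path = per_node_log_dir(base_dir, node_name)
    os.makedirs(path, exist_ok=True)
    return path
```

With a hypothetical base of /mnt/gfs2/logs, each node then writes only
under its own subdirectory, so the two nodes no longer create and update
entries in the same directory.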

This kind of problem is tricky to debug since the glock dumps will tell
you what state the glocks are currently in, and not what has been
happening in the past.
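
Since a dump is only a snapshot, it can help to record which processes
are blocked at the moment the dump is taken. The following is a minimal
sketch, not a GFS2 tool: it lists processes in D state by reading
/proc/<pid>/stat. The function names are made up for this example, and
it assumes a Python 3 interpreter on the affected node.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so use the first "(" and the last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    command = line[open_paren + 1:close_paren]
    # The process state is the first field after the command.
    state = line[close_paren + 1:].split()[0]
    return pid, command, state


def list_d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, command) for processes in D state."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were reading, or the line was
            # not in the expected format; skip it.
            continue
        if state == "D":
            blocked.append((pid, command))
    return sorted(blocked)
```

Calling list_d_state_processes() on each node at the time of the hang,
and saving the result next to the glock dump, gives a record of which
commands were stuck when the snapshot was taken.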

In the upstream code we've now got GFS2 tracepoints which will help in
tracking down issues like this, but those are not in RHEL yet.

Steve.
