[Linux-cluster] dlm caused a kernel panic

Wed Dec 14 15:43:40 UTC 2005

Is the slow output from df expected?  Does it just take considerable
time to read a gfs superblock?  In my scenario, is it likely that heavy
lock load was caused by the combination of df and a umount at the same
time?  Were the gfs recover events in the log prior to the kernel panic
normal, or is it possible that I attempted the umount too quickly after
mounting?  Would r/o mounts decrease lock load and the likelihood of
this occurring again?  

Thanks for the help.  I was just about to move this into production and
now I'm a little apprehensive.  I just want to make sure I'm taking the
necessary precautions. 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
Sent: Wednesday, December 14, 2005 3:54 AM
To: linux clustering
Subject: Re: [Linux-cluster] dlm caused a kernel panic

Jeff Dinisco wrote:
> I'm running FC4 (2.6.13-1.1532_FC4smp), dlm-1.0.0-3 and GFS-6.1.0-3.  I
> have a 3 node cluster.  The df command has always been very slow to
> return output on my gfs mounted filesystems.  Series of events...
> 
> 16:20:00 - node01 was out of the cluster, node02 and node03 were active
> with 2 gfs filesystems mounted
> 16:22:10 - after joining the cluster, both filesystems were successfully
> mounted
> 16:22:37 - a df command was attempted by a monitoring script
> 16:22:54 - I executed /etc/init.d/gfs stop and it failed because 1 of
> the filesystems was busy and could not be umounted (the above df command
> may have been the cause, it ended up hanging)
> 
> 16:22:55 - node02 and node03 panicked and were not properly fenced

If there was only one node left in the cluster it would not fence the other
two because it doesn't have quorum. So it can't be sure that it's not just
been cut off from the other two nodes and they might be working fine.
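
As a rough illustration of the quorum point above, here is a minimal Python
sketch. It assumes one vote per node and a simple majority rule (more than
half of the expected votes). The function names are hypothetical and this is
not the cluster manager's actual code, only the arithmetic behind the idea.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(live_votes: int, expected_votes: int) -> bool:
    """True if the surviving partition holds a strict majority of votes."""
    return live_votes >= quorum_threshold(expected_votes)


# Assumption: a 3 node cluster with one vote per node.
EXPECTED_VOTES = 3

for live in (3, 2, 1):
    state = "quorate" if is_quorate(live, EXPECTED_VOTES) else "inquorate"
    print(f"{live} of {EXPECTED_VOTES} nodes alive: {state}")
```

Under these assumptions the threshold for three nodes is two votes, so a
single surviving node is inquorate and will not try to fence the others.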

> Dec 13 16:22:56 node02 kernel: ------------[ cut here ]------------
> Dec 13 16:22:56 node02 kernel: kernel BUG at
> /usr/src/build/627959-i686/BUILD/smp/src/lockqueue.c:1007!

I can reproduce this under very heavy lock load, but I'm not sure what's
causing it as yet. The "flood" tool I checked in to STABLE yesterday is almost
guaranteed to cause it.

-- 

patrick
