[Linux-cluster] dlm caused a kernel panic

David Teigland teigland at redhat.com
Wed Dec 14 19:45:38 UTC 2005

On Wed, Dec 14, 2005 at 07:43:40AM -0800, Jeff Dinisco wrote:
> Is the slow output from df expected?  Does it just take considerable
> time to read a gfs superblock?  

Yes, it's expected; df locks every resource group in the fs to collect
usage information, so large fs's will take longer, and heavy writers on
other nodes will delay it further.
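
To illustrate why the cost grows with fs size, here is a toy sketch in
Python. It is not gfs code: ResourceGroup, statfs_sketch and the block
counts are hypothetical names, and a plain threading.Lock stands in for
the cluster-wide lock that a real resource group would need.

```python
import threading


class ResourceGroup:
    """Hypothetical stand-in for one resource group and its usage counters."""

    def __init__(self, total_blocks, free_blocks):
        # Assumption: a local lock models the per-resource-group cluster lock.
        self.lock = threading.Lock()
        self.total_blocks = total_blocks
        self.free_blocks = free_blocks


def statfs_sketch(resource_groups):
    """Sum usage across all resource groups, locking each one in turn.

    Returns (total_blocks, free_blocks, locks_taken).
    """
    total = 0
    free = 0
    locks_taken = 0
    for rg in resource_groups:
        # A writer holding this lock (on another node, in the real case)
        # would make us wait here, once per contended resource group.
        with rg.lock:
            total += rg.total_blocks
            free += rg.free_blocks
            locks_taken += 1
    return total, free, locks_taken


if __name__ == "__main__":
    rgs = [ResourceGroup(100, 40), ResourceGroup(200, 50)]
    print(statfs_sketch(rgs))
```

The number of lock acquisitions equals the number of resource groups, so
the work is linear in fs size even before any contention is added.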

> In my scenario, is it likely that heavy lock load was caused by the
> combination df and a umount at the same time?  

I'm not sure lock load is related to this particular case.  After studying
your logs I think I know what the problem is; it's a situation where a dlm
message from an unmounting node is received after recovery for it is
completed on the remaining nodes.  A quick and correct fix would be to
remove the assertion (or perhaps change it, I'll see.)
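
The ordering problem can be shown with a toy model in Python. This is
not the dlm code, and I'm assuming purely for illustration that the
assertion checks that the sender is still a known member; NodeTable,
receive_strict and receive_tolerant are hypothetical names.

```python
class NodeTable:
    """Hypothetical view of cluster membership held by a remaining node."""

    def __init__(self, members):
        self.members = set(members)

    def finish_recovery_for(self, nodeid):
        # After recovery completes, the departed node is no longer tracked.
        self.members.discard(nodeid)

    def receive_strict(self, nodeid, msg):
        # Models the current behaviour: a message from an unknown node
        # trips an assertion (a panic, in the kernel case).
        assert nodeid in self.members, "message from unknown node"
        return ("handled", msg)

    def receive_tolerant(self, nodeid, msg):
        # Models one possible change: treat the late message as stale
        # and drop it instead of asserting.
        if nodeid not in self.members:
            return ("dropped", msg)
        return ("handled", msg)


if __name__ == "__main__":
    table = NodeTable({1, 2, 3})
    table.finish_recovery_for(3)       # node 3 unmounted, recovery done
    print(table.receive_tolerant(3, "late unlock"))
```

The late message is harmless once recovery has finished, which is why
dropping it (rather than asserting) looks like a reasonable fix.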

> Were the gfs recover events in the log prior to the kernel panic
> normal, or is it possible that I attempted the umount too quickly after
> mounting?  

Mounting and unmounting always involve dlm recovery, which is more prone to
bugs and corner cases, so avoiding unnecessary or rapidly repeating
mounting/unmounting is usually wise.  You didn't do anything wrong,
though; it's simply a corner case we aren't handling properly.

> Would r/o mounts decrease lock load and the likelihood of this occurring
> again?



