[Linux-cluster] Oops

Thu May 24 15:42:11 UTC 2007

David Teigland wrote:
> On Thu, May 24, 2007 at 03:51:08PM +0200, Wagner Ferenc wrote:
>> Hi,
>>
>> I wasn't sure whether to send this to LKML or here, but DLM seems
>> involved.  Please let me know if I'd better repost it to somewhere
>> else.
> 
> Here is good.
> 
>> It's a vanilla 2.6.21 kernel patched by cluster-2.00.00 (with the
>> three extra exports for GFS1).  Config attached.  The machine froze
>> during the morning updatedb cronjob, which performed a recursive find
>> into the shared GFS filesystem.  Two other nodes doing the same at the
>> same time are still up.
>>
>> I experienced a similar hang with cluster-1 not long ago, though that
>> didn't lock up the whole machine, but the cluster software only.
> 
> updatedb, even on just one node (much less all) is never going to be a
> good thing to run on gfs... our standard response is "don't do that".
> 
>> Please ask back if I didn't provide all information necessary.
> 
> I also ran into this bug last week and was testing some patches from
> Patrick to try to figure it out -- I got distracted with other things but
> will get back to it again soon.  My test that hit it was doing looping
> mount/unmount on four nodes.

Actually, I'm not sure that this is the same bug. This one seems to be a
work_queue entry being added twice (though how that can happen is a mystery to
me), and it happens under load, whereas yours was accessing a deallocated
struct after umount.
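
To illustrate why an entry being added twice is harmful, here is a minimal
user-space sketch of a circular doubly linked list in the style of the
kernel's list_head. It is not the DLM or workqueue code, and whether the oops
above arose in exactly this way is an assumption. The names list_init,
list_add_tail, list_count and queue_and_count are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the queue.
 * Like the kernel helper, it does not check whether entry is already queued. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *last = head->prev;

    entry->next = head;
    entry->prev = last;
    last->next = entry;
    head->prev = entry;
}

/* Walk forward from head and count entries. Gives up after 'limit' steps
 * and returns -1, so a corrupted list cannot hang the caller. */
int list_count(const struct list_head *head, int limit)
{
    const struct list_head *pos;
    int n = 0;

    assert(limit > 0);
    for (pos = head->next; pos != head; pos = pos->next) {
        if (++n > limit)
            return -1;
    }
    return n;
}

/* Hypothetical driver: queue a, optionally b, optionally a a second time,
 * then return what a forward walk of the queue sees. */
int queue_and_count(int with_b, int add_a_twice)
{
    struct list_head head, a, b;

    list_init(&head);
    list_add_tail(&a, &head);
    if (with_b)
        list_add_tail(&b, &head);
    if (add_a_twice)
        list_add_tail(&a, &head);   /* the bug: a is already on the list */
    return list_count(&head, 16);
}
```

With this sketch, queueing a and b once each gives a count of 2. Queueing a
again after b leaves a forward walk seeing only one entry (b is silently
dropped), and queueing a twice in a row makes a point at itself, so the walk
never gets back to the head and list_count gives up with -1.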

-- 
Patrick

Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 1TE, UK.
Registered in England and Wales under Company Registration No. 3798903