[Linux-cluster] GFS hangs, nodes die

Sebastian Walter sebastian.walter at fu-berlin.de
Mon Aug 20 16:19:31 UTC 2007


Hi,

after putting massive load on the cluster, 55 % of the nodes died again
(after setting glock_purge to 50). I don't think (and hope it isn't) that
the hardware is at fault, as normal filesystems don't cause problems and
GFS itself runs fine under low load. I will check the hardware, but that
will be a more comprehensive task. Maybe I can improve things by tuning
the volume better?

Here is what /var/log/messages gives me:
Aug 20 16:24:50 compute-0-10.local clurgmgrd[4283]: <err> #48: Unable to obtain cluster lock: Connection timed out
Aug 20 16:25:04 compute-0-3.local clurgmgrd[4280]: <err> #48: Unable to obtain cluster lock: Connection timed out
Aug 20 16:25:35 compute-0-10.local clurgmgrd[4283]: <err> #50: Unable to obtain cluster lock: Connection timed out
Aug 20 16:25:49 compute-0-3.local clurgmgrd[4280]: <err> #50: Unable to obtain cluster lock: Connection timed out
(these are the errors from the still running nodes, they are repeated
several times)
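
To see how these errors are spread across nodes and error numbers, the
log excerpt can be tallied with a short script. This is only a rough
sketch: the names unwrap() and tally() are made up for illustration, the
regular expression assumes the clurgmgrd line layout shown above, and the
sample text is just the four lines from this mail (including the mail
client's line wrapping).

```python
import re
from collections import Counter

# Assumed layout of a clurgmgrd syslog entry, taken from the excerpt above:
# "<Mon> <day> <hh:mm:ss> <host> clurgmgrd[<pid>]: <level> #<code>: <message>"
ENTRY = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+(?P<host>\S+)\s+"
    r"clurgmgrd\[\d+\]:\s+<(?P<level>\w+)>\s+#(?P<code>\d+):\s+(?P<msg>.*)$"
)
# A new entry starts with a syslog timestamp; anything else is a wrapped tail.
STARTS_WITH_TS = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s")


def unwrap(lines):
    """Re-join entries that were wrapped onto a second line."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if STARTS_WITH_TS.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def tally(lines):
    """Count clurgmgrd entries per (host, error number)."""
    counts = Counter()
    for entry in unwrap(lines):
        m = ENTRY.match(entry)
        if m:
            counts[(m.group("host"), int(m.group("code")))] += 1
    return counts


# Sample input: the four entries quoted in this mail, still line-wrapped.
SAMPLE = """\
Aug 20 16:24:50 compute-0-10.local clurgmgrd[4283]: <err> #48: Unable to
obtain cluster lock: Connection timed out
Aug 20 16:25:04 compute-0-3.local clurgmgrd[4280]: <err> #48: Unable to
obtain cluster lock: Connection timed out
Aug 20 16:25:35 compute-0-10.local clurgmgrd[4283]: <err> #50: Unable to
obtain cluster lock: Connection timed out
Aug 20 16:25:49 compute-0-3.local clurgmgrd[4280]: <err> #50: Unable to
obtain cluster lock: Connection timed out
"""

if __name__ == "__main__":
    for (host, code), n in sorted(tally(SAMPLE.splitlines()).items()):
        print(f"{host}  #{code}  x{n}")
```

On a real node the same tally() could be fed the lines of
/var/log/messages instead of SAMPLE.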

Running "gfs_tool counters /global/home" blocks and does not respond.
Btw, I'm running CentOS 4 Update 5 on all the nodes.

Thanks for any comment. Regards,
Sebastian

Wendy Cheng wrote:
> Sebastian Walter wrote:
>
>>>>>> This is what /var/log/messages gives me (on nearly all nodes):
>>>>>> Aug 18 04:39:06 compute-0-2 clurgmgrd[4225]: <err> #49: Failed getting status for RG gfs-2
>>>>>> and e.g.
>>>>>> Aug 18 04:45:38 compute-0-6 clurgmgrd[9074]: <err> #50: Unable to obtain cluster lock: Connection timed out
>
> GFS glock trimming patch *could* help. However, the lock leak *here*
> is from clurgmgrd (cluster infrastructure), not GFS (filesystem)
> itself. So these two are different issues. I vaguely recall clurgmgrd
> did have a bugzilla for this and was fixed sometime ago.
>
> Lon ?
>
> -- Wendy
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



