[Linux-cluster] Re: rgmanager stuck, hung on futex

Lon Hohberger lhh at redhat.com
Mon Dec 11 19:49:56 UTC 2006


On Mon, 2006-12-11 at 10:22 -0800, aberoham at gmail.com wrote:
> Another clue -- haldaemon crashed on this node, perhaps at the same
> time clurgmgrd started to hang? 
> 
> latest dmesg entry --
> hal[3509]: segfault at 0000000000000000 rip 0000000000400ec7 rsp 0000007fbfffd7e0 error 4
> 
> grep clurgmgrd /var/log/messages --
> [snip]
> Dec 11 06:39:43 bamf01 clurgmgrd: [7983]: <info> Executing /etc/init.d/rsyncd-tiger status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info> Executing /etc/init.d/httpd.cluster status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info> Executing /etc/init.d/rsyncd-hartigan status
> Dec 11 06:41:11 bamf01 clurgmgrd[7983]: <err> #48: Unable to obtain cluster lock: Connection timed out
> Dec 11 06:41:56 bamf01 clurgmgrd[7983]: <err> #50: Unable to obtain cluster lock: Connection timed out
> [snip]

Could you check /proc/slabinfo and post it from all nodes?  I think I
know what this is.
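
If the full /proc/slabinfo dump is awkward to compare across nodes, something like the following could be run on each node to summarize it. This is only a rough sketch: the helper names (parse_slabinfo, top_caches) are made up for illustration, it assumes the usual slabinfo 2.x column order (name, active_objs, num_objs, objsize), and it approximates each cache's footprint as num_objs * objsize. It does not single out any particular cache, since the message above does not say which one is suspected, and it is not a substitute for posting the raw file.

```python
import socket


def parse_slabinfo(text):
    """Return a list of (name, active_objs, num_objs, objsize) tuples."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column-header comment.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
        except ValueError:
            # Not a data row in the assumed layout; ignore it.
            continue
        rows.append((fields[0], active_objs, num_objs, objsize))
    return rows


def top_caches(text, n=10):
    """Return the n caches with the largest approximate size (num_objs * objsize)."""
    rows = parse_slabinfo(text)
    return sorted(rows, key=lambda r: r[2] * r[3], reverse=True)[:n]


if __name__ == "__main__":
    path = "/proc/slabinfo"
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:
        # Reading slabinfo may require root on some kernels.
        print(f"cannot read {path}: {exc}")
    else:
        print(f"# {socket.gethostname()}")
        for name, active, total, size in top_caches(text):
            kib = total * size // 1024
            print(f"{name:<24} {active:>10} {total:>10} {size:>8} {kib:>10} KiB")
```

Running it as root on every node and comparing the top few lines should make an unusually large cache on one node easy to spot.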

-- Lon
