Please find /proc/slabinfo from all currently running nodes attached.

Before running cat /proc/slabinfo, I got impatient and kill -9'd the clurgmgrd PIDs on the failed node. I then ran /etc/init.d/rgmanager stop, but that operation has not finished: it is still showing "Waiting for services to stop:", while the services themselves are still running and operational.

Thanks, Lon!

<div>On 12/11/06, Lon Hohberger <<a href="mailto:lhh@redhat.com">lhh@redhat.com</a>> wrote:<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Mon, 2006-12-11 at 10:22 -0800, <a href="mailto:aberoham@gmail.com">aberoham@gmail.com</a> wrote:
> Another clue -- haldaemon crashed on this node, perhaps at the same
> time clurgmgrd started to hang?
>
> latest dmesg entry --
> hal[3509]: segfault at 0000000000000000 rip 0000000000400ec7 rsp
> 0000007fbfffd7e0 error 4
>
> grep clurgmgrd /var/log/messages --
> [snip]
> Dec 11 06:39:43 bamf01 clurgmgrd: [7983]: <info>
> Executing /etc/init.d/rsyncd-tiger status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info>
> Executing /etc/init.d/httpd.cluster status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info>
> Executing /etc/init.d/rsyncd-hartigan status
> Dec 11 06:41:11 bamf01 clurgmgrd[7983]: <err> #48: Unable to obtain
> cluster lock: Connection timed out
> Dec 11 06:41:56 bamf01 clurgmgrd[7983]: <err> #50: Unable to obtain
> cluster lock: Connection timed out
> [snip]

Could you check /proc/slabinfo and post it from all nodes? I think I know what this is.

-- Lon

--
Linux-cluster mailing list
<a href="mailto:Linux-cluster@redhat.com">Linux-cluster@redhat.com</a>
<a href="https://www.redhat.com/mailman/listinfo/linux-cluster">https://www.redhat.com/mailman/listinfo/linux-cluster</a>
</blockquote></div>