[Linux-cluster] DLM in recover state - node can't connect to cluster

Maciej Bogucki maciej.bogucki at artegence.com
Thu Aug 16 14:45:30 UTC 2007


Maciej Bogucki wrote:
> Hello,
> 
> I have a five-node cluster. Node05 failed (kernel panic), and fencing
> failed. After I rebooted the failed node05, it could not rejoin the
> cluster, and the filesystem is locked because it is in the recover state.
> I need to reboot all nodes to recover the cluster.
> 
> On node05 I get "fenced: startup failed"
> 
> Here is the output from another node in the cluster:
> 
> ---cut---
> [root@node03 ~]# cat /proc/cluster/services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           1   2 run       U-1,10,1
> [2 3 5 4]
> 
> DLM Lock Space:  "clvmd"                             2   3 run       U-1,10,1
> [2 3 5 4]
> 
> DLM Lock Space:  "repository"                        3   4 recover 2 -
> [2 3 5 4]
> 
> GFS Mount Group: "repository"                        4   5 recover 0 -
> [2 3 5 4]
> 
> [root@node03 ~]#
> ---cut---
> 
> What does "U-1,10,1" mean?
> 
> Here is some information from cluster.conf:
> 
> ---cut---
> <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
> <cman expected_votes="3" deadnode_timeout="120" hello_timer="10"/>
> ---cut---
> 
> I don't have the latest cman, fence, dlm, and kernel, so maybe that is
> the problem?
> 
> cman-1.0.11-0
> fence-1.32.25-1
> dlm-1.0.1-1
> kernel-smp-2.6.9-42.0.3.EL
> 
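
To pick out the stuck services from that output quickly, here is a minimal Python sketch. It assumes the /proc/cluster/services layout shown above, with the wrapped Code column rejoined onto its row. The regular expression and the function names are my own, not part of cman, and treating every state other than "run" as worth reporting is an assumption.

```python
import re

# Sample copied from the node03 output quoted above (wrapped rows rejoined).
SAMPLE = '''\
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,1
[2 3 5 4]

DLM Lock Space:  "clvmd"                             2   3 run       U-1,10,1
[2 3 5 4]

DLM Lock Space:  "repository"                        3   4 recover 2 -
[2 3 5 4]

GFS Mount Group: "repository"                        4   5 recover 0 -
[2 3 5 4]
'''

# Hypothetical row pattern: service type, quoted name, GID, LID, state, code.
ROW = re.compile(
    r'^(?P<type>[^:]+):\s+"(?P<name>[^"]+)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)\s*(?P<code>.*)$'
)


def parse_services(text):
    """Return one dict per service row; header and member lines are skipped."""
    services = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            services.append(match.groupdict())
    return services


def stuck_services(text):
    """Return (type, name, state) for every service whose state is not 'run'."""
    return [
        (svc["type"], svc["name"], svc["state"])
        for svc in parse_services(text)
        if svc["state"] != "run"
    ]


if __name__ == "__main__":
    for svc_type, name, state in stuck_services(SAMPLE):
        print(f"{svc_type} {name!r} is in state {state}")
```

On a live node the same function could be fed the contents of /proc/cluster/services instead of SAMPLE.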

I also found this in the logs:

Aug 16 14:13:44 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 67098
Aug 16 14:14:07 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 72602
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 64752
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 67108
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 69654
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 69781
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 87705

What does it mean?
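
To see how these messages bunch up in time, here is a rough Python sketch that tallies them per lockspace and timestamp. It assumes each message sits on a single line. The pattern and the function name are my own, and the script only counts occurrences; it does not interpret what the trailing number stands for.

```python
import re
from collections import Counter

# The seven messages from node03, one per line.
SAMPLE_LOG = """\
Aug 16 14:13:44 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 67098
Aug 16 14:14:07 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 72602
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 64752
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 67108
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 69654
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 69781
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 87705
"""

# Hypothetical pattern for this one message type only.
PATTERN = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel: dlm: '
    r'(?P<lockspace>[^:]+): restbl_rsb_update_recv rsb not found (?P<num>\d+)$'
)


def summarize(log_text):
    """Count matching messages per (lockspace, timestamp); collect the numbers."""
    per_second = Counter()
    numbers = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            per_second[(match.group("lockspace"), match.group("ts"))] += 1
            numbers.append(int(match.group("num")))
    return per_second, numbers


if __name__ == "__main__":
    counts, numbers = summarize(SAMPLE_LOG)
    for (lockspace, ts), count in sorted(counts.items()):
        print(f"{ts} {lockspace}: {count} message(s)")
    print(f"{len(numbers)} messages in total")
```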

Best Regards
Maciej Bogucki



