[Linux-cluster] DLM in recover state - node can't connect to cluster

Maciej Bogucki maciej.bogucki at artegence.com
Thu Aug 16 14:16:53 UTC 2007


I have a five-node cluster. Node05 failed (kernel panic), and fencing
failed. When I rebooted the failed node05, it could not rejoin the cluster,
and the filesystem is locked because it is stuck in the recover state. I
need to reboot all nodes to recover the cluster.

On node05 I get "fenced: startup failed"

Here is the output from another node in the cluster:

[root@node03 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run
[2 3 5 4]

DLM Lock Space:  "clvmd"                             2   3 run
[2 3 5 4]

DLM Lock Space:  "repository"                        3   4 recover 2 -
[2 3 5 4]

GFS Mount Group: "repository"                        4   5 recover 0 -
[2 3 5 4]

[root@node03 ~]#
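
To pick out the stuck services from output like the above without reading it by eye, something along these lines might help. It is only a rough sketch: the function name `stuck_services` and the regular expression are my own assumptions, based solely on the layout shown in this listing, and other cman versions may format the file differently.

```python
import re

# Sample copied from the listing above.
SAMPLE = '''\
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run
[2 3 5 4]

DLM Lock Space:  "clvmd"                             2   3 run
[2 3 5 4]

DLM Lock Space:  "repository"                        3   4 recover 2 -
[2 3 5 4]

GFS Mount Group: "repository"                        4   5 recover 0 -
[2 3 5 4]
'''

# Assumed line layout: <type>: "<name>" <GID> <LID> <state> [code...]
# The header line and the "[2 3 5 4]" member lines do not match and are skipped.
LINE_RE = re.compile(
    r'^(?P<type>[^:]+):\s+"(?P<name>[^"]*)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)\s*(?P<code>.*)$'
)


def stuck_services(text):
    """Return (type, name, state, code) for each service not in the 'run' state."""
    stuck = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("state") != "run":
            stuck.append(
                (m.group("type"), m.group("name"),
                 m.group("state"), m.group("code").strip())
            )
    return stuck


if __name__ == "__main__":
    for entry in stuck_services(SAMPLE):
        print(entry)
```

On a live node the text could be read from /proc/cluster/services instead of the embedded sample.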

What does "U-1,10,1" mean?

Here is some information from cluster.conf:

<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
<cman expected_votes="3" deadnode_timeout="120" hello_timer="10"/>

I don't have the latest cman, fence, dlm, and kernel, so maybe it is a


Best Regards
Maciej Bogucki
