[Linux-cluster] DLM in recover state - node can't connect to cluster

Maciej Bogucki maciej.bogucki at artegence.com
Thu Aug 16 14:16:53 UTC 2007


Hello,

I have a five-node cluster. Node05 failed (kernel panic), and fencing
failed. After I rebooted the failed node05, it could not rejoin the
cluster, and the filesystem is locked because it is in the recover
state. I have to reboot all nodes to recover the cluster.

On node05 I get "fenced: startup failed".

Here is the output from another node in the cluster:

---cut---
[root@node03 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,1
[2 3 5 4]

DLM Lock Space:  "clvmd"                             2   3 run       U-1,10,1
[2 3 5 4]

DLM Lock Space:  "repository"                        3   4 recover 2 -
[2 3 5 4]

GFS Mount Group: "repository"                        4   5 recover 0 -
[2 3 5 4]

[root@node03 ~]#
---cut---
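
To spot this condition quickly on each node, the listing above can be
scanned for services whose state is anything other than "run". This is
only a minimal Python sketch: the line layout is assumed from the output
shown here, stuck_services is a hypothetical helper name, and the sample
text is copied from the listing rather than read from /proc.

```python
import re

# Sample copied from the /proc/cluster/services listing above.
# The column layout is an assumption based on that one listing.
SAMPLE = """\
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,1
[2 3 5 4]

DLM Lock Space:  "clvmd"                             2   3 run       U-1,10,1
[2 3 5 4]

DLM Lock Space:  "repository"                        3   4 recover 2 -
[2 3 5 4]

GFS Mount Group: "repository"                        4   5 recover 0 -
[2 3 5 4]
"""

# Service type, quoted name, GID, LID, then the state word.
# Anything after the state (the Code column) is ignored.
LINE = re.compile(
    r'^(?P<type>[^:]+):\s+"(?P<name>[^"]+)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)'
)


def stuck_services(text):
    """Return (type, name, state) for each service not in the 'run' state."""
    found = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m and m.group("state") != "run":
            found.append((m.group("type"), m.group("name"), m.group("state")))
    return found


if __name__ == "__main__":
    for service in stuck_services(SAMPLE):
        print(service)
```

On a live node the same function could be given the contents of
/proc/cluster/services instead of SAMPLE. The header line and the
member lists in brackets do not match the pattern, so they are skipped.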

What does "U-1,10,1" mean?

Here is some information from cluster.conf:

---cut---
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
<cman expected_votes="3" deadnode_timeout="120" hello_timer="10"/>
---cut---

I am not running the latest cman, fence, dlm, and kernel, so maybe that
is the problem? The installed versions are:

cman-1.0.11-0
fence-1.32.25-1
dlm-1.0.1-1
kernel-smp-2.6.9-42.0.3.EL

Best Regards
Maciej Bogucki



