[Linux-cluster] Unexpected problems with clvmd
Christine Caulfield
ccaulfie at redhat.com
Wed Dec 3 13:52:37 UTC 2008
Shaun Mccullagh wrote:
> Hi,
>
> I tried to add another node to our 3 node cluster this morning.
>
> Initially things went well, but I wanted to check that the new node
> booted correctly.
>
> After the second reboot, clvmd failed to start up on the new node
> (called pan4):
>
> [root@pan4 ~]# clvmd -d1 -T20
> CLVMD[8e1e8300]: Dec 3 14:24:09 CLVMD started
> CLVMD[8e1e8300]: Dec 3 14:24:09 Connected to CMAN
> CLVMD[8e1e8300]: Dec 3 14:24:12 CMAN initialisation complete
>
> Group_tool reports this output for clvmd on all four nodes in the
> cluster
>
> dlm 1 clvmd 00010005 FAIL_START_WAIT
> dlm 1 clvmd 00010005 FAIL_ALL_STOPPED
> dlm 1 clvmd 00010005 FAIL_ALL_STOPPED
> dlm 1 clvmd 00000000 JOIN_STOP_WAIT
>
> Otherwise the cluster is OK:
>
> [root@brik3 ~]# clustat
> Cluster Status for mtv_gfs @ Wed Dec 3 14:38:26 2008
> Member Status: Quorate
>
> Member Name                            ID   Status
> ------ ----                            ---- ------
> pan4                                      4 Online
> pan5                                      5 Online
> nfs-pan                                   6 Online
> brik3-gfs                                 7 Online, Local
>
> [root@brik3 ~]# cman_tool status
> Version: 6.1.0
> Config Version: 4
> Cluster Name: mtv_gfs
> Cluster Id: 14067
> Cluster Member: Yes
> Cluster Generation: 172
> Membership state: Cluster-Member
> Nodes: 4
> Expected votes: 4
> Total votes: 4
> Quorum: 3
> Active subsystems: 8
> Flags: Dirty
> Ports Bound: 0 11
> Node name: brik3-gfs
> Node ID: 7
> Multicast addresses: 239.192.54.42
> Node addresses: 172.16.1.60
>
> It seems I have created a deadlock; what is the best way to fix this?
>
> TIA
>
>
The first thing to do is check the fencing status, via group_tool and
syslog. If fencing hasn't completed, the DLM can't recover.
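
For example, something along these lines should show whether the fence
domain has finished recovery (a sketch assuming the RHEL5-era
cman/groupd tools; check the man pages for the exact syntax on your
version):

  group_tool ls                       # state of the fence domain and other groups
  group_tool dump fence               # fenced's debug buffer
  grep -i fence /var/log/messages     # fencing activity logged to syslog

If a fence operation is stuck and you are using manual fencing,
fence_ack_manual -n <nodename> should acknowledge it so DLM recovery
can continue.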
--
Chrissie