[Linux-cluster] clvmd hangs

Digimer lists at alteeve.ca
Tue Aug 7 05:22:25 UTC 2012


On 08/07/2012 01:07 AM, Chip Burke wrote:
> I had a node crash (actually, lost power) and now when the cluster comes
> back up, none of the PVs/VGs/LVs that contain the GFS2 volumes can be
> found. pvscan, lvscan, vgscan, etc. all hang.
>
>
> # pvscan -vvvv
> #lvmcmdline.c:1070         Processing: pvscan -vvvv
> #lvmcmdline.c:1073         O_DIRECT will be used
> #libdm-config.c:789       Setting global/locking_type to 3
> #libdm-config.c:789       Setting global/wait_for_locks to 1
> #locking/locking.c:271       Cluster locking selected.
>
> The output is more or less the same from lvscan and vgscan.
>
> The cluster is pretty basic and I was in the midst of configuring
> fencing when this went down, so the config has no fencing in it.
>
> <?xml version="1.0"?>
> <cluster config_version="5" name="Xanadu">
> <clusternodes>
> <clusternode name="xanadunode1" nodeid="1"/>
> <clusternode name="xanadunode2" nodeid="2"/>
> </clusternodes>
> <cman expected_votes="3"/>
> <quorumd label="quorum"/>
> </cluster>
>
> Additionally, the cluster logs all show similar unending messages such as:
>
> Aug 07 01:03:12 dlm_controld daemon cpg_join error retrying
> Aug 07 01:03:46 corosync [TOTEM ] Retransmit List: 13
> Aug 07 01:04:04 gfs_controld cpg_mcast_joined retry 31200 protocol
> Aug 07 01:04:12 fenced daemon cpg_join error retrying
>
> Also
>
> # cman_tool status
> Version: 6.2.0
> Config Version: 5
> Cluster Name: Xanadu
> Cluster Id: 10121
> Cluster Member: Yes
> Cluster Generation: 2084
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Quorum device votes: 1
> Total votes: 3
> Node votes: 1
> Quorum: 2
> Active subsystems: 11
> Flags:
> Ports Bound: 0 11 178
> Node name: xanadunode2
> Node ID: 2
> Multicast addresses: 239.192.39.176
> Node addresses: 192.168.30.66
>
> So cman is up and working. It seems that clvmd and the tools it depends
> on are simply not playing nice. What do I have to do to get those
> volumes to mount?

Without a way to put the lost node into a known state, the only safe 
option remaining is to hang. This is by design. You have to add fencing 
to your cluster.
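
As a rough illustration only (the fence agent, IP addresses and credentials
below are hypothetical placeholders, not taken from your cluster), a
cluster.conf with fencing might look something like this; note the bumped
config_version:

<?xml version="1.0"?>
<cluster config_version="6" name="Xanadu">
  <clusternodes>
    <clusternode name="xanadunode1" nodeid="1">
      <fence>
        <method name="ipmi">
          <device name="ipmi_node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xanadunode2" nodeid="2">
      <fence>
        <method name="ipmi">
          <device name="ipmi_node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence_ipmilan shown only as an example; use whatever agent
         matches your hardware, with your real addresses/credentials -->
    <fencedevice agent="fence_ipmilan" name="ipmi_node1" ipaddr="192.168.30.71" login="admin" passwd="secret"/>
    <fencedevice agent="fence_ipmilan" name="ipmi_node2" ipaddr="192.168.30.72" login="admin" passwd="secret"/>
  </fencedevices>
  <cman expected_votes="3"/>
  <quorumd label="quorum"/>
</cluster>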

This explains it in detail:

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
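
Once fencing is configured, recovery is usually along these lines (a
sketch only; exact steps vary by cman version, and this assumes
xanadunode1 is the node that lost power):

# check the edited config for syntax/schema errors
ccs_config_validate

# activate the new config_version cluster-wide (on older releases you
# may need to copy cluster.conf to each node by hand and pass the new
# version number explicitly)
cman_tool version -r

# manually fence the lost node so dlm, gfs2 and clvmd can recover its
# locks; pvscan/vgscan/lvscan should stop blocking after this
fence_node xanadunode1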

-- 
Digimer
Papers and Projects: https://alteeve.com



