[Linux-cluster] cLVM unusable on quorate cluster
Digimer
lists at alteeve.ca
Fri Oct 3 14:38:14 UTC 2014
On 03/10/14 10:35 AM, Daniel Dehennin wrote:
> Hello,
>
> I'm trying to setup pacemaker+corosync on Debian Wheezy to access a SAN
> for an OpenNebula cluster.
>
> As I'm new to the cluster world, I have a hard time figuring out why
> things sometimes go really wrong, and where I must look for answers.
>
> My OpenNebula frontend, running in a VM, does not manage to run the
> resources, and my syslog has a lot of:
>
> #+begin_src
> ocfs2_controld: Unable to open checkpoint "ocfs2:controld": Object does not exist
> #+end_src
>
> When this happens, the other nodes have problems:
>
> #+begin_src
> root@nebula3:~# LANG=C vgscan
> cluster request failed: Host is down
> Unable to obtain global lock.
> #+end_src
>
> But things look fine in “crm_mon”:
>
> #+begin_src
> root@nebula3:~# crm_mon -1
> ============
> Last updated: Fri Oct 3 16:25:43 2014
> Last change: Fri Oct 3 14:51:59 2014 via cibadmin on nebula1
> Stack: openais
> Current DC: nebula3 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 5 Nodes configured, 5 expected votes
> 32 Resources configured.
> ============
>
> Node quorum: standby
> Online: [ nebula3 nebula2 nebula1 ]
> OFFLINE: [ one ]
>
> Stonith-nebula3-IPMILAN (stonith:external/ipmi): Started nebula2
> Stonith-nebula2-IPMILAN (stonith:external/ipmi): Started nebula3
> Stonith-nebula1-IPMILAN (stonith:external/ipmi): Started nebula2
> Clone Set: ONE-Storage-Clone [ONE-Storage]
> Started: [ nebula1 nebula3 nebula2 ]
> Stopped: [ ONE-Storage:3 ONE-Storage:4 ]
> Quorum-Node (ocf::heartbeat:VirtualDomain): Started nebula3
> Stonith-Quorum-Node (stonith:external/libvirt): Started nebula3
> #+end_src
>
> I don't know how to interpret the dlm_tool information:
>
> #+begin_src
> root@nebula3:~# dlm_tool ls -n
> dlm lockspaces
> name CCB10CE8D4FF489B9A2ECB288DACF2D7
> id 0x09250e49
> flags 0x00000008 fs_reg
> change member 3 joined 1 remove 0 failed 0 seq 2,2
> members 1189587136 1206364352 1223141568
> all nodes
> nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
> nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
> nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
>
> name clvmd
> id 0x4104eefa
> flags 0x00000000
> change member 3 joined 0 remove 1 failed 0 seq 4,4
> members 1189587136 1206364352 1223141568
> all nodes
> nodeid 1172809920 member 0 failed 0 start 0 seq_add 3 seq_rem 4 check none
> nodeid 1189587136 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
> nodeid 1206364352 member 1 failed 0 start 1 seq_add 2 seq_rem 0 check none
> nodeid 1223141568 member 1 failed 0 start 1 seq_add 1 seq_rem 0 check none
> #+end_src
>
> Is there any documentation on troubleshooting DLM/cLVM?
>
> Regards.
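One hint for reading the dlm_tool output above: on corosync clusters with auto-generated node IDs, the nodeid is typically the node's ring0 IPv4 address packed into a 32-bit integer (little-endian byte order on x86). Assuming that convention holds for this cluster, a minimal sketch to decode them:

```shell
#!/bin/sh
# Decode a corosync auto-generated nodeid back into the ring0 IPv4
# address it was derived from (assumes little-endian packing, as on x86).
nodeid_to_ip() {
    n=$1
    echo "$(( n & 255 )).$(( (n >> 8) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 24) & 255 ))"
}

nodeid_to_ip 1189587136   # a current member      -> 192.168.231.70
nodeid_to_ip 1172809920   # the node clvmd saw removed (seq_rem 4) -> 192.168.231.69
```

Decoding all four nodeids this way maps the clvmd lockspace's removed member back to a concrete host address, which helps correlate the dlm state with the node that went away.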
Can you paste your full pacemaker config and the logs from the other
nodes starting just before the lost node went away?
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?