[Linux-cluster] cLVM: LVM commands take several minutes to complete
Vladislav Bogdanov
bubble@hoster-ok.com
Fri Sep 11 15:28:22 UTC 2015
11.09.2015 17:02, Daniel Dehennin wrote:
> Hello,
>
> On a two node cluster Ubuntu Trusty:
>
> - Linux nebula3 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59
> UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> - corosync 2.3.3-1ubuntu1
>
> - pacemaker 1.1.10+git20130802-1ubuntu2.3
>
> - dlm 4.0.1-0ubuntu1
>
> - clvm 2.02.98-6ubuntu2
You need a newer version of this^
2.02.102 is known to include commit 431eda6, without which the cluster is
unusable in a degraded state (and even when one node is merely put into
standby). You see timeouts with both nodes online, so yours is a different
issue, but upgrading will not hurt.
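A quick way to check whether the installed LVM is new enough is a `sort -V` version comparison. This is only a sketch: the hard-coded version string below is taken from the quoted package list; in practice you would obtain it from `lvm version`.

```shell
# Version taken from the quoted "clvm 2.02.98-6ubuntu2"; in practice:
#   installed=$(lvm version | awk '/LVM version/{print $3}')
installed="2.02.98"
required="2.02.102"   # first release known to carry commit 431eda6

# sort -V orders version strings component by component, so the
# smaller version comes out first
if [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "ok: $installed >= $required"
else
  echo "upgrade needed: $installed < $required"
fi
# prints: upgrade needed: 2.02.98 < 2.02.102
```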
>
> - gfs2-utils 3.1.6-0ubuntu1
>
>
> The LVM commands take minutes to complete:
>
> root@nebula3:~# time vgs
> Error locking on node 40a8e784: Command timed out
> Error locking on node 40a8e784: Command timed out
> Error locking on node 40a8e784: Command timed out
> VG #PV #LV #SN Attr VSize VFree
> nebula3-vg 1 4 0 wz--n- 133,52g 0
> one-fs 1 1 0 wz--nc 2,00t 0
> one-production 1 0 0 wz--nc 1023,50g 1023,50g
>
> real 5m40.233s
> user 0m0.005s
> sys 0m0.018s
>
> Do you know where I can look to find what's going on?
>
> Here is some information:
>
> root@nebula3:~# corosync-quorumtool
> Quorum information
> ------------------
> Date: Fri Sep 11 15:57:17 2015
> Quorum provider: corosync_votequorum
> Nodes: 2
> Node ID: 1084811139
> Ring ID: 1460
> Quorate: Yes
>
> Votequorum information
> ----------------------
> Expected votes: 2
> Highest expected: 2
> Total votes: 2
> Quorum: 1
> Flags: 2Node Quorate WaitForAll LastManStanding
Better to use two_node: 1 in the votequorum section; it implies
wait_for_all and supersedes last_man_standing on two-node clusters.
I'd also recommend setting clear_node_high_bit in the totem section; do
you use it?
Better still, add a nodelist section to corosync.conf with manually
specified nodeids.
Everything else looks fine...
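A minimal corosync.conf sketch of the suggestions above. The IP addresses are taken from the quoted membership output; the cluster name and the nodeids are placeholders you would adapt to your setup.

```
totem {
    version: 2
    cluster_name: nebula          # placeholder, use your own
    # avoid auto-generated nodeids with the high bit set
    clear_node_high_bit: yes
}

nodelist {
    node {
        ring0_addr: 192.168.231.131
        nodeid: 1                 # manually assigned, small and stable
    }
    node {
        ring0_addr: 192.168.231.132
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    # two_node: 1 implies wait_for_all and supersedes
    # last_man_standing on two-node clusters
    two_node: 1
}
```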
>
> Membership information
> ----------------------
> Nodeid Votes Name
> 1084811139 1 192.168.231.131 (local)
> 1084811140 1 192.168.231.132
>
>
> root@nebula3:~# dlm_tool ls
> dlm lockspaces
> name datastores
> id 0x1b61ba6a
> flags 0x00000000
> change member 2 joined 1 remove 0 failed 0 seq 1,1
> members 1084811139 1084811140
>
> name clvmd
> id 0x4104eefa
> flags 0x00000000
> change member 2 joined 1 remove 0 failed 0 seq 1,1
> members 1084811139 1084811140
>
>
> root@nebula3:~# dlm_tool status
> cluster nodeid 1084811139 quorate 1 ring seq 1460 1460
> daemon now 11026 fence_pid 0
> node 1084811139 M add 455 rem 0 fail 0 fence 0 at 0 0
> node 1084811140 M add 455 rem 0 fail 0 fence 0 at 0 0
>
>
> Regards.
>
>
>