[linux-lvm] cluster request failed: Host is down

Jacek Konieczny jajcus at jajcus.net
Fri Nov 16 12:48:09 UTC 2012


Hi,

I have seen this problem already reported here, but with no useful
answer:

http://osdir.com/ml/linux-lvm/2011-01/msg00038.html

That post suggests it is some very old bug, a change which could easily be
reverted… though that is a bit hard to believe. Such an easy bug would
have been fixed long ago, wouldn't it?

For me the problem is as follows:

I have a two-node cluster with a volume group on a DRBD device in a
Master-Master (dual-primary) setup. When I shut one node down cleanly,
I am no longer able to manage the volumes properly.
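
For reference, this uses the usual clustered-LVM-on-dual-primary-DRBD
configuration. A rough sketch (resource and device names are
illustrative, not copied verbatim from my files):

    # /etc/lvm/lvm.conf -- clvmd-based locking
    global {
        locking_type = 3
    }

    # DRBD resource in dual-primary mode
    resource r0 {
        net {
            allow-two-primaries yes;  # DRBD 8.4 syntax; plain "allow-two-primaries;" on 8.3
        }
    }

    # and the VG is marked as clustered:
    # vgchange -cy dev1_vg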

LVs which are active on the surviving host remain active, but I am not
able to deactivate them or activate more volumes:

>  [root at dev1n1 ~]# lvs dev1_vg/4bwM2m7oVL
>    cluster request failed: Host is down
>    LV         VG        Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
>    4bwM2m7oVL dev1_vg -wi------ 1.00g                                           
>  [root at dev1n1 ~]# lvchange -aey dev1_vg/XaMS0LyAq8 ; echo $?
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>  5
>  [root at dev1n1 ~]# lvs dev1_vg/4bwM2m7oVL
>    cluster request failed: Host is down
>    LV         VG        Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
>    4bwM2m7oVL dev1_vg -wi------ 1.00g                                           
>  [root at dev1n1 ~]# lvchange -aen dev1_vg/XaMS0LyAq8 ; echo $?
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>  5
>  [root at dev1n1 ~]# lvs dev1_vg/XaMS0LyAq8
>    cluster request failed: Host is down
>    LV         VG        Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
>    XaMS0LyAq8 dev1_vg -wi-a---- 1.00g                                           
>  
>  [root at dev1n1 ~]# dlm_tool ls
>  dlm lockspaces
>  name          clvmd
>  id            0x4104eefa
>  flags         0x00000000 
>  change        member 1 joined 0 remove 1 failed 0 seq 2,2
>  members       1 
>  
>  [root at dev1n1 ~]# dlm_tool status
>  cluster nodeid 1 quorate 1 ring seq 30648 30648
>  daemon now 1115 fence_pid 0 
>  node 1 M add 15 rem 0 fail 0 fence 0 at 0 0
>  node 2 X add 15 rem 184 fail 0 fence 0 at 0 0

The node has cleanly left the lockspace and the cluster. DLM is aware of
that, so clvmd should be too, right? And if all the other cluster nodes
(only one here) are clean, all LVM operations on the clustered VG should
work, right? Or am I missing something?
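
To be clear about "clean": the surviving node is still quorate, as the
dlm_tool output above shows. Assuming the corosync 2 / votequorum stack
(adjust if you run cman), the quorum configuration I mean looks roughly
like this sketch, not a verbatim copy of my corosync.conf:

    # /etc/corosync/corosync.conf
    quorum {
        provider: corosync_votequorum
        two_node: 1    # two-node special case: stay quorate with a single node up
    }

    # quorum state can be double-checked with:
    # corosync-quorumtool -s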

The behaviour is exactly the same when I power off a running node. It is
fenced by dlm_controld, as expected, and then the VG is non-functional as
above until the dead node comes back up and rejoins the cluster.

Is this the expected behaviour or is it a bug?

Greets,
        Jacek



