[Linux-cluster] clvmd problems with centos 6.3 or normal clvmd behaviour?
emmanuel segura
emi2fast at gmail.com
Wed Aug 1 14:26:38 UTC 2012
Hello Gianluca,
why don't you remove expected_votes="3" and let the cluster calculate
the expected votes automatically?
I suggest this because I have had many problems with that setting.
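For example (just a sketch with made-up node names and vote counts; your real cluster.conf will differ), cman can derive expected_votes from the per-node and quorum-disk votes instead of having it pinned:

```xml
<!-- Sketch only: drop expected_votes and let cman compute it from the
     votes declared below (node names and vote values are examples) -->
<cman quorum_dev_poll="240000" two_node="0"/>
<clusternodes>
  <clusternode name="nodeA" nodeid="1" votes="1"/>
  <clusternode name="nodeB" nodeid="2" votes="1"/>
  <clusternode name="nodeC" nodeid="3" votes="1"/>
</clusternodes>
<quorumd label="qdisk" votes="2"/>
```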
2012/8/1 Gianluca Cecchi <gianluca.cecchi at gmail.com>
> Hello,
> testing a three node cluster + quorum disk and clvmd.
> I was on CentOS 6.2 and I seem to remember being able to start a
> single node. Correct?
> Then I upgraded to CentOS 6.3 and had a working environment.
> My config has
> <cman expected_votes="3" quorum_dev_poll="240000" two_node="0"/>
>
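With expected_votes="3", cman needs quorum = 3/2 + 1 = 2 votes, so a lone node with one vote only becomes quorate once the quorum disk registers. A quick sketch of the arithmetic (the qdisk vote count is an assumption on my part; it is not shown in the posted config, but nodes-1 = 2 is the common choice):

```shell
# Quorum arithmetic sketch -- qdisk_votes=2 is assumed, not taken from
# the poster's full cluster.conf.
expected_votes=3
quorum=$(( expected_votes / 2 + 1 ))   # integer division: 3/2 + 1 = 2
echo "quorum = $quorum"

node_votes=1        # the single surviving node
qdisk_votes=2       # assumed
total=$(( node_votes + qdisk_votes ))
if [ "$total" -ge "$quorum" ]; then
    echo "quorate once qdiskd registers ($total >= $quorum)"
fi
```

This matches the timeline in the logs below: the node waits until qdiskd assumes the master role, and only then does corosync report "quorum regained".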
> At the moment two nodes are in another site that is powered down and I
> need to start a single node config.
>
> When the node starts it gets waiting for quorum and when quorum disk
> becomes master it goes ahead:
>
> # cman_tool nodes
> Node Sts Inc Joined Name
> 0 M 0 2012-08-01 15:41:58 /dev/block/253:4
> 1 X 0 intrarhev1
> 2 X 0 intrarhev2
> 3 M 1420 2012-08-01 15:39:58 intrarhev3
>
> But the process hangs at clvmd startup, in particular at the step
> vgchange -aly
> Pid of "service clvmd start" command is 9335
>
> # pstree -alp 9335
> S24clvmd,9335 /etc/rc3.d/S24clvmd start
> └─vgchange,9363 -ayl
>
>
> # ll /proc/9363/fd/
> total 0
> lrwx------ 1 root root 64 Aug 1 15:44 0 -> /dev/console
> lrwx------ 1 root root 64 Aug 1 15:44 1 -> /dev/console
> lrwx------ 1 root root 64 Aug 1 15:44 2 -> /dev/console
> lrwx------ 1 root root 64 Aug 1 15:44 3 -> /dev/mapper/control
> lrwx------ 1 root root 64 Aug 1 15:44 4 -> socket:[1348167]
> lr-x------ 1 root root 64 Aug 1 15:44 5 -> /dev/dm-3
>
> # lsof -p 9363
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> vgchange 9363 root cwd DIR 104,3 4096 2 /
> vgchange 9363 root rtd DIR 104,3 4096 2 /
> vgchange 9363 root txt REG 104,3 971464 132238
> /sbin/lvm
> vgchange 9363 root mem REG 104,3 156872 210
> /lib64/ld-2.12.so
> vgchange 9363 root mem REG 104,3 1918016 569
> /lib64/libc-2.12.so
> vgchange 9363 root mem REG 104,3 22536 593
> /lib64/libdl-2.12.so
> vgchange 9363 root mem REG 104,3 24000 832
> /lib64/libdevmapper-event.so.1.02
> vgchange 9363 root mem REG 104,3 124624 750
> /lib64/libselinux.so.1
> vgchange 9363 root mem REG 104,3 272008 2060
> /lib64/libreadline.so.6.0
> vgchange 9363 root mem REG 104,3 138280 2469
> /lib64/libtinfo.so.5.7
> vgchange 9363 root mem REG 104,3 61648 1694
> /lib64/libudev.so.0.5.1
> vgchange 9363 root mem REG 104,3 251112 1489
> /lib64/libsepol.so.1
> vgchange 9363 root mem REG 104,3 229024 1726
> /lib64/libdevmapper.so.1.02
> vgchange 9363 root mem REG 253,7 99158576 17029
> /usr/lib/locale/locale-archive
> vgchange 9363 root mem REG 253,7 26060 134467
> /usr/lib64/gconv/gconv-modules.cache
> vgchange 9363 root 0u CHR 5,1 0t0 5218
> /dev/console
> vgchange 9363 root 1u CHR 5,1 0t0 5218
> /dev/console
> vgchange 9363 root 2u CHR 5,1 0t0 5218
> /dev/console
> vgchange 9363 root 3u CHR 10,58 0t0 5486
> /dev/mapper/control
> vgchange 9363 root 4u unix 0xffff880879b309c0 0t0 1348167 socket
> vgchange 9363 root 5r BLK 253,3 0t143360 10773
> /dev/dm-3
>
>
> # strace -p 9363
> Process 9363 attached - interrupt to quit
> read(4,
>
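The strace above shows vgchange blocked in read(4, ...), and the fd listing shows fd 4 is socket:[1348167] — almost certainly the LVM command waiting for a reply from the clvmd daemon on its local socket (typically /var/run/lvm/clvmd.sock). A hedged diagnostic sketch, demonstrated on the current shell with a throwaway fd since pid 9363 only exists on the affected node:

```shell
# Sketch: resolve what a process's fd points at, as "ll /proc/9363/fd"
# did in the post. We use our own shell and a stand-in fd here.
pid=$$
exec 5</dev/null                      # stand-in for the hung process's fd
ls -l /proc/$pid/fd                   # list all open fds
target=$(readlink /proc/$pid/fd/5)
echo "fd 5 -> $target"
# On the real node, map the socket inode to its path and owner:
#   grep 1348167 /proc/net/unix       # shows the bound socket path
#   fuser -v /var/run/lvm/clvmd.sock  # which daemon holds it (clvmd)
```

If the peer turns out to be clvmd itself stuck waiting on the DLM, the vgchange will block indefinitely, which fits the symptoms here.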
> multipath seems OK in general, and for dm-3 in particular
> # multipath -l /dev/mapper/mpathd
> mpathd (3600507630efe0b0c0000000000001181) dm-3 IBM,1750500
> size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
> |-+- policy='round-robin 0' prio=0 status=active
> | |- 0:0:0:3 sdd 8:48 active undef running
> | `- 1:0:0:3 sdl 8:176 active undef running
> `-+- policy='round-robin 0' prio=0 status=enabled
> |- 0:0:1:3 sdq 65:0 active undef running
> `- 1:0:1:3 sdy 65:128 active undef running
>
> Currently I have
> lvm2-2.02.95-10.el6.x86_64
> lvm2-cluster-2.02.95-10.el6.x86_64
>
> Startup is stuck as shown in the attached image.
>
> Logs
> messages:
> Aug 1 15:46:14 udevd[663]: worker [9379] unexpectedly returned with
> status 0x0100
> Aug 1 15:46:14 udevd[663]: worker [9379] failed while handling
> '/devices/virtual/block/dm-15'
>
> dmesg
> DLM (built Jul 20 2012 01:56:50) installed
> dlm: Using TCP for communications
>
>
> qdiskd
> Aug 01 15:41:58 qdiskd Score sufficient for master operation (1/1;
> required=1); upgrading
> Aug 01 15:43:03 qdiskd Assuming master role
>
> corosync.log
> Aug 01 15:41:58 corosync [CMAN ] quorum device registered
> Aug 01 15:43:08 corosync [CMAN ] quorum regained, resuming activity
> Aug 01 15:43:08 corosync [QUORUM] This node is within the primary
> component and will provide service.
> Aug 01 15:43:08 corosync [QUORUM] Members[1]: 3
>
> fenced.log
> Aug 01 15:43:09 fenced fenced 3.0.12.1 started
> Aug 01 15:43:09 fenced failed to get dbus connection
>
> dlm_controld.log
> Aug 01 15:43:10 dlm_controld dlm_controld 3.0.12.1 started
>
> gfs_controld.log
> Aug 01 15:43:11 gfs_controld gfs_controld 3.0.12.1 started
>
>
> Am I missing anything simple?
> Is it correct to say that clvmd can start even when only one node is
> active, provided that node has quorum under the configured cluster
> rules?
>
> Or am I hitting a known bug or problem?
>
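On the question itself: yes — with locking_type=3 the LVM commands go through clvmd, clvmd uses the DLM, and the DLM only operates once the node is quorate, so vgchange blocks until quorum is reached. As a last-resort workaround when the rest of the cluster really is powered down, some admins temporarily fall back to local locking (a sketch, not something from your config; only safe while every other node stays down, and it must be reverted afterwards):

```
# /etc/lvm/lvm.conf -- emergency single-node sketch; revert when the
# cluster is whole again
global {
    # locking_type = 3   # clustered locking via clvmd (normal setting)
    locking_type = 1     # local file-based locking; vgchange -ay then
                         # proceeds without quorum, but is unsafe if any
                         # other node activates the same VGs concurrently
}
```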
> Thanks in advance,
> Gianluca
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
this is my life and I live it as long as God wills