[Linux-cluster] problems with clvmd

Terry td3201 at gmail.com
Mon Apr 18 14:34:59 UTC 2011


On Mon, Apr 18, 2011 at 9:13 AM, Kaloyan Kovachev <kkovachev at varna.net> wrote:
> On Mon, 18 Apr 2011 08:57:34 -0500, Terry <td3201 at gmail.com> wrote:
>> On Mon, Apr 18, 2011 at 8:38 AM, Terry <td3201 at gmail.com> wrote:
>>> On Mon, Apr 18, 2011 at 3:48 AM, Christine Caulfield
>>> <ccaulfie at redhat.com> wrote:
>>>> On 17/04/11 21:52, Terry wrote:
>>>>>
>>>>> As a result of a strange situation where our licensing for storage
>>>>> dropped off, I need to join a CentOS 5.6 node to a now single-node
>>>>> cluster.  I got it joined to the cluster, but I am having issues with
>>>>> clvmd.  Any LVM operation on either box hangs; vgscan, for example.
>>>>> I have increased debugging and I don't see any logs.  The VGs aren't
>>>>> being populated in /dev/mapper.  This WAS working right after I joined
>>>>> it to the cluster and now it's not, for some unknown reason.  Not sure
>>>>> where to take this at this point.  I did find one odd startup log
>>>>> entry that I am not sure what it means yet:
>>>>> [root at omadvnfs01a ~]# dmesg | grep dlm
>>>>> dlm: no local IP address has been set
>>>>> dlm: cannot start dlm lowcomms -107
>>>>> dlm: Using TCP for communications
>>>>> dlm: connecting to 2
>>>>>
>>>>
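(A side note on the "increased debugging" above: a minimal sketch of one
way to get clvmd debug output somewhere visible, assuming the stock
lvm2-cluster clvmd; check clvmd(8) on this version for the exact -d
semantics:)

    # stop the init-script copy of clvmd, then run it in the foreground
    # so the debug output lands on stderr rather than disappearing
    service clvmd stop
    clvmd -d
    # in a second shell, trigger a cluster-aware LVM command verbosely
    vgscan -vvvv
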
>>>>
>>>> That message usually means that dlm_controld has failed to start.  Try
>>>> starting the cman daemons (groupd, dlm_controld) manually with the -D
>>>> switch and read the output, which might give some clues as to why it's
>>>> not working.
>>>>
>>>> Chrissie
>>>>
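(The suggestion above amounts to roughly the following; this is only a
sketch of the start order the RHEL 5 cman init script uses, and the
authoritative sequence is in /etc/init.d/cman itself:)

    # run each foreground daemon in its own terminal so the -D
    # (debug, don't daemonize) output stays visible
    /sbin/ccsd
    /sbin/cman_tool join
    /sbin/groupd -D
    /sbin/fenced -D
    /sbin/dlm_controld -D
    /sbin/gfs_controld -D
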
>>>
>>>
>>> Hi Chrissie,
>>>
>>> I thought of that but I see dlm started on both nodes.  See right below.
>>>
>>>>> [root at omadvnfs01a ~]# ps xauwwww | grep dlm
>>>>> root      5476  0.0  0.0  24736   760 ?   Ss   15:34   0:00 /sbin/dlm_controld
>>>>> root      5502  0.0  0.0      0     0 ?   S<   15:34   0:00 [dlm_astd]
>>>>> root      5503  0.0  0.0      0     0 ?   S<   15:34   0:00 [dlm_scand]
>>>>> root      5504  0.0  0.0      0     0 ?   S<   15:34   0:00 [dlm_recv]
>>>>> root      5505  0.0  0.0      0     0 ?   S<   15:34   0:00 [dlm_send]
>>>>> root      5506  0.0  0.0      0     0 ?   S<   15:34   0:00 [dlm_recoverd]
>>>>> root      5546  0.0  0.0      0     0 ?   S<   15:35   0:00 [dlm_recoverd]
>>>>>
>>>>> [root at omadvnfs01a ~]# lsmod | grep dlm
>>>>> lock_dlm               52065  0
>>>>> gfs2                  529037  1 lock_dlm
>>>>> dlm                   160065  17 lock_dlm
>>>>> configfs               62045  2 dlm
>>>>>
>>>>>
>>>>> centos server:
>>>>> [root at omadvnfs01a ~]# rpm -q cman rgmanager lvm2-cluster
>>>>> cman-2.0.115-68.el5
>>>>> rgmanager-2.0.52-9.el5.centos
>>>>> lvm2-cluster-2.02.74-3.el5_6.1
>>>>>
>>>>> [root at omadvnfs01a ~]# ls /dev/mapper/ | grep -v mpath
>>>>> control
>>>>> VolGroup00-LogVol00
>>>>> VolGroup00-LogVol01
>>>>>
>>>>> rhel server:
>>>>> [root at omadvnfs01b network-scripts]# rpm -q cman rgmanager lvm2-cluster
>>>>> cman-2.0.115-34.el5
>>>>> rgmanager-2.0.52-6.el5
>>>>> lvm2-cluster-2.02.56-7.el5_5.4
>>>>>
>>>>> [root at omadvnfs01b network-scripts]# ls /dev/mapper/ | grep -v mpath
>>>>> control
>>>>> vg_data01a-lv_data01a
>>>>> vg_data01b-lv_data01b
>>>>> vg_data01c-lv_data01c
>>>>> vg_data01d-lv_data01d
>>>>> vg_data01e-lv_data01e
>>>>> vg_data01h-lv_data01h
>>>>> vg_data01i-lv_data01i
>>>>> VolGroup00-LogVol00
>>>>> VolGroup00-LogVol01
>>>>> VolGroup02-lv_data00
>>>>>
>>>>> [root at omadvnfs01b network-scripts]# clustat
>>>>> Cluster Status for omadvnfs01 @ Sun Apr 17 15:44:52 2011
>>>>> Member Status: Quorate
>>>>>
>>>>>  Member Name                          ID   Status
>>>>>  ------ ----                          ---- ------
>>>>>  omadvnfs01a.sec.jel.lc                  1 Online, rgmanager
>>>>>  omadvnfs01b.sec.jel.lc                  2 Online, Local, rgmanager
>>>>>
>>>>>  Service Name               Owner (Last)               State
>>>>>  ------- ----               ----- ------               -----
>>>>>  service:omadvnfs01-nfs-a   omadvnfs01b.sec.jel.lc     started
>>>>>  service:omadvnfs01-nfs-b   omadvnfs01b.sec.jel.lc     started
>>>>>  service:omadvnfs01-nfs-c   omadvnfs01b.sec.jel.lc     started
>>>>>  service:omadvnfs01-nfs-h   omadvnfs01b.sec.jel.lc     started
>>>>>  service:omadvnfs01-nfs-i   omadvnfs01b.sec.jel.lc     started
>>>>>  service:postgresql         omadvnfs01b.sec.jel.lc     started
>>>>>
>>>>>
>>>>> [root at omadvnfs01a ~]# cman_tool nodes
>>>>> Node  Sts   Inc   Joined               Name
>>>>>    1   M   1892   2011-04-17 15:34:24  omadvnfs01a.sec.jel.lc
>>>>>    2   M   1896   2011-04-17 15:34:24  omadvnfs01b.sec.jel.lc
>>>>>
>>>
>>
>>
>>
>> OK, I started all the cman daemons manually as you suggested, in the
>> same order as the init script.  Here's the only error that I see.  I
>> can post the other debug messages if you think they'd be useful, but
>> this is the only one that stuck out to me.
>>
>> [root at omadvnfs01a ~]# /sbin/dlm_controld -D
>> 1303134840 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>> 1303134840 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>
> what does "lsmod | egrep -e 'configfs' -e 'dlm'" say?
>
>> 1303134840 set_ccs_options 480
>> 1303134840 cman: node 2 added
>> 1303134840 set_configfs_node 2 10.198.1.111 local 0
>> 1303134840 cman: node 3 added
>> 1303134840 set_configfs_node 3 10.198.1.110 local 1
>>
>> --


[root at omadvnfs01a ~]# lsmod | egrep -e 'configfs' -e 'dlm'
lock_dlm               52065  0
gfs2                  529037  1 lock_dlm
dlm                   160065  5 lock_dlm
configfs               62045  2 dlm

[root at omadvnfs01b log]# lsmod | egrep -e 'configfs' -e 'dlm'
lock_dlm               52065  0
gfs2                  524204  1 lock_dlm
dlm                   160065  19 gfs,lock_dlm
configfs               62045  2 dlm
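
(For what it's worth, error 2 from opendir is ENOENT, so the dlm
directories under configfs were missing when dlm_controld looked for
them.  A quick sanity check, assuming the stock RHEL 5 layout:)

    # confirm configfs is mounted and the dlm tree has been created
    mount | grep configfs
    ls /sys/kernel/config/dlm/cluster
    # if configfs turns out not to be mounted, mount it by hand
    mount -t configfs none /sys/kernel/config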



