[Linux-cluster] problems with clvmd
Terry
td3201 at gmail.com
Mon Apr 18 13:57:34 UTC 2011
On Mon, Apr 18, 2011 at 8:38 AM, Terry <td3201 at gmail.com> wrote:
> On Mon, Apr 18, 2011 at 3:48 AM, Christine Caulfield
> <ccaulfie at redhat.com> wrote:
>> On 17/04/11 21:52, Terry wrote:
>>>
>>> As a result of a strange situation where our licensing for storage
>>> dropped off, I need to join a CentOS 5.6 node to a now single-node
>>> cluster. I got it joined to the cluster, but I am having issues with
>>> clvmd: any LVM operation on either box hangs (vgscan, for example).
>>> I have increased debugging but I don't see any logs. The VGs aren't
>>> being populated in /dev/mapper. This WAS working right after I joined
>>> the node to the cluster, and now it's not, for some unknown reason.
>>> Not sure where to take this at this point. I did find one odd startup
>>> message that I'm not sure what it means yet:
>>> [root at omadvnfs01a ~]# dmesg | grep dlm
>>> dlm: no local IP address has been set
>>> dlm: cannot start dlm lowcomms -107
>>> dlm: Using TCP for communications
>>> dlm: connecting to 2
>>>
>>
>>
>> That message usually means that dlm_controld has failed to start. Try
>> starting the cman daemons (groupd, dlm_controld) manually with the -D
>> switch and read the output, which might give some clues as to why it's
>> not working.
>>
>> Chrissie
>>
>
>
> Hi Chrissie,
>
> I thought of that, but I see dlm running on both nodes. See right below.
>
>>> [root at omadvnfs01a ~]# ps xauwwww | grep dlm
>>> root 5476 0.0 0.0 24736 760 ? Ss 15:34 0:00
>>> /sbin/dlm_controld
>>> root 5502 0.0 0.0 0 0 ? S< 15:34 0:00
>>> [dlm_astd]
>>> root 5503 0.0 0.0 0 0 ? S< 15:34 0:00
>>> [dlm_scand]
>>> root 5504 0.0 0.0 0 0 ? S< 15:34 0:00
>>> [dlm_recv]
>>> root 5505 0.0 0.0 0 0 ? S< 15:34 0:00
>>> [dlm_send]
>>> root 5506 0.0 0.0 0 0 ? S< 15:34 0:00
>>> [dlm_recoverd]
>>> root 5546 0.0 0.0 0 0 ? S< 15:35 0:00
>>> [dlm_recoverd]
>>>
>>> [root at omadvnfs01a ~]# lsmod | grep dlm
>>> lock_dlm 52065 0
>>> gfs2 529037 1 lock_dlm
>>> dlm 160065 17 lock_dlm
>>> configfs 62045 2 dlm
>>>
>>>
>>> CentOS server:
>>> [root at omadvnfs01a ~]# rpm -q cman rgmanager lvm2-cluster
>>> cman-2.0.115-68.el5
>>> rgmanager-2.0.52-9.el5.centos
>>> lvm2-cluster-2.02.74-3.el5_6.1
>>>
>>> [root at omadvnfs01a ~]# ls /dev/mapper/ | grep -v mpath
>>> control
>>> VolGroup00-LogVol00
>>> VolGroup00-LogVol01
>>>
>>> RHEL server:
>>> [root at omadvnfs01b network-scripts]# rpm -q cman rgmanager lvm2-cluster
>>> cman-2.0.115-34.el5
>>> rgmanager-2.0.52-6.el5
>>> lvm2-cluster-2.02.56-7.el5_5.4
>>>
>>> [root at omadvnfs01b network-scripts]# ls /dev/mapper/ | grep -v mpath
>>> control
>>> vg_data01a-lv_data01a
>>> vg_data01b-lv_data01b
>>> vg_data01c-lv_data01c
>>> vg_data01d-lv_data01d
>>> vg_data01e-lv_data01e
>>> vg_data01h-lv_data01h
>>> vg_data01i-lv_data01i
>>> VolGroup00-LogVol00
>>> VolGroup00-LogVol01
>>> VolGroup02-lv_data00
>>>
>>> [root at omadvnfs01b network-scripts]# clustat
>>> Cluster Status for omadvnfs01 @ Sun Apr 17 15:44:52 2011
>>> Member Status: Quorate
>>>
>>> Member Name                          ID   Status
>>> ------ ----                          ---- ------
>>> omadvnfs01a.sec.jel.lc               1    Online, rgmanager
>>> omadvnfs01b.sec.jel.lc               2    Online, Local, rgmanager
>>>
>>> Service Name                         Owner (Last)             State
>>> ------- ----                         ----- ------             -----
>>> service:omadvnfs01-nfs-a             omadvnfs01b.sec.jel.lc   started
>>> service:omadvnfs01-nfs-b             omadvnfs01b.sec.jel.lc   started
>>> service:omadvnfs01-nfs-c             omadvnfs01b.sec.jel.lc   started
>>> service:omadvnfs01-nfs-h             omadvnfs01b.sec.jel.lc   started
>>> service:omadvnfs01-nfs-i             omadvnfs01b.sec.jel.lc   started
>>> service:postgresql                   omadvnfs01b.sec.jel.lc   started
>>>
>>>
>>> [root at omadvnfs01a ~]# cman_tool nodes
>>> Node Sts Inc Joined Name
>>> 1 M 1892 2011-04-17 15:34:24 omadvnfs01a.sec.jel.lc
>>> 2 M 1896 2011-04-17 15:34:24 omadvnfs01b.sec.jel.lc
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>
OK, I started all of the cman daemons manually as you suggested, in the
same order as in the init script. Here's the only error that I see. I
can post the other debug messages if you think they'd be useful, but
this is the only one that stuck out to me.
[root at omadvnfs01a ~]# /sbin/dlm_controld -D
1303134840 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1303134840 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1303134840 set_ccs_options 480
1303134840 cman: node 2 added
1303134840 set_configfs_node 2 10.198.1.111 local 0
1303134840 cman: node 3 added
1303134840 set_configfs_node 3 10.198.1.110 local 1
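The set_configfs_node lines above show exactly which node IDs and addresses dlm_controld registered, and those IDs can be pulled out and compared against the `cman_tool nodes` output on each node. A minimal sketch (the sample input is the debug output above, pasted inline; on a live node you would pipe the real -D output instead):

```shell
# Sample dlm_controld -D output (copied from the debug session above);
# on a live node, replace this with the actual captured output.
sample='1303134840 set_configfs_node 2 10.198.1.111 local 0
1303134840 set_configfs_node 3 10.198.1.110 local 1'

# Print "<nodeid> <address> <local-flag>" for each node dlm registered.
nodes=$(printf '%s\n' "$sample" | awk '/set_configfs_node/ {print $3, $4, $6}')
printf '%s\n' "$nodes"
```

If the node IDs printed here don't match the IDs that `cman_tool nodes` reports, then dlm and cman disagree about cluster membership, which seems worth checking given the hangs.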