[Linux-cluster] CLVM hang after a node is fenced in a 2-node cluster
Darcy Sherwood
darcy.sherwood at gmail.com
Mon Jun 8 02:40:38 UTC 2009
Do you have all of your cluster services chkconfig'd on at node 2? It sounds
to me like clvmd might be chkconfig'd off.
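For example, on node 2 something along these lines (assuming the stock RHEL 5
init scripts) would show whether clvmd and the rest of the cluster stack are
enabled and running:

  chkconfig --list | egrep 'cman|clvmd|qdiskd|rgmanager'
  service clvmd status

and if clvmd turns out to be off:

  chkconfig clvmd on
  service clvmd start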
On Thu, Jun 4, 2009 at 2:54 AM, Jean Diallo <
admin1-bua.dage-etd at justice.gouv.fr> wrote:
> Description of problem: In a 2-node cluster, after one node is fenced, any
> clvm command hangs on the remaining node. When the fenced node comes back
> into the cluster, any clvm command also hangs; moreover, the node does not
> activate any clustered VG and so cannot access any shared device.
>
>
> Version-Release number of selected component (if applicable):
> Red Hat Enterprise Linux 5.2, updated with:
> device-mapper-1.02.28-2.el5.x86_64.rpm
> lvm2-2.02.40-6.el5.x86_64.rpm
> lvm2-cluster-2.02.40-7.el5.x86_64.rpm
>
>
> Steps to Reproduce:
> 1. 2-node cluster, quorum formed with qdisk
> 2. Cold boot node 2
> 3. Node 2 is evicted and fenced; services are taken over by node 1
> 4. Node 2 comes back into the cluster, quorate, but no clustered VGs are
> activated and any LVM-related command hangs (see the command sketch below)
> 5. At this point every LVM command also hangs on node 1
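>
> A rough command-level sketch of the check at step 4 on node 2 (only an
> illustration of where the hang shows up, not exact output):
>
>   group_tool             # membership and the clvmd group look normal
>   service clvmd status   # is the daemon actually running?
>   vgs                    # this is where the command hangs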
>
>
> Expected results: node 2 should be able to reacquire the locks on the
> clustered LVM volumes, and node 1 should be able to issue any LVM-related
> command.
>
> Here are my cluster.conf and lvm.conf
> <?xml version="1.0"?>
> <cluster alias="rome" config_version="53" name="rome">
> <fence_daemon clean_start="0" post_fail_delay="9"
> post_join_delay="6"/>
> <clusternodes>
> <clusternode name="romulus.fr" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device name="ilo172"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="remus.fr" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device name="ilo173"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="3"/>
> <totem consensus="4800" join="60" token="21002"
> token_retransmits_before_loss_const="20"/>
> <fencedevices>
> <fencedevice agent="fence_ilo" hostname="X.X.X.X"
> login="Administrator" name="ilo172" passwd="X.X.X.X"/>
> <fencedevice agent="fence_ilo" hostname="XXXX"
> login="Administrator" name="ilo173" passwd="XXXX"/>
> </fencedevices>
> <rm>
> <failoverdomains/>
> <resources/>
> <vm autostart="1" exclusive="0" migrate="live"
> name="alfrescoP64" path="/etc/xen" recovery="relocate"/>
> <vm autostart="1" exclusive="0" migrate="live"
> name="alfrescoI64" path="/etc/xen" recovery="relocate"/>
> <vm autostart="1" exclusive="0" migrate="live"
> name="alfrescoS64" path="/etc/xen" recovery="relocate"/>
> </rm>
> <quorumd interval="3" label="quorum64" min_score="1" tko="30"
> votes="1">
> <heuristic interval="2" program="ping -c3 -t2 X.X.X.X"
> score="1"/>
> </quorumd>
> </cluster>
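>
> With expected_votes="3" and the qdisk providing one vote, the quorum state on
> a surviving node can be sanity-checked with something like:
>
>   cman_tool status   # shows Expected votes, Total votes and Quorum
>   mkqdisk -L         # lists the quorum disk label (quorum64 here)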
>
> Part of lvm.conf:
> # Type 3 uses built-in clustered locking.
> locking_type = 3
>
> # If using external locking (type 2) and initialisation fails,
> # with this set to 1 an attempt will be made to use the built-in
> # clustered locking.
> # If you are using a customised locking_library you should set this to 0.
> fallback_to_clustered_locking = 0
>
> # If an attempt to initialise type 2 or type 3 locking failed, perhaps
> # because cluster components such as clvmd are not running, with this set
> # to 1 an attempt will be made to use local file-based locking (type 1).
> # If this succeeds, only commands against local volume groups will proceed.
> # Volume Groups marked as clustered will be ignored.
> fallback_to_local_locking = 1
>
> # Local non-LV directory that holds file-based locks while commands are
> # in progress. A directory like /tmp that may get wiped on reboot is OK.
> locking_dir = "/var/lock/lvm"
>
> # Other entries can go here to allow you to load shared libraries
> # e.g. if support for LVM1 metadata was compiled as a shared library use
> # format_libraries = "liblvm2format1.so"
> # Full pathnames can be given.
>
> # Search this directory first for shared libraries.
> # library_dir = "/lib"
>
> # The external locking library to load if locking_type is set to 2.
> # locking_library = "liblvm2clusterlock.so"
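>
> With locking_type = 3 and fallback_to_local_locking = 1, any VG carrying the
> clustered attribute is simply skipped whenever clvmd cannot be reached. Which
> VGs are clustered can be checked from the attribute string (a "c" in the last
> vg_attr character), for example:
>
>   vgs -o vg_name,vg_attr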
>
>
> Part of the LVM log on the second node:
>
> vgchange.c:165 Activated logical volumes in volume group "VolGroup00"
> vgchange.c:172 7 logical volume(s) in volume group "VolGroup00" now
> active
> cache/lvmcache.c:1220 Wiping internal VG cache
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:17:29
> 2009
> commands/toolcontext.c:209 Set umask to 0077
> locking/cluster_locking.c:83 connect() failed on local socket: Connection
> refused
> locking/locking.c:259 WARNING: Falling back to local file-based locking.
> locking/locking.c:261 Volume Groups with the clustered attribute will be
> inaccessible.
> toollib.c:578 Finding all volume groups
> toollib.c:491 Finding volume group "VGhomealfrescoS64"
> metadata/metadata.c:2379 Skipping clustered volume group
> VGhomealfrescoS64
> toollib.c:491 Finding volume group "VGhomealfS64"
> metadata/metadata.c:2379 Skipping clustered volume group VGhomealfS64
> toollib.c:491 Finding volume group "VGvmalfrescoS64"
> metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoS64
> toollib.c:491 Finding volume group "VGvmalfrescoI64"
> metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoI64
> toollib.c:491 Finding volume group "VGvmalfrescoP64"
> metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoP64
> toollib.c:491 Finding volume group "VolGroup00"
> libdm-report.c:981 VolGroup00
> cache/lvmcache.c:1220 Wiping internal VG cache
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:17:29
> 2009
> commands/toolcontext.c:209 Set umask to 0077
> locking/cluster_locking.c:83 connect() failed on local socket: Connection
> refused
> locking/locking.c:259 WARNING: Falling back to local file-based locking.
> locking/locking.c:261 Volume Groups with the clustered attribute will be
> inaccessible.
> toollib.c:542 Using volume group(s) on command line
> toollib.c:491 Finding volume group "VolGroup00"
> vgchange.c:117 7 logical volume(s) in volume group "VolGroup00" monitored
> cache/lvmcache.c:1220 Wiping internal VG cache
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:20:45
> 2009
> commands/toolcontext.c:209 Set umask to 0077
> toollib.c:331 Finding all logical volumes
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:20:50
> 2009
> commands/toolcontext.c:209 Set umask to 0077
> toollib.c:578 Finding all volume groups
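>
> The "connect() failed on local socket" lines above mean clvmd was not running
> (or not reachable) when these commands ran, which is why the clustered VGs
> were skipped. Once cman is quorate and clvmd is running on both nodes, the
> clustered VGs would normally be reactivated with something like (the VG name
> is just one of those from the log):
>
>   service clvmd start
>   vgchange -a y VGvmalfrescoS64   # or "vgchange -a y" alone for all VGs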
>
>
> group_tool on node 1
> type   level  name       id        state
> fence  0      default    00010001  none   [1 2]
> dlm    1      clvmd      00010002  none   [1 2]
> dlm    1      rgmanager  00020002  none   [1]
>
>
> group_tool on node 2
> [root at remus ~]# group_tool
> type   level  name       id        state
> fence  0      default    00010001  none   [1 2]
> dlm    1      clvmd      00010002  none   [1 2]
>
> Additional info:
>