[Linux-cluster] Clvm Hang after an node is fenced in a 2 nodes cluster

Darcy Sherwood darcy.sherwood at gmail.com
Mon Jun 8 02:40:38 UTC 2009


Do you have all of your cluster services chkconfig'd on at node 2? It sounds
to me like clvmd might be chkconfig'd off.
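
For example, on RHEL 5 something like the following on node 2 would show
whether the cluster services are enabled and let you start clvmd by hand
(the service names assume the stock cman/clvmd/rgmanager init scripts):

    chkconfig --list | egrep 'cman|clvmd|rgmanager'   # which runlevels start each service
    chkconfig clvmd on                                # enable clvmd at boot
    service clvmd start                               # start it now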


On Thu, Jun 4, 2009 at 2:54 AM, Jean Diallo <
admin1-bua.dage-etd at justice.gouv.fr> wrote:

> Description of problem: In a 2-node cluster, after one node is fenced, any
> clvm command hangs on the remaining node. When the fenced node comes back
> into the cluster, any clvm command also hangs; moreover, the node does not
> activate any clustered VG, and so cannot access any shared device.
>
>
> Version-Release number of selected component (if applicable):
> Red Hat Enterprise Linux 5.2, updated with:
>      device-mapper-1.02.28-2.el5.x86_64.rpm
>      lvm2-2.02.40-6.el5.x86_64.rpm
>      lvm2-cluster-2.02.40-7.el5.x86_64.rpm
>
>
> Steps to Reproduce:
> 1. 2-node cluster, quorum formed with qdisk
> 2. Cold boot node 2
> 3. Node 2 is evicted and fenced; services are taken over by node 1
> 4. Node 2 comes back into the cluster and is quorate, but no clustered VGs
> are up and any lvm-related command hangs
> 5. At this point every lvm command hangs on node 1
>
>
> Expected results: node 2 should be able to get back the lock on the
> clustered lvm volumes, and node 1 should be able to issue any lvm-related
> command.
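
If clvmd simply isn't running on node 2 after the reboot, a minimal recovery
sketch (assuming the stock RHEL 5 init script and that the cluster is already
quorate) would be to start the daemon and reactivate the clustered VGs:

    service clvmd start    # start the cluster LVM daemon
    vgchange -ay           # activate all volume groups, including clustered ones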
>
> Here are my cluster.conf and lvm.conf
> <?xml version="1.0"?>
> <cluster alias="rome" config_version="53" name="rome">
>       <fence_daemon clean_start="0" post_fail_delay="9"
> post_join_delay="6"/>
>       <clusternodes>
>               <clusternode name="romulus.fr" nodeid="1" votes="1">
>                       <fence>
>                               <method name="1">
>                                       <device name="ilo172"/>
>                               </method>
>                       </fence>
>               </clusternode>
>               <clusternode name="remus.fr" nodeid="2" votes="1">
>                       <fence>
>                               <method name="1">
>                                       <device name="ilo173"/>
>                               </method>
>                       </fence>
>               </clusternode>
>       </clusternodes>
>       <cman expected_votes="3"/>
>       <totem consensus="4800" join="60" token="21002"
> token_retransmits_before_loss_const="20"/>
>       <fencedevices>
>               <fencedevice agent="fence_ilo" hostname="X.X.X.X"
> login="Administrator" name="ilo172" passwd="X.X.X.X"/>
>               <fencedevice agent="fence_ilo" hostname="XXXX"
> login="Administrator" name="ilo173" passwd="XXXX"/>
>       </fencedevices>
>       <rm>
>               <failoverdomains/>
>               <resources/>
>               <vm autostart="1" exclusive="0" migrate="live"
> name="alfrescoP64" path="/etc/xen" recovery="relocate"/>
>               <vm autostart="1" exclusive="0" migrate="live"
> name="alfrescoI64" path="/etc/xen" recovery="relocate"/>
>               <vm autostart="1" exclusive="0" migrate="live"
> name="alfrescoS64" path="/etc/xen" recovery="relocate"/>
>       </rm>
>       <quorumd interval="3" label="quorum64" min_score="1" tko="30"
> votes="1">
>               <heuristic interval="2" program="ping -c3 -t2 X.X.X.X"
> score="1"/>
>       </quorumd>
> </cluster>
>
> part of lvm.conf:
> # Type 3 uses built-in clustered locking.
>   locking_type = 3
>
>   # If using external locking (type 2) and initialisation fails,
>   # with this set to 1 an attempt will be made to use the built-in
>   # clustered locking.
>   # If you are using a customised locking_library you should set this to 0.
>   fallback_to_clustered_locking = 0
>
>   # If an attempt to initialise type 2 or type 3 locking failed, perhaps
>   # because cluster components such as clvmd are not running, with this set
>   # to 1 an attempt will be made to use local file-based locking (type 1).
>   # If this succeeds, only commands against local volume groups will
> proceed.
>   # Volume Groups marked as clustered will be ignored.
>   fallback_to_local_locking = 1
>
>   # Local non-LV directory that holds file-based locks while commands are
>   # in progress.  A directory like /tmp that may get wiped on reboot is OK.
>   locking_dir = "/var/lock/lvm"
>
>   # Other entries can go here to allow you to load shared libraries
>   # e.g. if support for LVM1 metadata was compiled as a shared library use
>   #   format_libraries = "liblvm2format1.so"
>   # Full pathnames can be given.
>
>   # Search this directory first for shared libraries.
>   #   library_dir = "/lib"
>
>   # The external locking library to load if locking_type is set to 2.
>   #   locking_library = "liblvm2clusterlock.so"
>
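
Note that with fallback_to_local_locking = 1, LVM falls back to local
file-based locking whenever it cannot reach clvmd (printing only a warning),
which is exactly what the log below shows. A quick way to confirm the active
settings on both nodes (assuming the default config path) is:

    grep -E '^ *(locking_type|fallback_to_local_locking)' /etc/lvm/lvm.conf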
>
> part of the lvm log on the second node:
>
> vgchange.c:165   Activated logical volumes in volume group "VolGroup00"
> vgchange.c:172   7 logical volume(s) in volume group "VolGroup00" now
> active
> cache/lvmcache.c:1220   Wiping internal VG cache
> commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:17:29
> 2009
> commands/toolcontext.c:209   Set umask to 0077
> locking/cluster_locking.c:83   connect() failed on local socket: Connection
> refused
> locking/locking.c:259   WARNING: Falling back to local file-based locking.
> locking/locking.c:261   Volume Groups with the clustered attribute will be
> inaccessible.
> toollib.c:578   Finding all volume groups
> toollib.c:491   Finding volume group "VGhomealfrescoS64"
> metadata/metadata.c:2379   Skipping clustered volume group
> VGhomealfrescoS64
> toollib.c:491   Finding volume group "VGhomealfS64"
> metadata/metadata.c:2379   Skipping clustered volume group VGhomealfS64
> toollib.c:491   Finding volume group "VGvmalfrescoS64"
> metadata/metadata.c:2379   Skipping clustered volume group VGvmalfrescoS64
> toollib.c:491   Finding volume group "VGvmalfrescoI64"
> metadata/metadata.c:2379   Skipping clustered volume group VGvmalfrescoI64
> toollib.c:491   Finding volume group "VGvmalfrescoP64"
> metadata/metadata.c:2379   Skipping clustered volume group VGvmalfrescoP64
> toollib.c:491   Finding volume group "VolGroup00"
> libdm-report.c:981   VolGroup00
> cache/lvmcache.c:1220   Wiping internal VG cache
> commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:17:29
> 2009
> commands/toolcontext.c:209   Set umask to 0077
> locking/cluster_locking.c:83   connect() failed on local socket: Connection
> refused
> locking/locking.c:259   WARNING: Falling back to local file-based locking.
> locking/locking.c:261   Volume Groups with the clustered attribute will be
> inaccessible.
> toollib.c:542   Using volume group(s) on command line
> toollib.c:491   Finding volume group "VolGroup00"
> vgchange.c:117   7 logical volume(s) in volume group "VolGroup00" monitored
> cache/lvmcache.c:1220   Wiping internal VG cache
> commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:20:45
> 2009
> commands/toolcontext.c:209   Set umask to 0077
> toollib.c:331   Finding all logical volumes
> commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:20:50
> 2009
> commands/toolcontext.c:209   Set umask to 0077
> toollib.c:578   Finding all volume groups
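
The "connect() failed on local socket: Connection refused" lines above are
what LVM prints when clvmd is not reachable at the time vgchange runs. A quick
check on node 2 (again assuming the stock init script) would be:

    service clvmd status      # is the daemon running?
    ps -ef | grep '[c]lvmd'   # double-check the process list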
>
>
> group_tool on node 1
> type             level name       id       state
> fence            0     default    00010001 none        [1 2]
> dlm              1     clvmd      00010002 none        [1 2]
> dlm              1     rgmanager  00020002 none        [1]
>
>
> group_tool on node 2
> [root@remus ~]# group_tool
> type             level name     id       state
> fence            0     default  00010001 none        [1 2]
> dlm              1     clvmd    00010002 none        [1 2]
>
> Additional info:
>

