[Linux-cluster] problems with clvmd and lvms on rhel6.1

Poós Krisztián krisztian at poos.hu
Fri Aug 10 16:38:35 UTC 2012


This is the cluster.conf, which is a clone of the problematic system in a
test environment (without the Oracle and SAP instances, focusing only on
this LVM issue, with an LVM resource)

[root at rhel2 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="teszt">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="rhel1.local" nodeid="1" votes="1">
			<fence/>
		</clusternode>
		<clusternode name="rhel2.local" nodeid="2" votes="1">
			<fence/>
		</clusternode>
	</clusternodes>
	<cman expected_votes="3"/>
	<fencedevices/>
	<rm>
		<failoverdomains>
			<failoverdomain name="all" nofailback="1" ordered="1" restricted="0">
				<failoverdomainnode name="rhel1.local" priority="1"/>
				<failoverdomainnode name="rhel2.local" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<lvm lv_name="teszt-lv" name="teszt-lv" vg_name="teszt"/>
			<fs device="/dev/teszt/teszt-lv" fsid="43679" fstype="ext4" mountpoint="/lvm" name="teszt-fs"/>
		</resources>
		<service autostart="1" domain="all" exclusive="0" name="teszt" recovery="disable">
			<lvm ref="teszt-lv"/>
			<fs ref="teszt-fs"/>
		</service>
	</rm>
	<quorumd label="qdisk"/>
</cluster>
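
For reference, the clustered flag on the volume group and the current
activation state of the LV can be checked like this (a minimal sketch, using
the VG/LV names from the config above; output omitted):

  vgs -o vg_name,vg_attr teszt    # a 'c' in the last attr position means clustered
  lvs -o lv_name,lv_attr teszt    # an 'a' in the fifth attr position means active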

Here are the relevant parts of the logs:
Aug 10 17:21:21 rgmanager I am node #2
Aug 10 17:21:22 rgmanager Resource Group Manager Starting
Aug 10 17:21:22 rgmanager Loading Service Data
Aug 10 17:21:29 rgmanager Initializing Services
Aug 10 17:21:31 rgmanager /dev/dm-2 is not mounted
Aug 10 17:21:31 rgmanager Services Initialized
Aug 10 17:21:31 rgmanager State change: Local UP
Aug 10 17:21:31 rgmanager State change: rhel1.local UP
Aug 10 17:23:23 rgmanager Starting stopped service service:teszt
Aug 10 17:23:25 rgmanager Failed to activate logical volume, teszt/teszt-lv
Aug 10 17:23:25 rgmanager Attempting cleanup of teszt/teszt-lv
Aug 10 17:23:29 rgmanager Failed second attempt to activate teszt/teszt-lv
Aug 10 17:23:29 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
Aug 10 17:23:29 rgmanager #68: Failed to start service:teszt; return
value: 1
Aug 10 17:23:29 rgmanager Stopping service service:teszt
Aug 10 17:23:30 rgmanager stop: Could not match /dev/teszt/teszt-lv with
a real device
Aug 10 17:23:30 rgmanager stop on fs "teszt-fs" returned 2 (invalid
argument(s))
Aug 10 17:23:31 rgmanager #12: RG service:teszt failed to stop;
intervention required
Aug 10 17:23:31 rgmanager Service service:teszt is failed
Aug 10 17:24:09 rgmanager #43: Service service:teszt has failed; can not
start.
Aug 10 17:24:09 rgmanager #13: Service service:teszt failed to stop cleanly
Aug 10 17:25:12 rgmanager Starting stopped service service:teszt
Aug 10 17:25:14 rgmanager Failed to activate logical volume, teszt/teszt-lv
Aug 10 17:25:15 rgmanager Attempting cleanup of teszt/teszt-lv
Aug 10 17:25:17 rgmanager Failed second attempt to activate teszt/teszt-lv
Aug 10 17:25:18 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
Aug 10 17:25:18 rgmanager #68: Failed to start service:teszt; return
value: 1
Aug 10 17:25:18 rgmanager Stopping service service:teszt
Aug 10 17:25:19 rgmanager stop: Could not match /dev/teszt/teszt-lv with
a real device
Aug 10 17:25:19 rgmanager stop on fs "teszt-fs" returned 2 (invalid
argument(s))
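
To narrow this down, the activation can also be attempted by hand while
clvmd is running (a minimal sketch; the exact flags the lvm resource agent
uses may differ):

  lvchange -aey teszt/teszt-lv   # exclusive activation of the LV via clvmd
  lvchange -an teszt/teszt-lv    # deactivate it again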


After I manually started the LV on node1 and then tried to switch the
service over to node2, it was not able to start it there.
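
For clarity, the manual sequence I mean is roughly the following (a minimal
sketch, assuming the LV, service and node names above):

  # on rhel1.local: activate the logical volume by hand
  lvchange -ay teszt/teszt-lv

  # then try to move the service over to the other node
  clusvcadm -r service:teszt -m rhel2.local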

Regards,
Krisztian


On 08/10/2012 05:15 PM, Digimer wrote:
> On 08/10/2012 11:07 AM, Poós Krisztián wrote:
>> Dear all,
>>
>> I hope that someone has run into this problem in the past, so maybe you
>> can help me resolve this issue.
>>
>> There is a 2-node RHEL cluster, with a quorum disk as well.
>> There are clustered LVs, where the 'c' (clustered) flag is set.
>> If I start clvmd, all the clustered LVs become active.
>>
>> After this, if I start rgmanager, it deactivates all the volumes and is
>> not able to activate them anymore, as the devices no longer exist during
>> the startup of the service; so the service fails. All LVs remain without
>> the active flag.
>>
>> I can bring it up manually, but only if, after clvmd has started, I
>> deactivate the LVs manually with lvchange -an <lv>.
>> After this, when I start rgmanager, it can bring the service online
>> without problems. However, I think this step should be done by rgmanager
>> itself. The logs are full of the following:
>> rgmanager Making resilient: lvchange -an ....
>> rgmanager lv_exec_resilient failed
>> rgmanager lv_activate_resilient stop failed on ....
>>
>> In addition, the lvs/clvmd commands themselves sometimes hang. I have to
>> restart clvmd (sometimes killing it) to make them work again.
>>
>> Does anyone have any idea what to check?
>>
>> Thanks and regards,
>> Krisztian
> 
> Please paste your cluster.conf file with minimal edits.
> 


