[Linux-cluster] problems with clvmd and lvms on rhel6.1
Poós Krisztián
krisztian at poos.hu
Fri Aug 10 16:38:35 UTC 2012
This is the cluster.conf, taken from a clone of the problematic system in
a test environment (without the Oracle and SAP instances; it focuses only
on this LVM issue, with an LVM resource):
[root@rhel2 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="teszt">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="rhel1.local" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="rhel2.local" nodeid="2" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3"/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="all" nofailback="1" ordered="1" restricted="0">
                <failoverdomainnode name="rhel1.local" priority="1"/>
                <failoverdomainnode name="rhel2.local" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <lvm lv_name="teszt-lv" name="teszt-lv" vg_name="teszt"/>
            <fs device="/dev/teszt/teszt-lv" fsid="43679" fstype="ext4"
                mountpoint="/lvm" name="teszt-fs"/>
        </resources>
        <service autostart="1" domain="all" exclusive="0" name="teszt"
                 recovery="disable">
            <lvm ref="teszt-lv"/>
            <fs ref="teszt-fs"/>
        </service>
    </rm>
    <quorumd label="qdisk"/>
</cluster>
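As an aside, the clustered (-c-) flag discussed in this thread shows up as the sixth character of a VG's attribute string, which is 'c' when the volume group is clustered. A minimal sketch of checking it, using a hypothetical sample line in place of real `vgs --noheadings -o vg_name,vg_attr teszt` output (run the actual vgs command on a node instead):

```shell
#!/bin/sh
# Sketch: detect the clustered ('c') bit in a VG's attribute string.
# vg_attr is six characters; the sixth is 'c' when the VG is clustered.
# The line below is hypothetical stand-in output for:
#   vgs --noheadings -o vg_name,vg_attr teszt
vgs_output="  teszt wz--nc"

set -- $vgs_output       # word-split: $1 = vg_name, $2 = vg_attr
vg_name=$1
vg_attr=$2

case "$vg_attr" in
  ?????c) echo "$vg_name: clustered (activation mediated by clvmd)" ;;
  *)      echo "$vg_name: not clustered" ;;
esac
```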
Here are the log parts:
Aug 10 17:21:21 rgmanager I am node #2
Aug 10 17:21:22 rgmanager Resource Group Manager Starting
Aug 10 17:21:22 rgmanager Loading Service Data
Aug 10 17:21:29 rgmanager Initializing Services
Aug 10 17:21:31 rgmanager /dev/dm-2 is not mounted
Aug 10 17:21:31 rgmanager Services Initialized
Aug 10 17:21:31 rgmanager State change: Local UP
Aug 10 17:21:31 rgmanager State change: rhel1.local UP
Aug 10 17:23:23 rgmanager Starting stopped service service:teszt
Aug 10 17:23:25 rgmanager Failed to activate logical volume, teszt/teszt-lv
Aug 10 17:23:25 rgmanager Attempting cleanup of teszt/teszt-lv
Aug 10 17:23:29 rgmanager Failed second attempt to activate teszt/teszt-lv
Aug 10 17:23:29 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
Aug 10 17:23:29 rgmanager #68: Failed to start service:teszt; return
value: 1
Aug 10 17:23:29 rgmanager Stopping service service:teszt
Aug 10 17:23:30 rgmanager stop: Could not match /dev/teszt/teszt-lv with
a real device
Aug 10 17:23:30 rgmanager stop on fs "teszt-fs" returned 2 (invalid
argument(s))
Aug 10 17:23:31 rgmanager #12: RG service:teszt failed to stop;
intervention required
Aug 10 17:23:31 rgmanager Service service:teszt is failed
Aug 10 17:24:09 rgmanager #43: Service service:teszt has failed; can not
start.
Aug 10 17:24:09 rgmanager #13: Service service:teszt failed to stop cleanly
Aug 10 17:25:12 rgmanager Starting stopped service service:teszt
Aug 10 17:25:14 rgmanager Failed to activate logical volume, teszt/teszt-lv
Aug 10 17:25:15 rgmanager Attempting cleanup of teszt/teszt-lv
Aug 10 17:25:17 rgmanager Failed second attempt to activate teszt/teszt-lv
Aug 10 17:25:18 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
Aug 10 17:25:18 rgmanager #68: Failed to start service:teszt; return
value: 1
Aug 10 17:25:18 rgmanager Stopping service service:teszt
Aug 10 17:25:19 rgmanager stop: Could not match /dev/teszt/teszt-lv with
a real device
Aug 10 17:25:19 rgmanager stop on fs "teszt-fs" returned 2 (invalid
argument(s))
After I manually started the LVM on node1 and then tried to relocate the
service to node2, the cluster was not able to start it there.
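For reference, the manual workaround described earlier in this thread looks roughly like this, as a command fragment rather than a runnable script (the service and member names come from the cluster.conf above; adjust them to your setup):

```shell
# After clvmd has come up and auto-activated the clustered LVs,
# deactivate the LV by hand so rgmanager can activate it itself:
lvchange -an teszt/teszt-lv

# Clear the failed state and enable the service on the desired member:
clusvcadm -d teszt                  # disable the failed service
clusvcadm -e teszt -m rhel2.local   # enable it on rhel2.local
```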
Regards,
Krisztian
On 08/10/2012 05:15 PM, Digimer wrote:
> On 08/10/2012 11:07 AM, Poós Krisztián wrote:
>> Dear all,
>>
>> I hope someone has run into this problem in the past and can help me
>> resolve this issue.
>>
>> There is a two-node RHEL cluster with a quorum disk as well.
>> There are clustered LVs with the clustered (-c-) flag set.
>> When I start clvmd, all the clustered LVs come online.
>>
>> If I then start rgmanager, it deactivates all the volumes and is not
>> able to activate them again, as the devices no longer exist during
>> service startup; the service then fails. All LVs are left without the
>> active flag.
>>
>> I can bring it up manually, but only if, after clvmd has started, I
>> first deactivate the LVs by hand with lvchange -an <lv>.
>> Once I do that, rgmanager can take the service online without
>> problems. However, I think this deactivation should be done by
>> rgmanager itself. The logs are full of the following:
>> rgmanager Making resilient: lvchange -an ....
>> rgmanager lv_exec_resilient failed
>> rgmanager lv_activate_resilient stop failed on ....
>>
>> Also, the lvs/clvmd commands themselves sometimes hang, and I have to
>> restart clvmd (sometimes kill it) to make them work again.
>>
>> Does anyone have any idea what to check?
>>
>> Thanks and regards,
>> Krisztian
>
> Please paste your cluster.conf file with minimal edits.
>