[Linux-cluster] Info on clvmd with halvm on rhel 6.3 based clusters

Gianluca Cecchi gianluca.cecchi at gmail.com
Fri Jul 5 15:35:18 UTC 2013


On Fri, Jul 5, 2013 at 2:42 AM, Ryan Mitchell wrote:

> You aren't starting rgmanager with the -N option are you?  It is not the
> default.
> # man clurgmgrd
>        -N     Do  not  perform  stop-before-start.  Combined with the -Z
> flag to clusvcadm, this can be used to allow rgmanager to be upgraded
>               without stopping a given user service or set of services.
>
> What is supposed to happen is:
> - clvmd is started at boot time, and all clustered logical volumes are
> activated (including CLVM HA-LVM volumes)
> - rgmanager starts after clvmd, and it initializes all resources to ensure
> they are in a known state.  For example:
> Jul  4 20:06:26 r6ha1 rgmanager[2478]: I am node #1
> Jul  4 20:06:27 r6ha1 rgmanager[2478]: Resource Group Manager Starting
> Jul  4 20:06:27 r6ha1 rgmanager[2478]: Loading Service Data
> Jul  4 20:06:33 r6ha1 rgmanager[2478]: Initializing Services
> <----
> Jul  4 20:06:33 r6ha1 rgmanager[3316]: [fs] stop: Could not match
> /dev/vgdata/lvmirror with a real device
> Jul  4 20:06:33 r6ha1 rgmanager[2478]: stop on fs "fsdata" returned 2
> (invalid argument(s))
> Jul  4 20:06:35 r6ha1 rgmanager[2478]: Services Initialized
> Jul  4 20:06:35 r6ha1 rgmanager[2478]: State change: Local UP
> Jul  4 20:06:35 r6ha1 rgmanager[2478]: State change: r6ha2.cluster.net UP
> - So when rgmanager starts, it stops the CLVM HA-LVM logical volumes again
> prior to starting the service, unless you disabled the "stop-before-start"
> option.
>
> I did a quick test and I got the same results as you.  Can you show your
> resource/service definitions and the logs of when rgmanager starts up?
>
> If you open a case with Red Hat, it may find its way to me and we can
> troubleshoot further.


Thanks for the answer, Ryan.
I opened case 00900301 as suggested.
I think the problem is that clvmd has already activated the LVs.
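
One way to check that suspicion would be to look at the LV state right after clvmd has started and before rgmanager starts. Below is a minimal sketch in Python 3, assuming it is run on a host where Python 3.7+ and the lvs command are available (the stock Python on RHEL 6 is older, so this is only an illustration). The helper names parse_lvs and active_lvs are made up for this example; the only LVM fact relied on is that the fifth character of lv_attr is the state flag, with "a" meaning active.

```python
import subprocess


def parse_lvs(text):
    """Parse `lvs --noheadings -o vg_name,lv_name,lv_attr` output.

    Returns a dict mapping "VG/LV" to True when the LV is active.
    """
    state = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        vg, lv, attr = fields
        # The fifth lv_attr character is the state flag; "a" means active.
        state[f"{vg}/{lv}"] = len(attr) >= 5 and attr[4] == "a"
    return state


def active_lvs():
    """Run lvs on the local node and report which LVs are active."""
    out = subprocess.run(
        ["lvs", "--noheadings", "-o", "vg_name,lv_name,lv_attr"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_lvs(out)
```

If active_lvs() reports VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP as active before rgmanager has started the service, that would be consistent with clvmd having activated it at boot.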

My service is composed of an ip resource and some <lv..> and <fs...> resources.
When the nodes start up, this is what I get on the node chosen by the
priority definition of the failover domain:

Jul  4 14:27:46 oraugov4 rgmanager[6469]: Services Initialized
Jul  4 14:27:46 oraugov4 rgmanager[6469]: State change: Local UP
Jul  4 14:27:46 oraugov4 rgmanager[6469]: Starting stopped service
service:MYSERVICE
Jul  4 14:27:48 oraugov4 rgmanager[9436]: [lvm] Failed to activate
logical volume, VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP
Jul  4 14:27:48 oraugov4 rgmanager[9458]: [lvm] Attempting cleanup of
VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP
Jul  4 14:27:49 oraugov4 rgmanager[9484]: [lvm] Failed second attempt
to activate VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP
Jul  4 14:27:49 oraugov4 rgmanager[6469]: start on lvm
"LV_UGDMPRO_TEMP" returned 1 (generic error)
Jul  4 14:27:49 oraugov4 rgmanager[6469]: #68: Failed to start
service:MYSERVICE; return value: 1
Jul  4 14:27:49 oraugov4 rgmanager[6469]: Stopping service service:MYSERVICE
Jul  4 14:27:49 oraugov4 rgmanager[9557]: [fs] stop: Could not match
/dev/VG_PROVA/lv_prova with a real device
Jul  4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "PROVA" returned
2 (invalid argument(s))
Jul  4 14:27:49 oraugov4 rgmanager[9594]: [fs] stop: Could not match
/dev/VG_UGDMPRE_RDOF/LV_UGDMPRE_RDOF with a real device
Jul  4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_RDOF"
returned 2 (invalid argument(s))
Jul  4 14:27:49 oraugov4 rgmanager[9631]: [fs] stop: Could not match
/dev/VG_UGDMPRE_REDO/LV_UGDMPRE_REDO with a real device
Jul  4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_REDO"
returned 2 (invalid argument(s))
Jul  4 14:27:49 oraugov4 rgmanager[9669]: [fs] stop: Could not match
/dev/VG_UGDMPRE_DATA/LV_UGDMPRE_DATA with a real device
Jul  4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_DATA"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9706]: [fs] stop: Could not match
/dev/VG_UGDMPRE_SAVE/LV_UGDMPRE_SAVE with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_SAVE"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9743]: [fs] stop: Could not match
/dev/VG_UGDMPRE_CTRL/LV_UGDMPRE_CTRL with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_CTRL"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9780]: [fs] stop: Could not match
/dev/VG_UGDMPRE_TEMP/LV_UGDMPRE_TEMP with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_TEMP"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9817]: [fs] stop: Could not match
/dev/VG_UGDMPRO_RDOF/LV_UGDMPRO_RDOF with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_RDOF"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9854]: [fs] stop: Could not match
/dev/VG_UGDMPRO_REDO/LV_UGDMPRO_REDO with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_REDO"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9891]: [fs] stop: Could not match
/dev/VG_UGDMPRO_DATA/LV_UGDMPRO_DATA with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_DATA"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9928]: [fs] stop: Could not match
/dev/VG_UGDMPRO_SAVE/LV_UGDMPRO_SAVE with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_SAVE"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[9965]: [fs] stop: Could not match
/dev/VG_UGDMPRO_CTRL/LV_UGDMPRO_CTRL with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_CTRL"
returned 2 (invalid argument(s))
Jul  4 14:27:50 oraugov4 rgmanager[10002]: [fs] stop: Could not match
/dev/VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP with a real device
Jul  4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_TEMP"
returned 2 (invalid argument(s))
Jul  4 14:27:53 oraugov4 rgmanager[6469]: State change: icloraugov3 UP
Jul  4 14:28:11 oraugov4 rgmanager[6469]: #12: RG service:MYSERVICE
failed to stop; intervention required
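
With this many wrapped lines it is easy to lose track of which operations actually failed. The following is a rough Python 3 sketch, not a tool shipped with rgmanager, that joins the mail-wrapped continuation lines back onto their syslog record and then picks out the "<op> on <type> "<name>" returned <code>" summary lines with a non-zero code. The function names and the regular expressions are assumptions based only on the log format shown above.

```python
import re

# A record starts with "Mon DD HH:MM:SS host rgmanager[pid]:".
RECORD_START = re.compile(
    r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+rgmanager\[\d+\]:\s*"
)
# Summary lines look like: stop on fs "PROVA" returned 2 (invalid argument(s))
RESULT = re.compile(r'^(\w+) on (\w+) "([^"]+)" returned (\d+) \((.*)\)$')


def unwrap(lines):
    """Join continuation lines (produced by mail wrapping) onto their record."""
    records = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.lstrip()
    return records


def failed_operations(lines):
    """Return (operation, resource type, resource name, code) for each failure."""
    failures = []
    for record in unwrap(lines):
        message = RECORD_START.sub("", record, count=1)
        match = RESULT.match(message)
        if match and match.group(4) != "0":
            op, rtype, name, code = match.group(1, 2, 3, 4)
            failures.append((op, rtype, name, int(code)))
    return failures


# Three records copied from the log above, still wrapped as in the mail.
EXCERPT = """\
Jul  4 14:27:49 oraugov4 rgmanager[6469]: start on lvm
"LV_UGDMPRO_TEMP" returned 1 (generic error)
Jul  4 14:27:49 oraugov4 rgmanager[9557]: [fs] stop: Could not match
/dev/VG_PROVA/lv_prova with a real device
Jul  4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "PROVA" returned
2 (invalid argument(s))
"""

for failure in failed_operations(EXCERPT.splitlines()):
    print(failure)
```

Run over the whole excerpt, this should reduce the log to one lvm start failure followed by the series of fs stop failures.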


So I think I have two problems:

1) the LV fails to activate because it is already active
2) to recover, rgmanager then tries to stop the resources, but fs.sh
fails because it apparently cannot find the LV underneath it

I think that during the stop it should go in reverse order: it should
stop the fs first (and get a result of "already stopped") and only
afterwards deactivate the related LV... or not?
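
To make the ordering I have in mind concrete, here is a toy model in Python 3. It is not rgmanager's code and makes no claim about what rgmanager really does; it only illustrates the expectation that a stop walks the resources in the reverse of their start order and treats a resource that is not running as already stopped. The "type:name" labels and the function names are invented for the example; the resource names are taken from the log above.

```python
# Resources of one service listed in start order: the LV has to be
# active before the filesystem on top of it can be mounted.
START_ORDER = ["lvm:LV_UGDMPRO_TEMP", "fs:UGDMPRO_TEMP"]


def stop_sequence(start_order):
    """Stop dependents first, then the resources they sit on."""
    return list(reversed(start_order))


def simulate_stop(start_order, running):
    """Walk the stop sequence and report what happens to each resource.

    `running` is the set of resources that are currently active. A
    resource that is not running is reported as "already stopped"
    instead of being treated as an error.
    """
    running = set(running)  # work on a copy, leave the caller's set alone
    report = []
    for resource in stop_sequence(start_order):
        if resource in running:
            running.discard(resource)
            report.append((resource, "stopped"))
        else:
            report.append((resource, "already stopped"))
    return report


# Situation after the failed start: the LV is active (clvmd activated
# it), but the filesystem was never mounted.
for step in simulate_stop(START_ORDER, {"lvm:LV_UGDMPRO_TEMP"}):
    print(step)
```

In this model the fs stop is a no-op and the LV is deactivated afterwards, which is the behaviour I would have expected instead of the "invalid argument(s)" errors.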

Gianluca



