[Linux-cluster] Possible bug in rhel5 for nested HA-LVM resources?

Gianluca Cecchi gianluca.cecchi at gmail.com
Fri Mar 5 10:41:36 UTC 2010


On Thu, 04 Mar 2010 09:33:29 -0500, Lon Hohberger wrote:
> I think it will remount TEST_TEMP.
>
> You can test:
[snip]
> -- Lon

Thanks, Lon.
Your answers are always quick, precise, and complete!
Just to let you know the results on my test cluster.
The delta gave:
[root at clutest2 cluster]# grep NEED /tmp/delta.out
    lvm [ NEEDSTOP ] {
  lvm [ NEEDSTART ] {

and the end of the delta file:
=== Operations (down-phase) ===
Node lvm:TEST_TEMP - CONDSTOP
[stop] lvm:TEST_TEMP
=== Operations (up-phase) ===
Node lvm:TEST_TEMP - CONDSTART
[start] lvm:TEST_TEMP
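
For completeness, the delta above was produced more or less like this
(file names as on my test node):

[root at clutest2 cluster]# rg_test delta /etc/cluster/cluster.conf.old /etc/cluster/cluster.conf > /tmp/delta.out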

Probably the stop of the temp fs is not considered because of the "buggy"
cluster.conf.old file...
So it doesn't schedule an unmount of the fs, but it does schedule a
deactivation of the lv....  ;-)
The reconfiguration succeeds as far as my goal is concerned: the service as a
whole is not disrupted and the other resource components keep running.
Also, the fs actually remains mounted (though I would have accepted an
umount/remount of it, as I had moved the files in the meantime..)
In fact I don't find any mount message in /var/log/messages; furthermore
"tune2fs -l" against the lv shows an unchanged "Last mount time:" value, so
indeed the fs was not unmounted.....
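Just for the record, the check was roughly this (device path as in my test
setup):

[root at clutest2 ~]# tune2fs -l /dev/VG_TEST_TEMP/LV_TEST_TEMP | grep "Last mount time"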
So far so good.
As a final comment: is it expected by design that the failure of the lv
deactivation during reconfiguration (due to the fs still mounted on top of it)
does not imply a stop of the service, and that the reconfiguration phase
continues silently?

See what was written in /var/log/messages during the reconfiguration:

Mar  5 10:13:02 clutest2 clurgmgrd: [2332]: <notice> Getting status
Mar  5 10:13:02 clutest2 last message repeated 2 times
Mar  5 10:23:13 clutest2 ccsd[1956]: Update of cluster.conf complete
(version 12 -> 13).
Mar  5 10:23:27 clutest2 clurgmgrd[2332]: <notice> Reconfiguring
Mar  5 10:23:28 clutest2 clurgmgrd: [2332]: <notice> Deactivating
VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:28 clutest2 clurgmgrd: [2332]: <notice> Making resilient :
lvchange -an VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:28 clutest2 clurgmgrd: [2332]: <notice> Resilient command:
lvchange -an VG_TEST_TEMP/LV_TEST_TEMP --config
devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]}
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <err> lv_exec_resilient failed
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <err> lv_activate_resilient stop
failed on VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <err> Unable to deactivate
VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <err> Failed to stop
VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <err> Failed to stop
VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:29 clutest2 clurgmgrd[2332]: <notice> stop on lvm "TEST_TEMP"
returned 1 (generic error)
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <notice> Activating
VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <notice> Making resilient :
lvchange -ay VG_TEST_TEMP/LV_TEST_TEMP
Mar  5 10:23:29 clutest2 clurgmgrd: [2332]: <notice> Resilient command:
lvchange -ay VG_TEST_TEMP/LV_TEST_TEMP --config
devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]}
Mar  5 10:23:32 clutest2 clurgmgrd: [2332]: <notice> Getting status
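
For what it's worth, the same stop failure can be reproduced by hand while the
filesystem is still mounted, e.g. (lv name as in my setup):

[root at clutest2 ~]# lvchange -an VG_TEST_TEMP/LV_TEST_TEMP

which fails because the logical volume is still open by the mounted filesystem.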

If needed, I can send my cluster.conf.old and cluster.conf.new for further
debugging.

Cheers and thanks again for support,
Gianluca



On Wed, Mar 3, 2010 at 11:42 PM, Gianluca Cecchi
<gianluca.cecchi at gmail.com> wrote:

> On Wed, 03 Mar 2010 16:53:49 -0500, Lon Hohberger wrote:
> > As it happens, the 'fs' file system type looks for child 'fs' resources:
> >
> >         <child type="fs" start="1" stop="3"/>
> >
> > ... but it does not have an entry for 'lvm', which would be required to
> > make it work in the order you specified.
>
> With this explanation I now understand the expected behaviour, even if it
> is not so intuitive imho.
> Probably it's only me, but I did read the page referred to in [1] and I
> didn't deduce from it what you write above...
>
> I interpreted "child" as a child from an XML tag inclusion point of view,
> not in the sense of a pre-defined hierarchy between type-specified
> resources...
> In fact the page uses terms such as "type-specified children" and "untyped
> children"....
> In my case lvm is a type-specified child, as I understood it, so I thought
> it should have started after the fs resource at the same level... not before it.
>
> To sum up, you say:
> "As it happens, the 'fs' file system type looks for child 'fs' resources"
> while I thought:
> "As it happens, the 'fs' file system type looks for child resources and
> starts them based on the defined child ordering (so lvm:TEST_TEMP before
> fs:TEST_TEMP)"
> Thanks for the explanation.
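>
> If I read the metadata in /usr/share/cluster/fs.sh correctly, its rgmanager
> "special" section declares type-specified children roughly like this (quoted
> from memory, so only an approximation):
>
>         <special tag="rgmanager">
>                 <child type="fs" start="1" stop="3"/>
>                 <child type="clusterfs" start="1" stop="3"/>
>                 <child type="nfsexport" start="3" stop="1"/>
>         </special>
>
> i.e. there is no <child type="lvm"/> entry, which matches your explanation.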
>
> Now, will moving from the currently running configuration
>                 <service domain="MAIN" autostart="1" name="TESTSRV">
>                         <ip ref="10.4.5.157"/>
>                         <lvm ref="TEST_APPL"/>
>                         <fs ref="TEST_APPL"/>
>                         <lvm ref="TEST_DATA"/>
>                         <fs ref="TEST_DATA">
>                                 <lvm ref="TEST_TEMP"/>
>                                  <fs ref="TEST_TEMP"/>
>                         </fs>
>                         <script ref="clusterssh"/>
>                 </service>
>
> to your suggested one
>
>   <service domain="MAIN" autostart="1" name="TESTSRV">
>     <ip ref="10.4.5.157"/>
>     <lvm ref="TEST_APPL"/>
>     <fs ref="TEST_APPL"/>
>     <lvm ref="TEST_DATA"/>
>     <lvm ref="TEST_TEMP"/>
>     <fs ref="TEST_DATA">
>       <fs ref="TEST_TEMP"/>
>     </fs>
>     <script ref="clusterssh"/>
>   </service>
>
> disrupt the service or not (e.g. cause an umount/mount of the fs:TEST_TEMP
> filesystem)?
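>
> I suppose I could check this in advance with rg_test, e.g. something along
> these lines (paths are just those of my test box):
>
> rg_test noop /etc/cluster/cluster.conf start service TESTSRV
> rg_test delta /etc/cluster/cluster.conf.old /etc/cluster/cluster.conf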
>
> Gianluca
>