[Linux-cluster] Disabling cman at boot

Fabio M. Di Nitto fdinitto at redhat.com
Thu Jun 19 12:24:27 UTC 2008

On Thu, 19 Jun 2008, Kadlecsik Jozsef wrote:

> On Thu, 19 Jun 2008, Fabio M. Di Nitto wrote:
>> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
>>> On Wed, Jun 18, 2008 at 10:23 AM, Kadlecsik Jozsef
>>> <kadlec at sunserv.kfki.hu> wrote:
>>>> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
>>>>> Do you think adding a boot parameter (eg: nocluster) could be a good
>>>>> solution? We should modify the init file for cman to check the
>>>>> presence of that parameter and skip the start process if present.
>>>> We use exactly the same method to specify how to boot a machine:
>>> Do you think we can find a common solution and propose a patch upstream?
>> I am open to add patches.
>>> Having a boot parameter (eg: nocluster) to skip cman at boot would be
>>> useful in your case or you really need something more specific to GFS?
>>> Skipping cman would prevent any other cluster service to start (GFS too).
>> GFS can also run in lock_nolock and that does not require cman.
> But that is not an option in an in-production GFS cluster, I believe.

There is really nothing that stops you from having / on GFS with lock_nolock.

>> I think you want to make sure of what you really want before
>> enabling/disabling cman.
> What we implemented serves multiple purposes:
> a. disable the whole GFS cluster suite (and every service relying on GFS)
>   for the next reboot of a host as a planned maintenance mode
> b. disable the mounting of GFS volumes (and every service relying on GFS
>   volumes) for the next reboot of a host as a planned maintenance mode
> c. boot a host directly in a) mode in case of emergency
> d. boot a host directly in b) mode in case of emergency
> To underline the difference and importance of a-c and b-d modes, just a
> few examples:
> - due to the hardware upgrade of the shared block devices (in our case
>  Coraid AoE), we applied a) to shut down the whole cluster, upgrade the
>  hardware, test AoE access and update manually /etc/cluster/cluster.conf
>  with the new fencing parameters
> - once we were hit by a firmware bug in the shared devices and by
>  switching to b-d) we could run gfs_fsck on all volumes easily
> - all modes were used at testing services, fixing boot/shutdown/reboot
>  init scripts on test nodes (we run GFS on top of Ubuntu).

I understand the use cases. That's not the problem. I'd like to see a 
"standard" set of keywords to use that we all agree upon.

> As we rewrote the cman and gfs init scripts due to the differences between
> RedHat and Debian/Ubuntu, I can't really send patches.

I maintain the Ubuntu init script and work very closely with the Debian 
maintainers. At the same time I can apply changes to whatever is shipped 
from upstream.
So this is not a blocker whatsoever.

Whatever changes you have, it is best to send them to this mailing list for 
evaluation and most likely inclusion if they are valid.

> But the changes in
> question are actually minimal: both the cman and gfs init scripts start
> with the added lines
> #
> # Skip GFS if asked at boot time
> #
> [ "$1" = "start" ] && [ -e /etc/cluster/skip_gfs ] && exit 0
> and there is just one more line added to the gfs init script, before
> mounting the volumes
>                [ -e /etc/cluster/skip_gfs_mount ] && exit 0
>                echo -n "Mounting GFS filesystems: "
>                mount -a -t gfs
> The files /etc/cluster/skip_gfs and /etc/cluster/skip_gfs_mount are
> created by the attached init script called 'cluster_maintenance' if
> it's called manually by the proper arguments or at boot time when it
> detects the corresponding boot arguments (started before cman by init).
> The script can run additional scripts to disable/enable services which
> rely on GFS.

I see. I think I would prefer to see the parsing of the cmdline done directly 
by the cman/gfs/rgmanager init scripts rather than by an extra one.

A few reasons:

- more init scripts increase the complexity of the boot sequence
- you touch files in /etc and that's not good practice. / might be 
read-only if we are in emergency maintenance, and even if it is read-write, 
something that is not really a config option should go in either 
/var/lib/something (if it needs to persist across reboots) or /tmp.
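
To make the idea concrete, here is a rough sketch of what parsing the 
cmdline inside an init script could look like. It is only an illustration: 
the keyword "nocluster" is the one suggested earlier in this thread, not an 
agreed standard, and the helper name has_boot_param and its optional second 
argument are made up for this example.

```shell
#!/bin/sh
# Sketch only. "nocluster" is the keyword proposed in this thread and
# has_boot_param is a hypothetical helper, not part of any shipped script.

# Return 0 if $1 appears as a whole word on the kernel command line.
# $2 optionally names the file to read (default: /proc/cmdline), which
# makes the helper easy to try out without rebooting.
has_boot_param() {
    hbp_param=$1
    hbp_file=${2:-/proc/cmdline}

    [ -r "$hbp_file" ] || return 1

    # The command line is a single line; accept it even without a
    # trailing newline.
    hbp_cmdline=
    read -r hbp_cmdline < "$hbp_file" || [ -n "$hbp_cmdline" ] || return 1

    # Pad with spaces so only whole, space-separated words match.
    case " $hbp_cmdline " in
        *" $hbp_param "*) return 0 ;;
    esac
    return 1
}

# In the "start" branch of the cman/gfs/rgmanager init script one could
# then write, for example:
#   has_boot_param nocluster && exit 0
```

This needs no flag file at all, so nothing has to be written to / when it 
is read-only.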

> I hope it helps.



I'm going to make him an offer he can't refuse.
