[Linux-cluster] Disabling cman at boot

Kadlecsik Jozsef kadlec at sunserv.kfki.hu
Thu Jun 19 09:34:57 UTC 2008


On Thu, 19 Jun 2008, Fabio M. Di Nitto wrote:

> On Wed, 18 Jun 2008, Federico Simoncelli wrote:
> 
> > On Wed, Jun 18, 2008 at 10:23 AM, Kadlecsik Jozsef
> > <kadlec at sunserv.kfki.hu> wrote:
> > > On Wed, 18 Jun 2008, Federico Simoncelli wrote:
> > > > Do you think adding a boot parameter (eg: nocluster) could be a good
> > > > solution? We should modify the init file for cman to check the
> > > > presence of that parameter and skip the start process if present.
> > >
> > > We use exactly the same method to specify how to boot a machine:
> >
> > Do you think we can find a common solution and propose a patch upstream?
> 
> I am open to add patches.
>
> > Having a boot parameter (eg: nocluster) to skip cman at boot would be
> > useful in your case or you really need something more specific to GFS?
> > Skipping cman would prevent any other cluster service to start (GFS too).
> 
> GFS can also run in lock_nolock and that does not require cman.

But that is not an option in an in-production GFS cluster, I believe.
 
> I think you want to make sure of what you really want before
> enabling/disabling cman.

What we implemented serves multiple purposes:

a. disable the whole GFS cluster suite (and every service relying on GFS)
   for the next reboot of a host as a planned maintenance mode
b. disable the mounting of GFS volumes (and every service relying on GFS
   volumes) for the next reboot of a host as a planned maintenance mode
c. boot a host directly in a) mode in case of emergency
d. boot a host directly in b) mode in case of emergency

To underline the difference and importance of the a-c and b-d modes, here 
are just a few examples: 

- due to the hardware upgrade of the shared block devices (in our case 
  Coraid AoE), we applied a) to shut down the whole cluster, upgrade the 
  hardware, test AoE access and update manually /etc/cluster/cluster.conf 
  with the new fencing parameters
- once we were hit by a firmware bug in the shared devices and by 
  switching to b-d) we could run gfs_fsck on all volumes easily
- all modes were used when testing services and fixing boot/shutdown/reboot 
  init scripts on test nodes (we run GFS on top of Ubuntu).

As we rewrote the cman and gfs init scripts due to the differences between 
RedHat and Debian/Ubuntu, I can't really send patches. But the changes in 
question are actually minimal: both the cman and gfs init scripts start 
with the added lines

#
# Skip GFS if asked at boot time
#
[ "$1" = "start" -a -e /etc/cluster/skip_gfs ] && exit 0

and there is just one more line added to the gfs init script, before 
mounting the volumes

                [ -e /etc/cluster/skip_gfs_mount ] && exit 0
                echo -n "Mounting GFS filesystems: "
                mount -a -t gfs

The files /etc/cluster/skip_gfs and /etc/cluster/skip_gfs_mount are 
created by the attached init script called 'cluster_maintenance', either 
when it's called manually with the proper arguments or at boot time when 
it detects the corresponding boot arguments (it is started before cman by 
init). 
The script can run additional scripts to disable/enable services which 
rely on GFS.
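
The boot-time detection described above can be sketched as a small 
helper. This is only an illustration under stated assumptions, not the 
code of the attached script: the function name has_boot_arg and its 
optional file argument are hypothetical. It compares whole 
space-separated words, so SKIP_GFS is not also matched by 
SKIP_GFS_MOUNT; a parameter written as SKIP_GFS=1 would not match.

```shell
#!/bin/sh
# has_boot_arg ARG [FILE]
# Hypothetical helper: succeeds if ARG is one of the space-separated
# words in FILE (default: /proc/cmdline, the kernel command line).
has_boot_arg() {
    # Command substitution strips the trailing newline of the file
    cmdline=$(cat "${2:-/proc/cmdline}") || return 1
    # Pad with spaces so the first and last words match the same pattern
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Possible use at boot, with the flag files described above:
# has_boot_arg SKIP_GFS       && touch /etc/cluster/skip_gfs
# has_boot_arg SKIP_GFS_MOUNT && touch /etc/cluster/skip_gfs_mount
```

The optional second argument exists only so that the check can be tried 
against a test file instead of the real /proc/cmdline.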

I hope it helps.

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary
-------------- next part --------------
#!/bin/bash
# bash is required: the script uses 'shopt -s nullglob'

case "$1" in
    start)
        # Create the flag files if requested on the kernel command line.
        # -w matches whole words only, so SKIP_GFS_MOUNT does not also
        # trigger SKIP_GFS.
        if grep -qw SKIP_GFS /proc/cmdline; then
            touch /etc/cluster/skip_gfs
        fi
        if grep -qw SKIP_GFS_MOUNT /proc/cmdline; then
            touch /etc/cluster/skip_gfs_mount
        fi
        if [ -e /etc/cluster/skip_gfs ] || [ -e /etc/cluster/skip_gfs_mount ]; then
            # Maintenance mode: disable the services relying on GFS
            shopt -s nullglob
            for x in /etc/cluster/disable_services/*; do
                "$x" disable
            done
            # ssh: /etc/nologin blocks non-root logins
            cat > /etc/nologin <<TXT
System is under maintenance, sorry.
TXT
        fi
        ;;
    skip_gfs)
        touch /etc/cluster/skip_gfs
        ;;
    skip_gfs_mount)
        touch /etc/cluster/skip_gfs_mount
        ;;
    disable)
        # Leave maintenance mode: remove the flags, start what is not running
        rm -f /etc/cluster/skip_gfs /etc/cluster/skip_gfs_mount
        if [ -z "$(pgrep fenced)" ]; then
            /etc/init.d/cman start
        fi
        if [ -z "$(pgrep clvmd)" ]; then
            /etc/init.d/gfs start
        fi
        shopt -s nullglob
        for x in /etc/cluster/disable_services/*; do
            "$x" enable
            y=${x##*/}
            "/etc/init.d/$y" start
        done
        # ssh: allow logins again
        rm -f /etc/nologin
        ;;
    stop_services)
        shopt -s nullglob
        for x in /etc/cluster/disable_services/*; do
            y=${x##*/}
            "/etc/init.d/$y" stop
        done
        ;;
    stop)
        ;;
    *)
        echo "Usage: $0 {start|stop|stop_services|skip_gfs|skip_gfs_mount|disable}"
        ;;
esac