[Linux-cluster] shutdown (OS) of all GFS nodes

Brett Cave brettcave at gmail.com
Mon Feb 15 07:10:28 UTC 2010


K74gfs = stop the GFS service = unmount the GFS volumes.

Specifically, wouldn't stopping openais that early cause issues for cman / gfs,
since with openais already shut down the GFS volumes can no longer be
unmounted?
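
A quick way to confirm the stop order is to look at the rc6.d kill links and
the chkconfig headers of the init scripts. This is just a sketch assuming the
standard RHEL 5 paths; the exact priorities vary by package version:

  # Kill scripts for runlevel 6, in the order init runs them.
  # Lower K-numbers stop first, so K20openais is stopped long before
  # K74gfs, K76clvmd and K79cman.
  ls -1 /etc/rc6.d/K*

  # The K-number is the stop priority, taken from the third field of the
  # "# chkconfig:" header in each init script.
  grep '^# chkconfig:' /etc/init.d/{openais,gfs,clvmd,cman}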

I ran some service shutdowns manually over the weekend, and the following
sequence works:
1) stop all services that have open files on GFS volumes or use CLVM volumes
(e.g. a custom app referencing files, xendomains using clustered LVs)
2) service gfs stop (unmount gfs volumes)
3) service clvmd stop
4) service cman stop
5) init 6

This works without hanging - a strong indication that the shutdown ordering in
the init scripts is incorrect. Has anyone else seen this happening?
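
For what it's worth, a small wrapper that codifies the order above could look
roughly like this. It is an untested sketch: "myapp" is just a placeholder for
whatever has files open on the GFS mounts, and the nfs stop is only there
because one GFS directory is exported via NFS:

  #!/bin/bash
  # Stop everything that touches GFS/CLVM before leaving the cluster.
  set -e

  service myapp stop        # placeholder: app with open files on GFS
  service xendomains stop   # Xen guests whose disks are clustered LVs
  service nfs stop          # drop the NFS export of the GFS directory

  service gfs stop          # unmount GFS volumes
  service clvmd stop        # stop clustered LVM
  service cman stop         # leave the cluster last
  init 6                    # reboot once everything is down cleanly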



On Sat, Feb 13, 2010 at 8:17 PM, Ian Hayes <cthulhucalling at gmail.com> wrote:

> I see things like this often if the gfs volume isn't unmounted before
> attempting to shut down the cluster daemons.
>
> On Feb 13, 2010 12:52 AM, "Brett Cave" <brettcave at gmail.com> wrote:
>
> hi,
>
> I have a GFS cluster (4 nodes + qdisk on a SAN) and have problems shutting
> down the cman service / unmounting the GFS mountpoints - it causes the
> shutdown to hang. I am running GFS & CLVM (the LVs are Xen guest drives). If
> I try to shut down the cman service manually, I get an error that resources
> are still in use. One GFS directory is exported via NFS.
>
> I think it may be because of the service stop order, specifically openais
> stopping before cman - could this be a valid reason?
>
> The runlevel 6 kill scripts are:
> K00xendomains
> K01xend
> K03libvirtd
> K20nfs
> K20openais
> K74gfs
> K74gfs2
> K76clvmd
> K78qdiskd
> K79cman
> K86nfslock
>
>
> If I manually run through stopping these services, the gfs service stop
> hangs. This is the log:
> Feb 13 10:35:25 vmhost-01 gfs_controld[3227]: cluster is down, exiting
> Feb 13 10:35:25 vmhost-01 dlm_controld[3221]: cluster is down, exiting
> Feb 13 10:35:25 vmhost-01 fenced[3215]: cluster is down, exiting
> Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 4
> Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 3
> Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 2
> Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 1
> Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> cman_dispatch: Host is down
> Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> Halting qdisk operations
> Feb 13 10:35:51 vmhost-01 ccsd[3165]: Unable to connect to cluster
> infrastructure after 30 seconds.
> Feb 13 10:36:13 vmhost-01 mountd[3927]: Caught signal 15, un-registering
> and exiting.
> Feb 13 10:36:13 vmhost-01 kernel: nfsd: last server has exited
> Feb 13 10:36:13 vmhost-01 kernel: nfsd: unexporting all filesystems
> Feb 13 10:36:21 vmhost-01 ccsd[3165]: Unable to connect to cluster
> infrastructure after 60 seconds.
>
> ccsd continues to repeat the last message with increasing times: 60s, 90s,
> 120s, 180s, 210s, etc.
>
> dmesg shows:
> dlm: closing connection to node 4
> dlm: closing connection to node 3
> dlm: closing connection to node 2
> dlm: closing connection to node 1
>
> There are no open files on GFS (according to lsof).
>
> I am using GFS (version 1).
>
> The only workaround I have right now is to reset the nodes via iLO once the
> shutdown process starts (and hangs on either the gfs or cman service stop).

