[Linux-cluster] mount is hanging
micah nerren
mnerren at paracel.com
Tue Oct 5 18:11:46 UTC 2004
Hiya,
On Fri, 2004-10-01 at 08:24, Adam Manthei wrote:
> On Thu, Sep 30, 2004 at 04:01:44PM -0700, micah nerren wrote:
> > Hi,
> >
> > I have a SAN with 4 file systems on it, each GFS. These are mounted
> > across various servers running GFS, 3 of which are lock_gulm servers.
> > This is on RHEL WS 3 with GFS-6.0.0-7.1 on x86_64.
>
> How many nodes?
In total, four servers mount the four file systems; three of them are lock_gulm
servers.
> > One of the file systems simply will not mount now. The other 3 mount and
> > unmount fine. They are all part of the same cca. I have my master lock
> > server running in heavy debug mode but none of the output from
> > lock_gulmd tells me anything about this one bad pool. How can I figure
> > out what is going on, any good debug or troubleshooting steps I should
> > do? I think if I just reboot everything it will settle down, but we
> > can't do that just yet, as the master lock server happens to be on a
> > production box right now.
>
> 1) Are you certain that you have uniquely named all four filesystems? You can
> use gfs_tool to verify that there are no duplicate names.
Yes, there are no duplicate names. They all have unique names.
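For reference, one way to spot-check that is to dump the lock table name from each filesystem's superblock and compare them. This is only a sketch: the `/dev/pool/...` paths are placeholders, and I'm assuming the `gfs_tool sb <device> table` query available in GFS 6.0.

```shell
# Print the lock table name stored in each GFS superblock.
# The /dev/pool/* paths below are placeholders -- substitute your own pools.
for dev in /dev/pool/fs1 /dev/pool/fs2 /dev/pool/fs3 /dev/pool/fs4; do
    echo -n "$dev: "
    gfs_tool sb "$dev" table   # each "table" value should be unique
done
```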
> 2) Is there an expired node that is not fenced holding a lock on that
> filesystem? gulm_tool will help there.
No expired nodes. gulm_tool reports that everything is fine, which is
consistent with all of the nodes being able to mount the other 3 file systems.
I have tried manually fencing and unfencing two of the systems, to no avail.
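For completeness, this is roughly what I ran to check for expired nodes. A sketch only: `lockmaster` is a placeholder for the master lock server's hostname, and I'm assuming the standard gulm_tool subcommands from GFS 6.0.

```shell
# List every node the master lock server knows about, with its state
# (Logged in, Expired, ...). "lockmaster" is a placeholder hostname.
gulm_tool nodelist lockmaster

# Statistics from the lock table service can also be worth a look:
gulm_tool getstats lockmaster:lt000
```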
> 3) Did you ever have all 4 filesystems mounted at the same time on the same
> node? i.e. did it "all of a sudden" stop working or was it always
> failing?
Yes, it's been running fine for several weeks. It "suddenly" freaked out.
It is possible the customer did something I am unaware of, but I don't
know what they could have done to cause this.
> > Also, is there a way to migrate a master lock server to a slave lock
> > server? In other words, can I force the master to become a slave and a
> > slave to become the new master?
>
> Restarting lock_gulmd on the master will cause one of the slaves to pick up
> as master and the master to come back up as a slave. Note that this only
> works when you have a dedicated gulm server. If you have an embedded master
> server (a gulm server also mounting GFS) bad things will happen when the
> server restarts.
Ugh, that's exactly what I need to avoid. I do not have dedicated gulm
servers; the master is on a machine that is also mounting the file
systems and is in heavy production use.
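For the record, on a *dedicated* gulm server the failover Adam describes would amount to something like the following. A sketch, assuming the RHEL 3 init script and that `lockmaster` is a placeholder hostname; not something I can run here, since my master is embedded.

```shell
# On the current MASTER -- dedicated gulm server ONLY, never on a node that
# is also mounting GFS: restarting lock_gulmd lets a slave take over as
# master, and this node rejoins as a slave.
service lock_gulmd restart

# Afterwards, check which node became master ("lockmaster" is a placeholder):
gulm_tool getstats lockmaster
```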
I am quite certain from past experiences that just rebooting all 4
servers will fix this up, but I can't do that.
What I am going to try right now is blowing away the one pool that is
acting up, rebuilding it, and seeing if that works. Luckily this pool is
non-critical and backed up, so I can just nuke it.
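Assuming the usual GFS 6.0 pool workflow, the rebuild should look roughly like this. A sketch only: the pool name `badpool`, the cluster name `mycluster`, the journal count, and the mount point are all placeholders.

```shell
# Recreate the misbehaving pool and re-make the filesystem.
# Every name and path below is a placeholder for this sketch.
pool_tool -c badpool.cfg           # write the pool label(s) from the config file
pool_assemble -a badpool           # activate the pool device

# Re-make the GFS filesystem (lock_gulm protocol, 4 journals here):
gfs_mkfs -p lock_gulm -t mycluster:badpool -j 4 /dev/pool/badpool

mount -t gfs /dev/pool/badpool /mnt/badpool
```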
Thanks,
Micah