[Linux-cluster] GFS volume already mounted or /mnt busy?

Robert Peterson rpeterso at redhat.com
Fri Dec 22 20:21:55 UTC 2006


bigendian+gfs at gmail.com wrote:
> Hello Robert,
>
> The other node was previously rebuilt for another temporary purpose 
> and isn't attached to the SAN.  The only thing I can think of that 
> might have been out of the ordinary is that I may have pulled the 
> power on the machine while it was shutting down during some file 
> system operation.  The disk array itself never lost power.
>
> I do have another two machines configured in a different cluster 
> attached to the SAN.  CLVM on machines in the other cluster does show 
> the volume that I am having trouble with though those machines do not 
> mount the device.  Could this have caused the trouble? 
>
> More importantly, is there a way to repair the volume?  I can see the 
> device with fdisk -l and gfs_fsck completes with errors, but mount 
> attempts always fail with the "mount: /dev/etherd/e1.1 already mounted 
> or /gfs busy" error.  I don't know how to debug this at a lower level 
> to understand why this error is happening.  Any pointers?
Hi Tom,

Well, if gfs_fsck aborted prematurely, it may have left your lock
protocol in an unusable state.  Ordinarily, gfs_fsck temporarily changes
the locking protocol to "fsck_xxxx" to prevent someone from mounting the
file system while it's busy doing the file system check.  When it's done,
it sets it back to "lock_xxxx".  However, older versions of gfs_fsck
weren't setting it back to "lock_xxxx" when they bailed out due to
errors.  That's since been corrected in the latest version of gfs_fsck,
which I think is in U4.  Try this:

gfs_tool sb /dev/etherd/e1.1 proto

If it says "fsck_dlm" or something with fsck, then it's wrong.  To fix 
it, do:

gfs_tool sb /dev/etherd/e1.1 proto lock_dlm

(That's if you're using DLM locking; use lock_gulm instead if you're
using Gulm locking.)
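
Once the protocol is back to lock_dlm, mounting should be the normal
command again.  A sketch, assuming the mount point is /gfs as in the
error message you quoted:

mount -t gfs /dev/etherd/e1.1 /gfs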

If it still doesn't let you mount, look in dmesg for error messages
explaining why the mount fails.  If the logical volume really is in use,
you can rule out the other systems by doing "vgchange -an etherd" on the
other machines, then try again.
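
In other words, something like this (assuming the volume group really is
named "etherd"; substitute whatever vgdisplay reports on those machines):

On each of the other machines:

vgchange -an etherd

Then, back on the node you're trying to mount from:

dmesg | tail
mount -t gfs /dev/etherd/e1.1 /gfs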

I'm still confused, though: are you running the latest gfs_fsck, and was
it able to repair the damaged RGs or not?  Did it error out, or did it go
through all of passes 1 through 5?
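
If you haven't already, it might help to capture a full verbose run so we
can see where it stops, along these lines (-v for verbose, -y to answer
yes to all repairs, assuming your gfs_fsck has those options):

gfs_fsck -v -y /dev/etherd/e1.1 2>&1 | tee gfs_fsck.out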

Regards,

Bob Peterson
Red Hat Cluster Suite



