[Linux-cluster] umount failed - device is busy
Herta Van den Eynde
herta.vandeneynde at cc.kuleuven.be
Thu May 4 23:25:59 UTC 2006
Herta Van den Eynde wrote:
> Herta Van den Eynde wrote:
>> Lon Hohberger wrote:
>>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
>>>> Bit of extra information: the system that was running the services
>>>> got STONITHed by the other cluster member shortly before midnight.
>>>> The services all failed over nicely, but the situation remains: if
>>>> I try to stop or relocate a service, I get a "device is busy".
>>>> I suppose that rules out an intermittent issue.
>>>> There are no mounts nested below other mounts.
>>> Nfsd is the most likely candidate for holding the reference.
>>> Unfortunately, this is not something I can track down; you will have to
>>> file a support request and/or a Bugzilla. When you get a chance,
>>> you should definitely try stopping nfsd and seeing if that clears the
>>> mystery references (allowing you to unmount). If the problem comes from
>>> nfsd, it should not be terribly difficult to track down.
>>> Also, you should not need to recompile your kernel to probe all the LUNs
>>> per device; just edit /etc/modules.conf:
>>> options scsi_mod max_scsi_luns=128
>>> ... then run mkinitrd to rebuild the initrd image.
>>> -- Lon
>> Next maintenance window is 4 weeks away, so I won't be able to test
>> the nfsd hypothesis anytime soon. In the meantime, I'll file a
>> support request. I'll keep you posted.
>> At least the unexpected STONITH confirms that the failover still works.
>> The /etc/modules.conf tip is a big time saver. Rebuilding the modules
>> takes forever.
>> Thanks, Lon.
> Apologies for not updating this sooner. (Thanks for reminding me, Owen.)
> During a later maintenance window, I shut down the cluster services, but
> it wasn't until I stopped nfsd that the filesystems could actually
> be unmounted, which seems to confirm Lon's theory that nfsd was the
> likely candidate for holding the reference.
> I found a note elsewhere on the web where someone worked around the
> problem by stopping nfsd, stopping the service, restarting nfsd, and
> relocating the service. The disadvantage is that all NFS services
> experience a brief interruption at that time.
> Anyway, my problem disappeared during the latest maintenance window.
> Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL ->
> nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so
> I'm not 100% sure which of the two fixed it, and curious though I am, I
> simply don't have the time to start reading the code. If anyone has
> further insights, I'd love to read about it, though.
> Kind regards,
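For anyone who wants to script the workaround quoted above (stop nfsd, stop the service, restart nfsd, relocate the service), here is a minimal sketch. The command lines are assumptions, not something taken from the note: I am guessing at an init script called `nfs` and at `clusvcadm` with `-s`, `-r` and `-m`, so check them against the man pages on your own system. The service name and member name are hypothetical. It only prints the commands unless you pass dry_run=False.

```python
import shlex
import subprocess


def relocation_plan(service, target_member):
    """Return the workaround commands in the order described in the thread.

    ASSUMPTION: the init script name ("nfs") and the clusvcadm flags are
    guesses for a clumanager-era system; adjust them to your installation.
    """
    return [
        ["service", "nfs", "stop"],                         # 1. stop nfsd
        ["clusvcadm", "-s", service],                       # 2. stop the cluster service
        ["service", "nfs", "start"],                        # 3. restart nfsd
        ["clusvcadm", "-r", service, "-m", target_member],  # 4. relocate the service
    ]


def run_plan(plan, dry_run=True):
    """Print each command (dry run) or execute it, stopping at the first failure."""
    lines = []
    for cmd in plan:
        line = " ".join(shlex.quote(part) for part in cmd)
        lines.append(line)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    # "nfs_home" and "node2" are hypothetical names used for illustration.
    run_plan(relocation_plan("nfs_home", "node2"))
```

Keeping the plan separate from the execution makes it easy to review the exact commands before a maintenance window, since every NFS client sees the interruption between steps 1 and 3.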
Someone reported off-list that they are experiencing the same problem
while running the same versions we are currently running.
So just for completeness' sake: expecting problems, I also upped the
clumanager log levels during the last maintenance window. They are now at:
Come to think of it, I probably loosened the log levels during the
maintenance window when our problems began (I wanted to reduce the size
of the logs). Not sure how - or even if - this might affect things, though.
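For others who run into "device is busy": the check mentioned earlier in the thread, that no mounts sit below other mounts, can be scripted roughly as follows. This is only a sketch. The helper names are mine, it assumes the usual /proc/mounts layout with the mount point in the second field, and it does not decode escaped characters such as \040 in paths. It also says nothing about references held by nfsd, which is what turned out to matter in our case.

```python
def nested_mounts(mount_points):
    """Map each mount point to the mount points found beneath it."""
    nested = {}
    for parent in mount_points:
        # Compare on a trailing slash so /export/home does not match /export/homework.
        prefix = parent.rstrip("/") + "/"
        children = {m for m in mount_points if m != parent and m.startswith(prefix)}
        if children:
            nested[parent] = sorted(children)
    return nested


def read_mount_points(path="/proc/mounts"):
    """Return the mount point (second field) of every line in the mounts table."""
    points = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                points.append(fields[1])
    return points


if __name__ == "__main__":
    for parent, children in nested_mounts(read_mount_points()).items():
        print(parent, "has mounts below it:", ", ".join(children))
```

Note that "/" will always be reported as having mounts below it; the entries of interest are the filesystems owned by the cluster service.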