[Linux-cluster] umount failed - device is busy
Herta Van den Eynde
herta.vandeneynde at cc.kuleuven.be
Thu May 4 07:25:57 UTC 2006
Herta Van den Eynde wrote:
> Lon Hohberger wrote:
>
>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
>>
>>
>>> Bit of extra information: the system that was running the services
>>> got STONITHed by the other cluster member shortly before midnight.
>>> The services all failed over nicely, but the situation remains: if I
>>> try to stop or relocate a service, I get a "device is busy".
>>> I suppose that rules out an intermittent issue.
>>>
>>> There's no mounts below mounts.
>>
>>
>>
>> Drat.
>>
>> Nfsd is the most likely candidate for holding the reference.
>> Unfortunately, this is not something I can track down; you will have to
>> either file a support request and/or a Bugzilla. When you get a chance,
>> you should definitely try stopping nfsd and seeing if that clears the
>> mystery references (allowing you to unmount). If the problem comes from
>> nfsd, it should not be terribly difficult to track down.
>>
>> Also, you should not need to recompile your kernel to probe all the LUNs
>> per device; just edit /etc/modules.conf:
>>
>> options scsi_mod max_scsi_luns=128
>>
>> ... then run mkinitrd to rebuild the initrd image.
>>
>> -- Lon
>
> Next maintenance window is 4 weeks away, so I won't be able to test the
> nfsd hypothesis anytime soon. In the meantime, I'll file a support
> request. I'll keep you posted.
>
> At least the unexpected STONITH confirms that the failover still works.
>
> The /etc/modules.conf tip is a big time saver. Rebuilding the modules
> takes forever.
>
> Thanks, Lon.
>
> Herta
Apologies for not updating this sooner. (Thanks for reminding me, Owen.)
During a later maintenance window, I shut down the cluster services, but
it wasn't until I stopped nfsd that the filesystems could actually be
unmounted. That seems to confirm Lon's theory that nfsd was the likely
candidate for holding the reference.
I found a note elsewhere on the web where someone worked around the
problem by stopping nfsd, stopping the service, restarting nfsd, and
relocating the service. The disadvantage is that all NFS services
experience a brief interruption while nfsd is down.
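
For anyone who wants to try that workaround, here is a rough sketch of the
four steps as a shell function. It is not what I ran, only an illustration.
It assumes that nfsd is controlled through "service nfs" and that your
clumanager version provides clusvcadm with -s (stop), -r (relocate) and
-m (target member); please check the man page first. The function name, the
run helper and the DRY_RUN variable are made up for this example. By default
it only prints the commands.

```shell
#!/bin/sh
# Sketch of the workaround: stop nfsd, stop the cluster service,
# restart nfsd, relocate the service.
# ASSUMPTIONS (verify on your system): "service nfs" manages nfsd, and
# clusvcadm accepts -s <service>, -r <service> and -m <member>.

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical function: $1 = service name, $2 = member to relocate to.
relocate_with_nfsd_bounce() {
    svc=$1
    target=$2

    # Stop nfsd so it drops its reference on the exported filesystems.
    run service nfs stop || return 1

    # Stop the cluster service; remember the result but carry on,
    # so nfsd is restarted even if the stop fails.
    run clusvcadm -s "$svc"
    stop_rc=$?

    # Bring nfsd back to keep the NFS interruption short.
    run service nfs start || return 1

    # Only relocate if the service stopped cleanly.
    [ "$stop_rc" -eq 0 ] || return "$stop_rc"
    run clusvcadm -r "$svc" -m "$target"
}
```

Calling "relocate_with_nfsd_bounce nfs_svc node2" prints the four commands
in order; setting DRY_RUN=0 would execute them. The names nfs_svc and node2
are placeholders. nfsd is restarted even when the service fails to stop, so
the other exports are not left down.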
Anyway, my problem disappeared during the latest maintenance window.
Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL ->
nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so
I'm not 100% sure which of the two fixed it, and curious though I am, I
simply don't have the time to start reading the code. If anyone has
further insights, I'd love to read about it, though.
Kind regards,
Herta
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm