[Linux-cluster] umount failed - device is busy

Thu May 4 07:25:57 UTC 2006

Herta Van den Eynde wrote:
> Lon Hohberger wrote:
> 
>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
>>
>>
>>> Bit of extra information:  the system that was running the services 
>>> got STONITHed by the other cluster member shortly before midnight.
>>> The services all failed over nicely, but the situation remains:  if I 
>>> try to stop or relocate a service, I get a "device is busy".
>>> I suppose that rules out an intermittent issue.
>>>
>>> There's no mounts below mounts.
>>
>>
>>
>> Drat.
>>
>> Nfsd is the most likely candidate for holding the reference.
>> Unfortunately, this is not something I can track down; you will have to
>> either file a support request and/or a Bugzilla.  When you get a chance,
>> you should definitely try stopping nfsd and seeing if that clears the
>> mystery references (allowing you to unmount).  If the problem comes from
>> nfsd, it should not be terribly difficult to track down.
>>
>> Also, you should not need to recompile your kernel to probe all the LUNs
>> per device; just edit /etc/modules.conf:
>>
>> options scsi_mod max_scsi_luns=128
>>
>> ... then run mkinitrd to rebuild the initrd image.
>>
>> -- Lon
> 
> Next maintenance window is 4 weeks away, so I won't be able to test the 
> nfsd hypothesis anytime soon.  In the meantime, I'll file a support 
> request.  I'll keep you posted.
> 
> At least the unexpected STONITH confirms that the failover still works.
> 
> The /etc/modules.conf tip is a big time saver.  Rebuilding the modules 
> takes forever.
> 
> Thanks, Lon.
> 
> Herta

Apologies for not updating this sooner.  (Thanks for reminding me, Owen.)

During a later maintenance window, I shut down the cluster services, but 
the filesystems could not be unmounted until I had also stopped nfsd. 
That seems to confirm Lon's theory that nfsd was the most likely 
candidate for holding the reference.

I found a note elsewhere on the web where someone worked around the 
problem by stopping nfsd, stopping the service, restarting nfsd, and 
relocating the service.  The disadvantage is that all NFS services 
experience a brief interruption while nfsd is down.
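
For the archives, that workaround could be sketched roughly as below. 
This is untested on a real cluster and rests on several assumptions: 
that nfsd is controlled by the "nfs" init script, that clusvcadm accepts 
-s (stop), -r (relocate) and -m (target member) on your clumanager 
release, and the helper names (run, relocate_with_nfsd_bounce) as well 
as the service and member names in the usage line are made up for 
illustration.  It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the "bounce nfsd around the service stop" workaround.
# ASSUMPTIONS: the "nfs" init script name and the clusvcadm flags
# (-s, -r, -m) must be verified against your own release.

# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: relocate_with_nfsd_bounce SERVICE TARGET_MEMBER
relocate_with_nfsd_bounce() {
    svc="$1"
    member="$2"

    # 1. Stop nfsd so it lets go of the exported filesystems.
    run service nfs stop || return 1

    # 2. Stop the cluster service, which unmounts its filesystems.
    run clusvcadm -s "$svc"
    rc=$?

    # 3. Restart nfsd whether or not step 2 succeeded, to keep the
    #    interruption for the other NFS services short.
    run service nfs start

    [ "$rc" -eq 0 ] || return "$rc"

    # 4. Relocate the service to the other member.
    run clusvcadm -r "$svc" -m "$member"
}
```

Sourcing the file and calling "relocate_with_nfsd_bounce nfs_home node2" 
prints the four commands in order; setting DRY_RUN=0 would execute them, 
which is best tried in a maintenance window first.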

Anyway, my problem disappeared during the latest maintenance window. 
Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL -> 
nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so 
I'm not 100% sure which of the two fixed it, and curious though I am, I 
simply don't have the time to start reading the code.  If anyone has 
further insights, I'd love to read about them, though.

Kind regards,

Herta
