[Linux-cluster] umount failed - device is busy
Herta Van den Eynde
herta.vandeneynde at cc.kuleuven.be
Thu May 4 23:25:59 UTC 2006
Herta Van den Eynde wrote:
> Herta Van den Eynde wrote:
>
>> Lon Hohberger wrote:
>>
>>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
>>>
>>>
>>>> Bit of extra information: the system that was running the services
>>>> got STONITHed by the other cluster member shortly before midnight.
>>>> The services all failed over nicely, but the situation remains: if
>>>> I try to stop or relocate a service, I get a "device is busy".
>>>> I suppose that rules out an intermittent issue.
>>>>
>>>> There's no mounts below mounts.
>>>
>>> Drat.
>>>
>>> Nfsd is the most likely candidate for holding the reference.
>>> Unfortunately, this is not something I can track down; you will have to
>>> file a support request and/or a Bugzilla. When you get a chance,
>>> you should definitely try stopping nfsd and seeing if that clears the
>>> mystery references (allowing you to unmount). If the problem comes from
>>> nfsd, it should not be terribly difficult to track down.
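A quick way to test Lon's suspicion is to look for user-space holders first. The sketch below is only illustrative (`/tmp` stands in for the busy mount point): `fuser` can only see user-space processes, so a mount that stays "busy" with nothing listed points at a kernel-space reference such as nfsd's.

```shell
# Sketch: list user-space holders of a busy mount point. fuser sees only
# user-space processes; kernel-held references (e.g. nfsd's) stay
# invisible, consistent with a "device is busy" that names no process.
check_holders() {
    mnt=$1
    if command -v fuser >/dev/null 2>&1; then
        fuser -vm "$mnt" 2>&1 || true   # PIDs with files open under $mnt
    fi
    echo "checked $mnt"
}
check_holders /tmp    # /tmp is an example; use the mount that won't umount
```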
>>>
>>> Also, you should not need to recompile your kernel to probe all the LUNs
>>> per device; just edit /etc/modules.conf:
>>>
>>> options scsi_mod max_scsi_luns=128
>>>
>>> ... then run mkinitrd to rebuild the initrd image.
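The edit Lon describes might be staged like this. A sketch only: it writes to a scratch copy so the change can be reviewed first (swap in `/etc/modules.conf` to apply it for real), and the `mkinitrd` invocation assumes a RHEL 3-era setup where the initrd is named after the running kernel.

```shell
# Sketch of the modules.conf change, staged on a scratch copy so it can
# be reviewed before touching the real file.
CONF=/tmp/modules.conf.staged   # replace with /etc/modules.conf to apply
echo 'options scsi_mod max_scsi_luns=128' >> "$CONF"
grep 'max_scsi_luns' "$CONF"    # confirm the option line is in place
# After applying for real, rebuild the initrd so the option takes effect
# at boot (assumed RHEL 3-style initrd naming):
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```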
>>>
>>> -- Lon
>>
>>
>> Next maintenance window is 4 weeks away, so I won't be able to test
>> the nfsd hypothesis anytime soon. In the meantime, I'll file a
>> support request. I'll keep you posted.
>>
>> At least the unexpected STONITH confirms that the failover still works.
>>
>> The /etc/modules.conf tip is a big time saver. Rebuilding the modules
>> takes forever.
>>
>> Thanks, Lon.
>>
>> Herta
>
>
> Apologies for not updating this sooner. (Thanks for reminding me, Owen.)
>
> During a later maintenance window, I shut down the cluster services, but
> it wasn't until I stopped nfsd that the filesystems could actually
> be unmounted, which seems to confirm Lon's theory that nfsd was the
> likely candidate for holding the reference.
>
> I found a note elsewhere on the web where someone worked around the
> problem by stopping nfsd, stopping the service, restarting nfsd, and
> then relocating the service. The disadvantage is that all NFS services
> on the node experience a brief interruption at the time.
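That workaround sequence might look like the sketch below. Everything here is an assumption: `nfssvc1` is a made-up service name, and the `service`/`clusvcadm` invocations are guesses for a RHEL 3 / clumanager 1.2 setup. `DRY_RUN=1` (the default) only prints the plan rather than executing it.

```shell
# Sketch of the stop-nfsd workaround; DRY_RUN=1 (default) prints each
# step instead of running it. Service name and flags are assumptions.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}
run service nfs stop        # release nfsd's references on the mounts
run clusvcadm -d nfssvc1    # now the service (and its umounts) can stop
run service nfs start       # restore NFS for the unaffected services
run clusvcadm -e nfssvc1    # re-enable the service on the cluster
```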
>
> Anyway, my problem disappeared during the latest maintenance window.
> Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL ->
> nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so
> I'm not 100% sure which of the two fixed it, and curious though I am, I
> simply don't have the time to start reading the code. If anyone has
> further insights, I'd love to read about it, though.
>
> Kind regards,
>
> Herta
Someone reported off-list that they are experiencing the same problem
while running the same versions we currently are.
So, just for completeness' sake: expecting problems, I also raised the
clumanager log levels during the last maintenance window. They are now at:
  clumembd    loglevel="6"
  cluquorumd  loglevel="6"
  clurmtabd   loglevel="7"
  clusvcmgrd  loglevel="6"
  clulockd    loglevel="6"
Come to think of it, I probably lowered the log levels during the
maintenance window in which our problems began (I wanted to reduce the
size of the logs). Not sure how - or even if - that might affect things,
though.
Kind regards,
Herta
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm