[Linux-cluster] GFS 2 node hang in rm test

Patrick Caulfield pcaulfie at redhat.com
Tue Dec 7 09:38:10 UTC 2004


On Mon, Dec 06, 2004 at 04:13:50PM -0800, Daniel McNeil wrote:
> On Mon, 2004-12-06 at 11:45, Ken Preslan wrote:
> > On Fri, Dec 03, 2004 at 03:08:00PM -0800, Daniel McNeil wrote:
> 
> 
> Looking at the stack trace above and disassembling dlm.ko,
> it looks like dlm_lock+0x319 is the call to dlm_lock_stage1().
> Looking at dlm_lock_stage1(), it looks like it is sleeping on
> 	 down_write(&rsb->res_lock)
> 
> So now I have to find who is holding the res_lock.

That's consistent with the hang you reported before - in fact it's almost
certainly the same thing. My guess is that there is a deadlock on res_lock
somewhere, in which case I suspect it's going to be easier to find by
reading code than by running tests. res_lock should never be held for any
extended period of time, but in your last set of tracebacks there was nothing
obviously holding it - so I suspect something is sleeping while holding it.


Patrick



