[Linux-cluster] mount hang in kcl_join_service

Daniel McNeil daniel at osdl.org
Thu Feb 17 19:11:04 UTC 2005


On Wed, 2005-02-16 at 22:26, David Teigland wrote:
> On Wed, Feb 16, 2005 at 03:39:37PM -0800, Daniel McNeil wrote:
> > I have not been able to get my tests to run for more than
> > 1 day for the last several tries.  This time my test hung
> > during mount in kcl_join_service().  My test does mount and umount 
> > several times for each test run.  This time it hung on the
> > 22nd test run.  It looks like it was starting a 3-node test
> > where a gfs file system is mounted on all 3 nodes and then
> > does a umount/mount 1 node at a time.  So this should have
> > done an umount on cl031 and then hung on a mount on cl031
> > with cl030 and cl032 having the gfs file system still mounted.
> 
> > A bunch of info is available here:
> > http://developer.osdl.org/daniel/GFS/test.11feb2005/
> 
> I've looked through it and can't pinpoint the problem.  Next
> time could you also collect /proc/cluster/lock_dlm/debug and
> /proc/cluster/dlm_debug ?
> 
> I've set up a similar but simplified test on both of my test
> clusters (a 2-node and a 7-node).  I can't dedicate these
> machines for a full 1-2 day stretch until the weekend,
> though.  My test is a loop around:
> 
> - on each node sequentially: unmount/mount gfs
> - on each node sequentially: run some load for a couple minutes
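
For reference, one pass of the loop described above could be sketched
roughly as follows.  This is only an illustration, not the actual test
script: the device path, mount point, load command, and the ssh_run
helper are all assumptions, and the node names are just the ones
mentioned earlier in this thread.

```python
import subprocess

# Node names as mentioned in this thread; everything else is hypothetical.
NODES = ["cl030", "cl031", "cl032"]
DEVICE = "/dev/example_vg/gfs_lv"   # assumed device path
MOUNTPOINT = "/mnt/gfs"             # assumed mount point
LOAD_CMD = "true"                   # placeholder for a real load generator


def ssh_run(node, command, timeout=600):
    """Run a command on a node over ssh (assumed transport).

    check=True raises if the remote command fails.  The timeout only
    stops the local ssh process, so a mount that hangs on the remote
    node still has to be investigated there.
    """
    subprocess.run(["ssh", node, command], check=True, timeout=timeout)


def one_pass(nodes, run):
    """One iteration of the loop: remount everywhere, then load everywhere."""
    # Step 1: on each node sequentially, unmount and mount gfs.
    for node in nodes:
        run(node, f"umount {MOUNTPOINT}")
        run(node, f"mount -t gfs {DEVICE} {MOUNTPOINT}")
    # Step 2: on each node sequentially, run some load.
    for node in nodes:
        run(node, LOAD_CMD)
```

Calling one_pass(NODES, ssh_run) repeatedly would approximate the
long-running test.  Passing the runner in as an argument keeps the
ordering logic separate from ssh, so it can be checked without a
cluster.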

I started it running again yesterday afternoon.  I'll collect
all the info when I hit a problem.  I still have not
made it past 52 hours running this test.

Thanks for taking a look,

Daniel



