[Linux-cluster] gfs2 mount hangs

Fri Aug 14 14:57:38 UTC 2009

On Fri, Aug 14, 2009 at 10:30:17AM +0800, Wengang Wang wrote:
> anything else needed?

The versions are good.

> #don't know where 239.192.110.55 comes from. does it matter?

That's fine; the multicast address is generated by cman.
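
As a rough illustration of where such an address can come from: the sketch
below assumes cman's default scheme of 239.192.x.y, where x and y are the
high and low bytes of the 16-bit cluster ID. The function name
cluster_id_to_mcast is made up for this example, and 28215 is simply the ID
that maps back to the address above under that assumption, not a value taken
from this cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster tools): map a 16-bit
# cluster ID to a 239.192.x.y multicast address, assuming cman's
# default scheme.
cluster_id_to_mcast() {
    id="$1"
    high=$(( (id >> 8) & 255 ))   # upper byte of the cluster ID
    low=$(( id & 255 ))           # lower byte of the cluster ID
    echo "239.192.$high.$low"
}

# 28215 = 110 * 256 + 55
cluster_id_to_mcast 28215   # prints 239.192.110.55
```

If that assumption holds, the cluster ID reported by "cman_tool status"
should map to the same address on both nodes.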

> [root@desk ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M     88   2009-08-14 17:44:00  cool
>    2   M     76   2009-08-14 17:43:52  desk
> 
> [root@cool ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M     84   2009-08-14 09:46:06  cool
>    2   M     88   2009-08-14 09:46:06  desk
> 

> [root@desk ~]# group_tool
> groupd not running
> fence domain
> member count  2
> victim count  0
> victim now    0
> master nodeid 2
> wait state    none
> members       1 2
> 
> [root@cool ~]# group_tool
> groupd not running
> fence domain
> member count  2
> victim count  0
> victim now    0
> master nodeid 2
> wait state    none
> members       1 2

The cluster is fine.

> #checking for difference, seems only the group_tool has different 
> output. problem is in groupd? it starts automatically? I didn't start it 
> by hand. what I do is "service cman start" on both nodes and then "mount 
> ...." on both nodes.

groupd is not needed, so it is fine that it isn't running.

> node desk:
> Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
> Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
> Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1

There's a problem here, but it's not clear what has gone wrong.  Could you try
this again and, after these messages appear, send the output of
"gfs_control dump" from both nodes?

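One way to capture that output is sketched below. It only wraps the
"gfs_control dump" command named above; the function name collect_dump, the
DUMP_CMD and OUT_DIR variables, and the output file name are all made up for
this example.

```shell
#!/bin/sh
# Hypothetical helper: save the output of "gfs_control dump" to a
# per-node file. DUMP_CMD and OUT_DIR are illustrative names only.
DUMP_CMD="${DUMP_CMD:-gfs_control dump}"
OUT_DIR="${OUT_DIR:-/tmp}"

collect_dump() {
    # uname -n gives the node name, so files from the two nodes differ.
    out="$OUT_DIR/gfs_control_dump.$(uname -n).txt"
    # Capture stdout and stderr so any error text also lands in the file.
    $DUMP_CMD > "$out" 2>&1
    # Print the path of the file that was written.
    echo "$out"
}
```

Calling collect_dump as root on each node, after the recovery_uevent messages
show up, would leave one file per node to attach to a reply.
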
> Aug 14 10:14:00 cool kernel: INFO: task mount.gfs2:2458 blocked for more 
> than 120 seconds.

The second mount is probably stuck because the first one went bad.

Dave