[Linux-cluster] gfs2 mount hangs

Tue Aug 18 02:32:54 UTC 2009

Hi Dave,

> 
>> node desk:
>> Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
>> Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
>> Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
> 
> There's a problem here, but it's not clear what has gone wrong.  Could you try
> this again and after these messages appear send the output of
> "gfs_control dump" from both nodes?
> 
Here are the outputs:
1) From node cool (the second mount is on this node):

[root@cool ~]# gfs_control dump
1250560388 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/gfs_controld.log
1250560388 gfs_controld 3.0.0 started
1250560389 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/gfs_controld.log
1250560389 group_mode 3 compat 0
1250560389 setup_cpg 13
1250560389 set_protocol member_count 1 propose daemon 1.1.1 kernel 1.1.1
1250560389 run protocol from nodeid 1
1250560389 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1250561975 client connection 5 fd 15
1250561975 join: /gfs2 gfs2 lock_dlm testgfs2:1 rw /dev/sdb
1250561975 1 join: cluster name matches: testgfs2
1250561975 1 process_dlmcontrol register 0
1250561975 1 add_change cg 1 joined nodeid 1
1250561975 1 add_change cg 1 we joined
1250561975 1 add_change cg 1 counts member 1 joined 1 remove 0 failed 0
1250561975 1 wait_conditions skip for zero started_count
1250561975 1 send_start cg 1 id_count 1 om 0 nm 1 oj 0 nj 0
1250561975 1 receive_start 1:1 len 92
1250561975 1 match_change 1:1 matches cg 1
1250561975 1 wait_messages cg 1 got all 1
1250561975 1 pick_first_recovery_master low 1 old 0
1250561975 1 sync_state all_nodes_new first_recovery_needed master 1
1250561975 1 create_old_nodes all new
1250561975 1 create_new_nodes 1 ro 0 spect 0
1250561975 1 create_failed_journals all new
1250561975 1 create_new_journals 1 gets jid 0
1250561975 1 apply_recovery first start_kernel
1250561975 1 start_kernel cg 1 member_count 1
1250561975 1 set /sys/fs/gfs2/testgfs2:1/lock_module/block to 0
1250561975 1 set open /sys/fs/gfs2/testgfs2:1/lock_module/block error -1 2
1250561975 1 client_reply_join_full ci 5 result 0 hostdata=jid=0:id=2032068419:first=1
1250561975 client_reply_join 1 ci 5 result 0
1250561975 uevent add gfs2 /fs/gfs2/testgfs2:1
1250561975 1 ping_kernel_mount

There is a line ending in "...error -1 2", but it seems that line was 
printed during the second (hanging) mount.
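
To help read that line, here is a minimal sketch that decodes the trailing number. It assumes the two values after "error" are the return value of open() and the errno; that is a guess from the message format and has not been checked against the gfs_controld source. The helper name describe_errno is only for illustration.

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# "error -1 2" from the dump: assumed to be open() result -1 and errno 2
name, message = describe_errno(2)
print(name, "-", message)
```

On Linux, errno 2 is ENOENT. If the assumption holds, the lock_module/block file under /sys/fs/gfs2/testgfs2:1 did not exist when the daemon tried to open it.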

2) From node desk (the first mount is on this node):
[root@desk ~]# gfs_control dump
1250589068 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/gfs_controld.log
1250589068 gfs_controld 3.0.0 started
1250589068 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/gfs_controld.log
1250589068 group_mode 3 compat 0
1250589068 setup_cpg 13
1250589068 run protocol from nodeid 1
1250589068 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1250590628 uevent add gfs2 /fs/gfs2/testgfs2:1
1250590628 uevent change gfs2 /fs/gfs2/testgfs2:1
1250590628 recovery_uevent mg not found 1
1250590628 uevent change gfs2 /fs/gfs2/testgfs2:1
1250590628 recovery_uevent mg not found 1
1250590628 uevent change gfs2 /fs/gfs2/testgfs2:1
1250590628 recovery_uevent mg not found 1
1250590628 uevent online gfs2 /fs/gfs2/testgfs2:1

This log does not seem to contain much information. Does it help?
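
In case it is useful for lining up the two dumps, here is a small sketch that converts the first field of each dump line to a readable time. It assumes that field is seconds since the Unix epoch, which is how it looks but is not stated anywhere in the output. The helper name to_utc is only for illustration.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Convert an epoch-seconds value from the dump to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# First field of the "uevent add" line in each dump above
cool_add = 1250561975
desk_add = 1250590628

print("cool:", to_utc(cool_add).isoformat())
print("desk:", to_utc(desk_add).isoformat())
print("difference in seconds:", desk_add - cool_add)
```

The two values are almost eight hours apart, so the clocks on the two nodes probably do not agree, and the dumps may have to be matched by event rather than by timestamp.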

regards,
wengang.