[Linux-cluster] SOS: A node does not properly invoke "lock_gulmd: shutdown" during reboot

BJ l_x2828 at yahoo.com
Sun Feb 24 20:29:29 UTC 2008


Hi,

I would appreciate it if someone could answer this
question.

We have three nodes in a GFS cluster:

NODE1: GFS-6.0.2-25
NODE2: GFS-6.0.2.36-1
NODE3: GFS-6.0.2.36-1

NODE2 does not invoke "lock_gulmd: shutdown" on every
reboot.  When it does not, NODE1 (the master) sees
NODE2 as missing heartbeats and fences it, because
NODE1 has no way of knowing that NODE2 is rebooting.
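
For context, the GULM master marks a node expired once it misses
the allowed number of heartbeats, and then fences it.  If I am
reading the GFS 6.0 docs right, those thresholds are set by
heartbeat_rate and allowed_misses in the lock_gulm stanza of
cluster.ccs; the stanza below is only an illustration with
default-like values, not our actual file:

    cluster {
        name = "our_cluster"       # illustrative name
        lock_gulm {
            servers = ["NODE1", "NODE2", "NODE3"]
            heartbeat_rate = 15.0  # seconds between heartbeat checks (assumed default)
            allowed_misses = 2     # missed beats before the master expires and fences a node (assumed default)
        }
    }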

Please see the kernel log from NODE2 below.  The first
part shows NODE2 failing to invoke "lock_gulmd:
shutdown" during the first reboot (it was then fenced
by NODE1).  The second part shows NODE2 invoking
"lock_gulmd: shutdown" properly during the second
reboot (after which NODE2 rejoined the cluster
automatically on bootup).

First part:

Feb 23 16:59:12 NODE2 crond: crond shutdown succeeded
Feb 23 16:59:12 NODE2 lock_gulmd_core[6633]: "GFS Kernel Interface" is logged out. fd:9
Feb 23 16:59:12 NODE2 gfs: Unmounting GFS filesystems:  succeeded
Feb 23 16:59:14 NODE2 ccsd[6614]: Stopping ccsd, SIGTERM received.
Feb 23 16:59:14 NODE2 ccsd: Stopping ccsd:
Feb 23 16:59:16 NODE2 ccsd: shutdown succeeded
Feb 23 16:59:16 NODE2 rc: Stopping ccsd:  succeeded
Feb 23 16:59:16 NODE2 dd: 1+0 records in
Feb 23 16:59:16 NODE2 dd: 1+0 records out
Feb 23 16:59:16 NODE2 random: Saving random seed:  succeeded
Feb 23 16:59:16 NODE2 mdmonitor: mdadm shutdown succeeded
Feb 23 16:59:16 NODE2 pool: Stopping pool pool_ccs:
Feb 23 16:59:16 NODE2 pool: shutdown succeeded
Feb 23 16:59:16 NODE2 pool: shutdown succeeded
Feb 23 16:59:16 NODE2 rc: Stopping pool:  succeeded

Second part:

Feb 23 17:07:40 NODE2 crond: crond shutdown succeeded
Feb 23 17:07:41 NODE2 lock_gulmd_core[5189]: "GFS Kernel Interface" is logged out. fd:9
Feb 23 17:07:41 NODE2 gfs: Unmounting GFS filesystems:  succeeded
Feb 23 17:07:43 NODE2 lock_gulmd: Checking for Gulm Services...
Feb 23 17:07:43 NODE2 lock_gulmd: Stopping lock_gulmd:
Feb 23 17:07:44 NODE2 lock_gulmd: shutdown succeeded
Feb 23 17:07:44 NODE2 rc: Stopping lock_gulmd:  succeeded
Feb 23 17:07:44 NODE2 ccsd[5149]: Stopping ccsd, SIGTERM received.
Feb 23 17:07:44 NODE2 ccsd: Stopping ccsd:
Feb 23 17:07:46 NODE2 ccsd: shutdown succeeded
Feb 23 17:07:46 NODE2 rc: Stopping ccsd:  succeeded
Feb 23 17:07:46 NODE2 dd: 1+0 records in
Feb 23 17:07:46 NODE2 dd: 1+0 records out
Feb 23 17:07:46 NODE2 random: Saving random seed:  succeeded
Feb 23 17:07:46 NODE2 mdmonitor: mdadm shutdown succeeded
Feb 23 17:07:46 NODE2 pool: Stopping pool pool_ccs:
Feb 23 17:07:46 NODE2 pool: shutdown succeeded
Feb 23 17:07:46 NODE2 pool: shutdown succeeded
Feb 23 17:07:46 NODE2 rc: Stopping pool:  succeeded

My question: what could cause NODE2 to skip
"lock_gulmd: shutdown" during a reboot?
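
In case it helps narrow this down: on a SysV init layout, a
service is only stopped during reboot if its kill link exists in
the reboot runlevel and runs before the network goes down, and
(on Red Hat systems) only if its subsys lock file is present.
One thing I plan to check on NODE2 is exactly that (a rough
sketch; paths assume a RHEL-style /etc/rc.d layout):

    # Is lock_gulmd registered to be stopped in runlevels 0 and 6?
    chkconfig --list lock_gulmd

    # Does the kill link exist in the reboot runlevel, and what is its K order?
    ls -l /etc/rc.d/rc6.d/ | grep -i gulm

    # For comparison, where does the network shutdown link sit?
    ls -l /etc/rc.d/rc6.d/ | grep -i network

    # RHEL's rc only runs a K script if the subsys lock file exists
    ls -l /var/lock/subsys/ | grep -i gulm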

Best regards,
BJ