[Linux-cluster] GFS frozen again

Brett Cave brettcave at gmail.com
Mon Aug 18 16:06:02 UTC 2008


2008/8/18 Shawn Hood <shawnlhood at gmail.com>:
> Could you post the errors from syslog/dmesg?

As I was finishing off this email, I noticed this near the end of the
blade2 logs:
Aug 17 19:50:24 blade2 gfs_controld[2839]: retrieve_plocks: ckpt open
error 12 cache1
That happens after blade2 has been fenced, has successfully rejoined
the fence and cman domains, and is trying to mount its GFS
filesystems. The first GFS filesystem it tries to mount causes a
lockup.

:) Well, one node (blade2) lost connectivity to the cluster and was
fenced (fence_ilo). The environment is an HP BladeSystem with x86_64
blades (2 Intel, 1 AMD). Here are the logs from blade3:

Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] The token was lost in
the OPERATIONAL state.
Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] Receive multicast socket
recv buffer size (288000 bytes).
Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Aug 17 19:48:55 blade3 openais[2696]: [TOTEM] entering GATHER state from 2.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] entering GATHER state from 11.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Creating commit token
because I am the rep.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Saving state aru 1fd
high seq received 1fd
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Storing new sequence id
for ring 25c
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] entering COMMIT state.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] entering RECOVERY state.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] position [0] member
192.168.70.103:
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] previous ring seq 600
rep 192.168.70.102
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] aru 1fd high delivered
1fd received flag 1
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] position [1] member
192.168.70.104:
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] previous ring seq 600
rep 192.168.70.102
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] aru 1fd high delivered
1fd received flag 1
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Did not need to
originate any messages in recovery.
Aug 17 19:49:00 blade3 openais[2696]: [TOTEM] Sending initial ORF token
Aug 17 19:49:00 blade3 kernel: dlm: closing connection to node 2
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] New Configuration:
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.103)
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.104)
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] Members Left:
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] Members Joined:
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] New Configuration:
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.103)
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.104)
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] Members Left:
Aug 17 19:49:00 blade3 openais[2696]: [CLM  ] Members Joined:
Aug 17 19:49:00 blade3 openais[2696]: [SYNC ] This node is within the
primary component and will provide service.
Aug 17 19:49:01 blade3 openais[2696]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:49:01 blade3 openais[2696]: [CLM  ] got nodejoin message
192.168.70.103
Aug 17 19:49:01 blade3 openais[2696]: [CLM  ] got nodejoin message
192.168.70.104
Aug 17 19:49:01 blade3 openais[2696]: [CPG  ] got joinlist message from node 4
Aug 17 19:49:01 blade3 openais[2696]: [CPG  ] got joinlist message from node 3
Aug 17 19:49:03 blade3 fenced[2712]: blade2 not a cluster member after
3 sec post_fail_delay
Aug 17 19:49:03 blade3 fenced[2712]: fencing node "blade2"
Aug 17 19:49:16 blade3 fenced[2712]: fence "blade2" success
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: Trying to acquire journal lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: Trying to acquire journal lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: Looking at journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: Looking at journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: Acquiring the transaction lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: Acquiring the transaction lock...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: Replaying journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: Replayed 0 of 1 blocks
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: replays = 0, skips = 0, sames = 1
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: Replaying journal...
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1:
jid=2: Journal replayed in 1s
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:cache1.1: jid=2: Done
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: Replayed 0 of 9 blocks
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: replays = 0, skips = 0, sames = 9
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1:
jid=2: Journal replayed in 1s
Aug 17 19:49:21 blade3 kernel: GFS: fsid=jemdevcluster:storage.1: jid=2: Done
Aug 17 19:49:30 blade3 openais[2696]: [CMAN ] lost contact with quorum device
Aug 17 19:49:30 blade3 openais[2696]: [CMAN ] quorum lost, blocking activity
Aug 17 19:49:30 blade3 kernel: dlm: closing connection to node 0
Aug 17 19:49:30 blade3 qdiskd[2765]: <info> Assuming master role
Aug 17 19:49:33 blade3 qdiskd[2765]: <notice> Writing eviction notice for node 2
Aug 17 19:49:33 blade3 openais[2696]: [CMAN ] quorum regained, resuming activity
Aug 17 19:49:36 blade3 qdiskd[2765]: <notice> Node 2 evicted
Aug 17 19:51:25 blade3 openais[2696]: [TOTEM] entering GATHER state from 11.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] Saving state aru 28 high
seq received 28
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] Storing new sequence id
for ring 260
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] entering COMMIT state.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] entering RECOVERY state.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] position [0] member
192.168.70.102:
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] previous ring seq 604
rep 192.168.70.102
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] aru 9 high delivered 9
received flag 1
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] position [1] member
192.168.70.103:
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] previous ring seq 604
rep 192.168.70.103
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] aru 28 high delivered 28
received flag 1
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] position [2] member
192.168.70.104:
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] previous ring seq 604
rep 192.168.70.103
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] aru 28 high delivered 28
received flag 1
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] Did not need to
originate any messages in recovery.
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] New Configuration:
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.103)
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.104)
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] Members Left:
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] Members Joined:
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] New Configuration:
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.103)
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.104)
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] Members Left:
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] Members Joined:
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:51:26 blade3 openais[2696]: [SYNC ] This node is within the
primary component and will provide service.
Aug 17 19:51:26 blade3 openais[2696]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] got nodejoin message
192.168.70.102
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] got nodejoin message
192.168.70.103
Aug 17 19:51:26 blade3 openais[2696]: [CLM  ] got nodejoin message
192.168.70.104
Aug 17 19:51:26 blade3 openais[2696]: [CPG  ] got joinlist message from node 3
Aug 17 19:51:26 blade3 openais[2696]: [CPG  ] got joinlist message from node 4
Aug 17 19:51:44 blade3 kernel: dlm: connecting to 2
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] The token was lost in
the OPERATIONAL state.
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] Receive multicast socket
recv buffer size (288000 bytes).
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Aug 18 17:16:03 blade3 openais[2696]: [TOTEM] entering GATHER state from 2.


The blade2 logs are as follows:
19:46 - last log entry before the freeze (unrelated to GFS); the system then freezes.
19:50 - boot log entries; the system has been fenced and is now starting up.
Aug 17 19:50:03 blade2 ccsd[2804]: Starting ccsd 2.0.73:
Aug 17 19:50:03 blade2 ccsd[2804]:  Built: Nov 12 2007 13:07:35
Aug 17 19:50:03 blade2 ccsd[2804]:  Copyright (C) Red Hat, Inc.  2004
All rights reserved.
Aug 17 19:50:03 blade2 ccsd[2804]: cluster.conf (cluster name =
jemdevcluster, version = 5) found.
Aug 17 19:50:03 blade2 ccsd[2804]: Remote copy of cluster.conf is from
quorate node.
Aug 17 19:50:03 blade2 ccsd[2804]:  Local version # : 5
Aug 17 19:50:03 blade2 ccsd[2804]:  Remote version #: 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote copy of cluster.conf is from
quorate node.
Aug 17 19:50:04 blade2 ccsd[2804]:  Local version # : 5
Aug 17 19:50:04 blade2 ccsd[2804]:  Remote version #: 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote copy of cluster.conf is from
quorate node.
Aug 17 19:50:04 blade2 ccsd[2804]:  Local version # : 5
Aug 17 19:50:04 blade2 ccsd[2804]:  Remote version #: 5
Aug 17 19:50:04 blade2 ccsd[2804]: Remote copy of cluster.conf is from
quorate node.
Aug 17 19:50:04 blade2 ccsd[2804]:  Local version # : 5
Aug 17 19:50:04 blade2 ccsd[2804]:  Remote version #: 5
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] AIS Executive Service
RELEASE 'subrev 1358 version 0.80.3'
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Copyright (C) 2002-2006
MontaVista Software, Inc and contributors.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] AIS Executive Service:
started and ready to provide service.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Using default multicast
address of 239.192.24.76
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] openais component
openais_cpg loaded.
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais cluster closed process group service v1.01'
Aug 17 19:50:04 blade2 openais[2811]: [MAIN ] openais component
openais_cfg loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais configuration service'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_msg loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais message service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_lck loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais distributed locking service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_evt loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais event service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_ckpt loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais checkpoint service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_amf loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais availability management framework B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_clm loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais cluster membership service B.01.01'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_evs loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais extended virtual synchrony service'
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] openais component
openais_cman loaded.
Aug 17 19:50:05 blade2 openais[2811]: [MAIN ] Registering service
handler 'openais CMAN membership service 2.01'
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] Token Timeout (10000 ms)
retransmit timeout (495 ms)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] token hold (386 ms)
retransmits before loss (20 retrans)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] join (60 ms) send_join
(0 ms) consensus (4800 ms) merge (200 ms)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] downcheck (1000 ms) fail
to recv const (50 msgs)
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] seqno unchanged const
(30 rotations) Maximum network MTU 1500
Aug 17 19:50:05 blade2 openais[2811]: [TOTEM] window size per rotation
(50 messages) maximum messages per rotation (17 messages)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] send threads (0 threads)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP token expired timeout (495 ms)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP token problem
counter (2000 ms)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP threshold (10 problem count)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] RRP mode set to none.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] heartbeat_failures_allowed (0)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] max_network_delay (50 ms)
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] HeartBeat is Disabled.
To enable set heartbeat_failures_allowed > 0
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Receive multicast socket
recv buffer size (262142 bytes).
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] The network interface
[192.168.70.102] is now up.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Created or loaded
sequence id 600.192.168.70.102 for this ring.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] entering GATHER state from 15.
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais extended virtual synchrony service'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais cluster membership service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais availability management framework B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais checkpoint service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais event service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais distributed locking service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais message service B.01.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais configuration service'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais cluster closed process group service v1.01'
Aug 17 19:50:06 blade2 openais[2811]: [SERV ] Initialising service
handler 'openais CMAN membership service 2.01'
Aug 17 19:50:06 blade2 openais[2811]: [CMAN ] CMAN 2.0.73 (built Nov
12 2007 13:07:39) started
Aug 17 19:50:06 blade2 openais[2811]: [SYNC ] Not using a virtual
synchrony filter.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Creating commit token
because I am the rep.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Saving state aru 0 high
seq received 0
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Storing new sequence id
for ring 25c
Aug 17 19:50:06 blade2 ccsd[2804]: Initial status:: Quorate
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] entering COMMIT state.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] entering RECOVERY state.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] position [0] member
192.168.70.102:
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] previous ring seq 600
rep 192.168.70.102
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] aru 0 high delivered 0
received flag 1
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Did not need to
originate any messages in recovery.
Aug 17 19:50:06 blade2 openais[2811]: [TOTEM] Sending initial ORF token
Aug 17 19:50:06 blade2 openais[2811]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] New Configuration:
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] Members Left:
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] Members Joined:
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] New Configuration:
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] Members Left:
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] Members Joined:
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:50:07 blade2 openais[2811]: [SYNC ] This node is within the
primary component and will provide service.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:50:07 blade2 openais[2811]: [CLM  ] got nodejoin message
192.168.70.102
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering GATHER state from 11.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] Creating commit token
because I am the rep.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] Saving state aru 9 high
seq received 9
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] Storing new sequence id
for ring 260
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering COMMIT state.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] entering RECOVERY state.
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] position [0] member
192.168.70.102:
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] previous ring seq 604
rep 192.168.70.102
Aug 17 19:50:07 blade2 openais[2811]: [TOTEM] aru 9 high delivered 9
received flag 1
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] position [1] member
192.168.70.103:
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] previous ring seq 604
rep 192.168.70.103
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] aru 28 high delivered 28
received flag 1
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] position [2] member
192.168.70.104:
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] previous ring seq 604
rep 192.168.70.103
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] aru 28 high delivered 28
received flag 1
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] Did not need to
originate any messages in recovery.
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] Sending initial ORF token
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] New Configuration:
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] Members Left:
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] Members Joined:
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] New Configuration:
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.102)
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.103)
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.104)
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] Members Left:
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ] Members Joined:
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.103)
Aug 17 19:50:08 blade2 openais[2811]: [CLM  ]   r(0) ip(192.168.70.104)
Aug 17 19:50:08 blade2 openais[2811]: [SYNC ] This node is within the
primary component and will provide service.
Aug 17 19:50:08 blade2 openais[2811]: [TOTEM] entering OPERATIONAL state.
Aug 17 19:50:09 blade2 openais[2811]: [CMAN ] quorum regained, resuming activity
Aug 17 19:50:09 blade2 openais[2811]: [CLM  ] got nodejoin message
192.168.70.102
Aug 17 19:50:09 blade2 openais[2811]: [CLM  ] got nodejoin message
192.168.70.103
Aug 17 19:50:09 blade2 openais[2811]: [CLM  ] got nodejoin message
192.168.70.104
Aug 17 19:50:09 blade2 openais[2811]: [CPG  ] got joinlist message from node 3
Aug 17 19:50:09 blade2 openais[2811]: [CPG  ] got joinlist message from node 4
Aug 17 19:50:09 blade2 qdiskd[2877]: <info> Quorum Partition:
/dev/sda5 Label: jemqdisk
Aug 17 19:50:09 blade2 qdiskd[2878]: <info> Quorum Daemon Initializing
Aug 17 19:50:22 blade2 qdiskd[2878]: <info> Node 3 is the master
Aug 17 19:50:24 blade2 gfs_controld[2839]: retrieve_plocks: ckpt open
error 12 cache1
Aug 17 19:50:40 blade2 qdiskd[2878]: <info> Initial score 1/1
Aug 17 19:50:40 blade2 qdiskd[2878]: <info> Initialization complete
Aug 17 19:50:40 blade2 openais[2811]: [CMAN ] quorum device registered
Aug 17 19:50:40 blade2 qdiskd[2878]: <notice> Score sufficient for
master operation (1/1; required=1); upgrading
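
To line up the two excerpts above, a small script could merge them into
one timeline. This is only a minimal sketch under some assumptions: the
input contains nothing but syslog lines as pasted here (including the
continuation lines the mail client wrapped), the year is 2008 (syslog
does not record it), and the clocks on the nodes agree. The function
names are made up for illustration and are not part of any cluster tool.

```python
import re
from datetime import datetime

# Start of a syslog record as it appears in the excerpts above, e.g.
# "Aug 17 19:50:24 blade2 gfs_controld[2839]: retrieve_plocks: ..."
RECORD_START = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): ?(?P<msg>.*)$"
)


def unwrap(lines):
    """Rejoin records that were wrapped onto a following line."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line):
            records.append(line)
        elif records and line.strip():
            # Not a new record: treat it as the tail of the previous one.
            records[-1] += " " + line.strip()
    return records


def parse(record, year=2008):
    """Split one unwrapped record into (timestamp, host, tag, message)."""
    m = RECORD_START.match(record)
    # Month names are parsed with %b, so this assumes an English/C locale.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, m["host"], m["tag"], m["msg"]


def timeline(*logs):
    """Merge several per-node excerpts into one list sorted by time."""
    events = [parse(r) for log in logs for r in unwrap(log)]
    return sorted(events, key=lambda e: e[0])
```

Calling timeline(blade3_lines, blade2_lines), with each argument being
the list of lines from one excerpt, returns tuples sorted by timestamp,
so the fencing on blade3 and the gfs_controld error on blade2 can be
read in order. The sort is stable, so records with the same timestamp
keep their original relative order.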

>
> Shawn
>
> On Mon, Aug 18, 2008 at 11:35 AM, Brett Cave <brettcave at gmail.com> wrote:
>>
>> GFS has frozen again - after reconfiguring and running GFS for almost
>> a month now, I have not been able to get GFS running stably.
>>
>> [root at blade2 ~]# cat /etc/issue
>> CentOS release 5 (Final)
>> Kernel \r on an \m
>>
>> [root at blade2 ~]# uname -a
>> Linux blade2 2.6.18-53.el5 #1 SMP Mon Nov 12 02:14:55 EST 2007 x86_64
>> x86_64 x86_64 GNU/Linux
>>
>> [root at blade2 ~]# rpm -qa | grep gfs
>> gfs2-utils-0.1.38-1.el5
>> kmod-gfs-0.1.19-7.el5
>> gfs-utils-0.1.12-1.el5
>>
>> [root at blade2 ~]# modinfo gfs
>> filename:       /lib/modules/2.6.18-53.el5/extra/gfs/gfs.ko
>> license:        GPL
>> author:         Red Hat, Inc.
>> description:    Global File System 0.1.19-7.el5
>> srcversion:     18B81D3FD6ECDCCFA53D745
>> depends:        gfs2
>> vermagic:       2.6.18-53.el5 SMP mod_unload gcc-4.1
>>
>>
>> Is anyone actually running GFS on CentOS 5 stably? I was running gfs2,
>> but it was also unstable, hence the move back to gfs.
>>
>> Setup: 3-node cluster with 1 vote each and 1 quorum disk.
>> Each node has 1 x dual-port HBA connected to a fibre SAN (no
>> multipath, only a single port on each card connected to the SAN). The
>> SAN is an MSA1500. 2 GFS partitions, 1 qdisk partition on the SAN.
>>
>> The system runs fine for a few days, and then I notice that some
>> mountpoints become unavailable. The entire system locks up when this
>> happens, and the only option I have is to reset all nodes in the
>> cluster to start up the cluster again. No errors in logs, nothing out
>> of the ordinary that I can see.
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Shawn Hood
> 910.670.1819 m
>



