[Linux-cluster] Fwd: GFS volume hangs on 3 nodes after gfs_grow

Alan A alan.zg at gmail.com
Fri Sep 26 14:43:39 UTC 2008


This is worse than I thought. The entire cluster hangs when a restart
command is issued from the Conga (luci) box. I tried bringing the gfs service
down on node2 (the luci node) with: service gfs stop (we are not running
rgmanager), and I got:
FATAL: Module gfs is in use.
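
To see why the module refuses to unload, the usual checks are whether any GFS
file systems are still mounted and what is holding them. A rough sketch only;
the mount point name is assumed to match node3's:

  # GFS modules still loaded and their use counts
  lsmod | grep gfs
  # GFS file systems still mounted (the module cannot unload while any remain)
  grep gfs /proc/mounts
  # processes keeping a mount point busy (mount point name assumed)
  fuser -vm /lvm_test1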

On node3:
service gfs status:
Configured GFS mountpoints:
/lvm_test1
/lvm_test2
Active GFS mountpoints:
/lvm_test1
/lvm_test2

service gfs stop:
Unmounting GFS filesystems:  (hangs)
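
When the unmount just hangs like this, it is usually waiting on the DLM/cluster
layer rather than on local disk I/O. A rough sketch of what can be checked from
another shell on node3, assuming the standard RHEL5 cluster tools are installed:

  # cluster membership as cman sees it
  cman_tool nodes
  # state of the fence, dlm and gfs groups; anything stuck joining or recovering is suspect
  group_tool ls
  # recent kernel messages about journal recovery or withdrawn file systems
  dmesg | tail -n 50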

node2 - .175
node3 - .78
node4 - .79
All nodes are on the same network segment.

These are the messages from node3 from the point at which I tried to restart
the cluster:
Sep 26 09:00:38 dev03 openais[8692]: [TOTEM] entering GATHER state from 12.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering GATHER state from 0.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Creating commit token because I
am the rep.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Saving state aru 1e1 high seq
received 1e1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Storing new sequence id for
ring 454
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering COMMIT state.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering RECOVERY state.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [0] member
xxx.xxx.xxx.78:
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep
xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1
received flag 1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [1] member
xxx.xxx.xxx.175:
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep
xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1
received flag 1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Did not need to originate any
messages in recovery.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Sending initial ORF token
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 kernel: dlm: closing connection to node 2
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)

Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)

Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [SYNC ] This node is within the primary
component and will provide service.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state.
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] got nodejoin message
xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] got nodejoin message
xxx.xxx.xxx.175
Sep 26 09:00:43 dev03 openais[8692]: [CPG  ] got joinlist message from node
3
Sep 26 09:00:43 dev03 fenced[8710]: fencing deferred to
fenmrdev02.maritz.com
Sep 26 09:00:43 dev03 openais[8692]: [CPG  ] got joinlist message from node
1
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1:
Trying to acquire journal lock...
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Busy
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering GATHER state from 11.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Creating commit token because I
am the rep.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Saving state aru 31 high seq
received 31
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Storing new sequence id for
ring 458
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering COMMIT state.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering RECOVERY state.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [0] member
xxx.xxx.xxx.78:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep
xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31
received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [1] member
xxx.xxx.xxx.79:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep
xxx.xxx.xxx.79
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 9 high delivered 9 received
flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [2] member
xxx.xxx.xxx.175:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep
xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31
received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Did not need to originate any
messages in recovery.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Sending initial ORF token
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)

Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.175)

Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ]       r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:02:37 dev03 openais[8692]: [SYNC ] This node is within the primary
component and will provide service.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state.
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message
xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message
xxx.xxx.xxx.79
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message
xxx.xxx.xxx.175
Sep 26 09:02:37 dev03 openais[8692]: [CPG  ] got joinlist message from node
3
Sep 26 09:02:37 dev03 openais[8692]: [CPG  ] got joinlist message from node
1
Sep 26 09:02:43 dev03 kernel: dlm: connecting to 2


---------- Forwarded message ----------
From: Alan A <alan.zg at gmail.com>
Date: Thu, Sep 25, 2008 at 2:04 PM
Subject: GFS volume hangs on 3 nodes after gfs_grow
To: linux clustering <linux-cluster at redhat.com>


Hi all!

I have a 3-node test cluster utilizing SCSI fencing and GFS. I created two GFS
logical volumes, lvm1 and lvm2, each using 5GB of a 10GB disk. While testing
the command-line tools I ran lvextend -L +1G /devicename to bring lvm2 to
6GB. That completed without any problems. I then issued gfs_grow
/mountpoint and the volume became inaccessible. Any command that tries to access
the volume hangs, and umount returns: /sbin/umount.gfs: /lvm2: device is
busy.
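
For context, the sequence I ran was essentially the following (the volume group
and logical volume names below are placeholders, since I elided the real device
path above; /lvm2 is the mount point):

  # extend the logical volume by 1GB (device path is a placeholder)
  lvextend -L +1G /dev/testvg/lvm2
  # grow the GFS file system in place; run against the mounted mount point, on one node only
  gfs_grow /lvm2
  # confirm the file system now sees the extra space
  gfs_tool df /lvm2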

A few questions:
Since I have two volumes on this cluster and lvm1 works just fine, are there
any suggestions for unmounting lvm2 in order to try to fix it? (Some diagnostics
I could still gather are sketched below.)
Is gfs_grow bug-free or not (use / do not use)?
Is there any other way, besides restarting the cluster/nodes, to get lvm2
back into an operational state?
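
Before restarting anything, this is roughly what I can try to collect from the
hung mount point. A sketch only; these commands may themselves block if the
mount is completely wedged:

  # processes, if any, with open files on the hung mount
  fuser -vm /lvm2
  # dump the GFS/DLM lock state for the hung file system (output can be very large)
  gfs_tool lockdump /lvm2
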
-- 
Alan A.


