[Linux-cluster] gfs2.fsck bug

Wed Dec 22 17:27:35 UTC 2010

Hi,

Our GFS2 filesystems are down; when I try to mount them I get:

[root@DBT1 ~]# mount -a
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm

Our cluster.conf is identical on all nodes (listed below).

I thought an fsck might fix this, but when I run one I get:

[root@DBT1 ~]# fsck.gfs2 -fnp /dev/NEWvg/NEWlvTemp
(snippage)
RG #4909212 (0x4ae89c) free count inconsistent: is 16846 should be 17157
Resource group counts updated
Unlinked block 8639983 (0x83d5ef) bitmap fixed.
RG #8639976 (0x83d5e8) free count inconsistent: is 65411 should be 65412
Inode count inconsistent: is 20 should be 19
Resource group counts updated
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks:  43324224 (0x2951340)
free:    38433917 (0x24a747d)
dinodes: 21085 (0x525d)

Calculated statfs values:
blocks:  43324224 (0x2951340)
free:    38466752 (0x24af4c0)
dinodes: 21083 (0x525b)
The statfs file was fixed.

gfs2_fsck: bad write: Bad file descriptor on line 44 of file buf.c
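
To put the statfs mismatch in the output above into numbers, here is a minimal Python sketch. It only restates the figures fsck.gfs2 printed; the function name statfs_delta is made up for illustration, and nothing here touches the filesystem or explains the "bad write" error.

```python
# Values copied from the fsck.gfs2 output above.
current = {"blocks": 43324224, "free": 38433917, "dinodes": 21085}
calculated = {"blocks": 43324224, "free": 38466752, "dinodes": 21083}


def statfs_delta(current, calculated):
    """Return calculated minus current for each statfs field."""
    return {field: calculated[field] - current[field] for field in current}


delta = statfs_delta(current, calculated)
for field, diff in delta.items():
    # A positive number means fsck counted more than the statfs file recorded.
    print(f"{field}: {diff:+d}")
```

So the statfs file under-reports free space by 32835 blocks and over-reports dinodes by 2, while the total block count agrees.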

I read in https://bugzilla.redhat.com/show_bug.cgi?id=457557 that there
is some way of fixing this with gfs2_edit. Is there any documentation for that?

Because we've been having fencing issues, I removed the fencing
configuration for two servers (DBT2 and DBT3), and those two servers are
not active at this time. Could this be causing the mount failures?

Thanks in advance for any advice or guidance.

yvette

Our cluster.conf:

<?xml version="1.0"?>
<cluster alias="DBT0_DBT1_HA" config_version="85" name="DBT0_DBT1_HA">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="1"/>
	<clusternodes>
		<clusternode name="DBT0" nodeid="1" votes="3">
			<fence>
				<method name="1">
					<device name="DBT0_ILO2"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="DBT1" nodeid="2" votes="3">
			<fence>
				<method name="1">
					<device name="DBT1_ILO2"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="DEV" nodeid="3" votes="3">
			<fence>
				<method name="1">
					<device name="DEV_ILO2"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="DBT2" nodeid="4" votes="1">
			<fence>
				<method name="1"/>
			</fence>
		</clusternode>
		<clusternode name="DBT3" nodeid="5" votes="1">
			<fence>
				<method name="1"/>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_ilo" hostname="192.168.200.140" login="foo" name="DBT0_ILO2" passwd="foo"/>
		<fencedevice agent="fence_ilo" hostname="192.168.200.150" login="foo" name="DEV_ILO2" passwd="foo"/>
		<fencedevice agent="fence_ilo" hostname="192.168.200.141" login="foo" name="DBT1_ILO2" passwd="foo"/>
	</fencedevices>
	<rm>
		<failoverdomains/>
		<resources>
			<clusterfs device="/dev/foo0vg/foo0vol002" force_unmount="1" fsid="19150" fstype="gfs2" mountpoint="/foo0vol002" name="foo0vol002" options="data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo0vg/foo0lvvol003" force_unmount="1" fsid="51633" fstype="gfs2" mountpoint="/foo0vol003" name="foo0vol003" options="data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo0vg/foo0lvvol004" force_unmount="1" fsid="36294" fstype="gfs2" mountpoint="/foo0vol004" name="foo0vol004" options="data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo0vg/foo0vol005" force_unmount="1" fsid="48920" fstype="gfs2" mountpoint="/foo0vol005" name="foo0vol005" options="noatime,noquota,data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo1vg/foo1lvvol000" force_unmount="1" fsid="24235" fstype="gfs2" mountpoint="/foo0vol000" name="foo0vol000" options="data=ordered" self_fence="0"/>
			<clusterfs device="/dev/foo1vg/foo1lvvol001" force_unmount="1" fsid="34088" fstype="gfs2" mountpoint="/foo0vol001" name="foo0vol001" options="data=ordered" self_fence="0"/>
		</resources>
	</rm>
	<totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
	<dlm plock_ownership="1" plock_rate_limit="0"/>
	<gfs_controld plock_rate_limit="0"/>
</cluster>
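
For reference, the vote and fencing layout in the config above can be summarized with a short Python sketch. The embedded XML is a trimmed copy of the <clusternodes> section, the summarize helper is hypothetical, and the quorum formula (expected votes divided by two, rounded down, plus one) is an assumption about cman's usual majority rule. It only does the arithmetic; it does not show whether fenced has actually joined the fence domain on DBT1.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <clusternodes> section from the cluster.conf above.
CLUSTER_CONF = """
<cluster name="DBT0_DBT1_HA">
  <clusternodes>
    <clusternode name="DBT0" nodeid="1" votes="3">
      <fence><method name="1"><device name="DBT0_ILO2"/></method></fence>
    </clusternode>
    <clusternode name="DBT1" nodeid="2" votes="3">
      <fence><method name="1"><device name="DBT1_ILO2"/></method></fence>
    </clusternode>
    <clusternode name="DEV" nodeid="3" votes="3">
      <fence><method name="1"><device name="DEV_ILO2"/></method></fence>
    </clusternode>
    <clusternode name="DBT2" nodeid="4" votes="1">
      <fence><method name="1"/></fence>
    </clusternode>
    <clusternode name="DBT3" nodeid="5" votes="1">
      <fence><method name="1"/></fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def summarize(conf_xml):
    """Return (votes per node, names of nodes with no fence device)."""
    root = ET.fromstring(conf_xml.strip())
    votes = {}
    unfenced = []
    for node in root.iter("clusternode"):
        name = node.get("name")
        votes[name] = int(node.get("votes", "1"))
        # A fence method with no <device> child cannot fence the node.
        if node.find("fence/method/device") is None:
            unfenced.append(name)
    return votes, unfenced


votes, unfenced = summarize(CLUSTER_CONF)
expected = sum(votes.values())
quorum = expected // 2 + 1  # assumed majority rule
remaining = expected - sum(votes[name] for name in unfenced)

print("expected votes:", expected)
print("assumed quorum:", quorum)
print("nodes without a fence device:", unfenced)
print("votes left if those nodes are down:", remaining)
```

Under those assumptions the three remaining nodes hold 9 of 11 votes against a quorum of 6, so losing DBT2 and DBT3 should not cost quorum on its own; whether their empty fence methods affect joining the fence domain is a separate question.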