[Linux-cluster] GFS + DRBD Problems

gordan at bobich.net gordan at bobich.net
Mon Mar 3 11:23:36 UTC 2008


Hi,

I appear to be experiencing a strange compound problem that is proving 
rather difficult to troubleshoot, so I'm hoping someone here can spot 
something I haven't.

I have a 2-node cluster with Open Shared Root on GFS on DRBD. A single 
node mounts GFS OK and works, but after a while it seems to just block 
on disk I/O. Very much as if it had started trying to fence the other 
node and were waiting for acknowledgement. There are no fence devices 
defined (so this could be a possibility), but the other node was never 
powered up in the first place, so it is somewhat beyond me why it might 
suddenly decide to try to fence it. This usually happens after a period 
of idleness. If the node is in use, this doesn't seem to happen, but 
leaving it alone for half an hour causes it to block on disk I/O.
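
For what it's worth, I'm aware that the <method> blocks in the 
cluster.conf below are empty. If defining manual fencing is the way to 
rule this out, my understanding is that it would look something like the 
following (untested on my end, so treat it as a sketch rather than a 
known-good config):

         <fencedevices>
                 <fencedevice agent="fence_manual" name="manual"/>
         </fencedevices>

with each node's fence section changed along the lines of:

                          <fence>
                                  <method name="1">
                                          <device name="manual" nodename="sentinel1c"/>
                                  </method>
                          </fence>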

Unfortunately, it doesn't end there. When an attempt is made to dual-mount 
the GFS file system before the secondary is fully up to date (but is 
connected and syncing), the 2nd node to join notices an inconsistency, and 
withdraws from the cluster. In the process, GFS gets corrupted, and the 
only way to get it to mount again on either node is to repair it with 
gfs_fsck.
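
For completeness, the DRBD resource underneath is set up for dual-primary 
operation roughly as follows (reconstructed from memory rather than 
pasted; the resource name, backing devices and port are placeholders, and 
the on-host names here assume they match uname -n - so treat it as a 
sketch):

resource r1 {
        protocol C;                     # synchronous replication - needed for GFS
        net {
                allow-two-primaries;    # let both nodes be Primary at once
        }
        on sentinel1c {
                device    /dev/drbd1;
                disk      /dev/sda3;    # placeholder backing device
                address   10.0.0.1:7789;
                meta-disk internal;
        }
        on sentinel2c {
                device    /dev/drbd1;
                disk      /dev/sda3;    # placeholder backing device
                address   10.0.0.2:7789;
                meta-disk internal;
        }
}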

I'm not sure whether this is a problem with my cluster setup, but I can 
see no sign of the nodes failing to find each other or to get DLM 
working. Console logs seem to indicate that everything is in fact OK, and 
the nodes are connected directly via a cross-over cable.

If the nodes are in sync by the time GFS tries to mount, the mount succeeds, 
but everything grinds to a halt shortly afterwards - so much so that the only 
way to get things moving again is to hard-reset one of the nodes, preferably 
the 2nd one to join.
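
When it gets into this state, the sort of thing I've been trying to check 
(while a console still responds at all) is the membership, the fence/DLM 
group state and the GFS lock state, along the lines of:

cman_tool nodes               # membership as cman sees it
cman_tool services            # state of the fence domain and DLM lockspaces
gfs_tool lockdump /mnt/gfs    # dump GFS locks for the mount (path is a placeholder)

but I haven't managed to capture anything conclusive before the node 
stops responding completely.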

Here is where the second thing that seems wrong happens - the first node 
doesn't just lock up at this point, as one might expect. When a connected 
node disappears (e.g. due to a hard reset), the cluster is supposed to 
keep trying to fence it until it cleanly rejoins - and it can't possibly 
fence the other node, since I haven't configured any fencing devices yet. 
That doesn't seem to happen, though; the first node just carries on as if 
nothing had happened. This is possibly connected to the fact that by this 
point GFS is corrupted and has to be fsck-ed at the next boot. This part 
may be a cluster setup issue, so I'll raise it on the cluster list, 
although it seems to be a DRBD-specific peculiarity - a nearly identical 
cluster.conf on a SAN (the only difference being the block device 
specification) doesn't suffer from this.

The cluster.conf is as follows:
<?xml version="1.0"?>
<cluster config_version="18" name="sentinel">
         <cman two_node="1" expected_votes="1"/>
         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
         <clusternodes>
                 <clusternode name="sentinel1c" nodeid="1" votes="1">
                         <com_info>
                                 <rootsource name="drbd"/>
                                 <!--<chrootenv  mountpoint      = "/var/comoonics/chroot"
                                                 fstype          = "ext3"
                                                 device          = "/dev/sda2"
                                                 chrootdir       = "/var/comoonics/chroot"
                                 />-->
                                 <syslog name="localhost"/>
                                 <rootvolume     name            = "/dev/drbd1"
                                                 mountopts       = "noatime,nodiratime,noquota"
                                 />
                                 <eth    name    = "eth0"
                                         ip      = "10.0.0.1"
                                         mac     = "00:0B:DB:92:C5:E1"
                                         mask    = "255.255.255.0"
                                         gateway = ""
                                 />
                                 <fenceackserver user    = "root"
                                                 passwd  = "secret"
                                 />
                         </com_info>
                         <fence>
                                 <method name="1"/>
                         </fence>
                 </clusternode>
                 <clusternode name="sentinel2c" nodeid="2" votes="1">
                         <com_info>
                                 <rootsource name="drbd"/>
                                 <!--<chrootenv  mountpoint      = "/var/comoonics/chroot"
                                                 fstype          = "ext3"
                                                 device          = "/dev/sda2"
                                                 chrootdir       = "/var/comoonics/chroot"
                                 />-->
                                 <syslog name="localhost"/>
                                 <rootvolume     name            = "/dev/drbd1"
                                                 mountopts       = "noatime,nodiratime,noquota"
                                 />
                                 <eth    name    = "eth0"
                                         ip      = "10.0.0.2"
                                         mac     = "00:0B:DB:90:4E:1B"
                                         mask    = "255.255.255.0"
                                         gateway = ""
                                 />
                                 <fenceackserver user    = "root"
                                                 passwd  = "secret"
                                 />
                         </com_info>
                         <fence>
                                 <method name="1"/>
                         </fence>
                 </clusternode>
         </clusternodes>
         <cman/>
         <fencedevices/>
         <rm>
                 <failoverdomains/>
                 <resources/>
         </rm>
</cluster>

Getting to the logs can be a bit difficult with OSR (they get reset on 
reboot, and it's rather difficult to get at them when the node stops 
responding, short of rebooting it), so I don't have those at the moment.
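
One thing I may try is pointing the OSR syslog at the other node or an 
external box instead of localhost, so that at least something survives a 
reset - if I understand the com_info syntax correctly, that should just 
be a matter of something like:

                                 <syslog name="10.0.0.2"/>

in place of the current <syslog name="localhost"/>. If that captures 
anything useful, I'll follow up with logs.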

Any suggestions would be welcome at this point.

TIA.

Gordan



