[Linux-cluster] GFS2 mount hangs for some disks

Tue Jan 5 20:49:35 UTC 2016

----- Original Message -----
> Hi list,
> 
> I have some problems with GFS2 with failed nodes. After one of the
> cluster nodes is fenced and rebooted, it cannot mount some of the gfs2
> file systems; the mount operation hangs with no output. I've waited
> nearly 10 minutes for a single disk to mount, but it didn't respond. The
> only solution is to shut down all nodes and start the cluster cleanly.
> I suspect the journal size or the file system quotas.
> 
> I have an 8-node RHEL 6 cluster with GFS2-formatted disks, all of
> which are mounted by all nodes.
> There are two types of disk:
>      Type A:
>          ~50 GB disk capacity
>          8 journals of 512 MB each
>          block size: 1024
>          very small files (avg: 50 bytes, symlinks)
>          ~500,000 files (inodes)
>          Usage: 10%
>          Nearly no write IO (under 1,000 files per day)
>          No user quota (quota=off)
>          Mount options: async,quota=off,nodiratime,noatime
> 
>      Type B:
>          ~1 TB disk capacity
>          8 journals of 512 MB each
>          block size: 4096
>          relatively small files (avg: 20 KB)
>          ~5,000,000 files (inodes)
>          Usage: 20%
>          Write IO: ~50,000 files per day
>          User quota is on (some of the users have exceeded their quota)
>          Mount options: async,quota=on,nodiratime,noatime
> 
> To improve performance, I set the journal size to 512 MB instead of the
> 128 MB default. All disks are connected over fibre from SAN storage. All
> disks are on cluster LVM. All nodes are connected to each other through
> a private Gb switch.
> 
> For example, after "node5" failed and was fenced, it could re-enter the
> cluster. When I run "service gfs2 start", it mounts the "Type A" disks
> but hangs on the first "Type B" disk. The log stops at the "Trying to
> join cluster lock_dlm" message:
> 
>      ...
>      Jan 05 00:01:52 node5 lvm[4090]: Found volume group "VG_of_TYPE_A"
>      Jan 05 00:01:52 node5 lvm[4119]: Activated 2 logical volumes in volume group VG_of_TYPE_A
>      Jan 05 00:01:52 node5 lvm[4119]: 2 logical volume(s) in volume group "VG_of_TYPE_A" now active
>      Jan 05 00:01:52 node5 lvm[4119]: Wiping internal VG cache
>      Jan 05 00:02:26 node5 kernel: Slow work thread pool: Starting up
>      Jan 05 00:02:26 node5 kernel: Slow work thread pool: Ready
>      Jan 05 00:02:26 node5 kernel: GFS2 (built Dec 12 2014 16:06:57) installed
>      Jan 05 00:02:26 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA1"
>      Jan 05 00:02:26 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: Joined cluster. Now mounting FS...
>      Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5, already locked for use
>      Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Looking at journal...
>      Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Done
>      Jan 05 00:02:27 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA2"
>      Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: Joined cluster. Now mounting FS...
>      Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5, already locked for use
>      Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Looking at journal...
>      Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Done
>      Jan 05 00:02:28 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeB1"
> 
> 
> I've waited nearly 10 minutes in this state without any response or log
> output. In this state, I cannot run `ls` on this file system from the
> other nodes. Any idea what causes the problem? How is the cluster
> affected by journal size or count?
> --
> B.Baransel BAĞCI

Hi,

If mount hangs, it's hard to say what it's doing. It could be waiting
for a dlm lock, which is waiting on a pending fencing operation.

There have been occasional hangs discovered in journal replay, but not
for a long time, so that's less likely. What kernel version is this?
A build date of December 12, 2014 is more than a year old, so it might
be something we've already found and fixed. If this is RHEL6 or CentOS 6
or similar, you could try catting the /proc/<pid>/stack file of the mount
helper process, aka mount.gfs2, and see what it's doing.
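
As a rough illustration of the /proc/<pid>/stack suggestion, here is a
minimal Python sketch. The function names and the proc_root parameter
are made up for this example. It assumes the helper shows up with
"mount.gfs2" as the basename of its first command-line argument, and
that it runs as root, since reading another process's stack file
normally requires that.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose argv[0] basename equals `name`.

    Assumption: the process is identified by /proc/<pid>/cmdline,
    which holds the NUL-separated argument list.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0")[0].decode("utf-8", "replace")
        except (IOError, OSError):
            continue  # the process exited or is not readable
        if os.path.basename(argv0) == name:
            pids.append(int(entry))
    return sorted(pids)


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the contents of /proc/<pid>/stack (usually needs root)."""
    with open(os.path.join(proc_root, str(pid), "stack")) as f:
        return f.read()
```

On the hung node, looping over find_pids_by_name("mount.gfs2") and
printing read_kernel_stack(pid) for each PID should show where the
mount is blocked in the kernel.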

Normally, dlm recovery and gfs2 recovery take only a few seconds.
The size of journals and number of journals will likely have no effect.

If I was a betting man, I'd bet that GFS2 is waiting for DLM, and
DLM is waiting for a fence operation to be completed successfully
before continuing. If this is rhel6 or earlier, you could do
"group_tool dump" to find out if the cluster membership is sane or
if it's waiting for something like this.
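
The "group_tool dump" output can be long, so one way to skim it is to
keep only the lines that mention fencing or waiting. This is a minimal
Python sketch under stated assumptions: the function names are made up,
the keywords are guesses rather than strings the tool is known to
print, and nothing is assumed about the output format beyond it being
line-oriented text.

```python
import subprocess


def run_group_tool_dump():
    """Run `group_tool dump` and return its standard output as text."""
    return subprocess.check_output(["group_tool", "dump"],
                                   universal_newlines=True)


def lines_mentioning(text, keywords=("fence", "wait")):
    """Return the lines of `text` containing any keyword, ignoring case.

    The default keywords are illustrative guesses, not known output.
    """
    lowered = [k.lower() for k in keywords]
    return [line for line in text.splitlines()
            if any(k in line.lower() for k in lowered)]
```

Calling lines_mentioning(run_group_tool_dump()) on a cluster node gives
a shorter list to read through; an empty result proves nothing on its
own, so the full dump is still worth a look.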

Regards,

Bob Peterson
Red Hat File Systems