[Linux-cluster] sudden slow down of gfs2 volume

Tue Nov 10 13:29:12 UTC 2015

----- Original Message -----
> Hi,
> 
> I have a CentOS 6.5 cluster whose nodes are connected to a Fibre Channel
> SAN in a star topology. All nodes/SAN_storages have a single-pair fibre
> connection and no multipathing. The possibility of a hardware issue has
> been eliminated because read/write between all other node/SAN_storage
> pairs works perfectly.
> 
> Problem:
> Everything was running perfectly for years. Then node3 suddenly started
> writing very slowly to SAN_storage1, at ~10KB/sec. Read speed seems to
> remain normal.
> 
> Can anyone give me some pointers to debug the problem? Thank you.
> 
> Dil

Hi Dil,

The first thing I would suspect is that the file system is running low on
free blocks. GFS2 starts to struggle when a file system has too few
blocks for new allocations. If the file system has a small resource group
size, it may still look like you've got a lot of free blocks when this happens.
The solution, of course, is to use a bigger file system with more free space.
You can use lvresize then gfs2_grow to make the file system bigger, but
you may want to consider copying the data to a new device that's bigger,
simply to reduce file system fragmentation (as I'm about to explain).
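
As a rough first check of the free-block situation, the aggregate numbers can
be read with a few lines of Python. This is only a sketch: the function name
is made up for illustration, and it reports the file-system-wide total, which
(as noted above) can still look healthy when the resource groups are small.

```python
import os

def free_space_report(mount_point):
    """Return total blocks, free blocks and the free fraction for a mount point.

    Uses os.statvfs, so it only sees the aggregate count for the whole
    file system, not the free space inside individual resource groups.
    """
    st = os.statvfs(mount_point)
    total = st.f_blocks
    free = st.f_bfree
    # Guard against pseudo file systems that report zero total blocks.
    fraction = free / total if total else 0.0
    return {
        "total_blocks": total,
        "free_blocks": free,
        "free_fraction": fraction,
    }
```

Calling free_space_report() with the path where the GFS2 volume is mounted
(whatever that is on your nodes) gives a quick idea of how close to full the
file system is.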

The second thing I would suspect is file system fragmentation. When GFS2
file systems get too fragmented over time, the GFS2 block allocator runs
into the same problem: it can find free blocks, but not a long enough
contiguous run of them to satisfy its "ideal" conditions.
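
To illustrate the difference between "plenty of free blocks" and "a long
enough run of free blocks", here is a toy model in Python. It is not GFS2's
on-disk bitmap format or its allocator; the list of 0/1 flags and the function
name are assumptions made purely for the example.

```python
def free_run_stats(bitmap):
    """Return (free_count, longest_free_run) for a sequence of block flags.

    Each element is 0 for a free block and 1 for a used block.
    """
    free_count = 0
    longest = 0
    current = 0
    for bit in bitmap:
        if bit == 0:
            free_count += 1
            current += 1
            if current > longest:
                longest = current
        else:
            # A used block ends the current run of free blocks.
            current = 0
    return free_count, longest


# Two 16-block regions, each with exactly 8 free blocks.
fragmented = [0, 1] * 8          # free and used blocks alternate
compact = [1] * 8 + [0] * 8      # all free blocks are adjacent

print(free_run_stats(fragmented))  # (8, 1)
print(free_run_stats(compact))     # (8, 8)
```

Both regions report the same amount of free space, but only the second one
could satisfy a request for 8 blocks in a row.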

Unfortunately, there's no defrag tool for GFS2, so you'd just have to copy
the data to a new file system (with a single process only), which ought to
minimize the fragmentation in the new copy.
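
A single-process copy can be done with any ordinary copy tool; as a minimal
sketch in Python, the standard library's shutil.copytree walks the tree and
copies one file at a time within one process. The function name and both
paths are placeholders, and this sketch does not try to preserve ownership or
hard links.

```python
import shutil

def copy_tree_single_process(src, dst):
    """Copy the tree at src to dst, one file at a time, in this process.

    dst must not exist yet. Symbolic links are copied as links rather
    than followed.
    """
    return shutil.copytree(src, dst, symlinks=True)
```

Here src would be a directory on the old, fragmented file system and dst a
not-yet-existing directory on the new one.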

There may be lots of other causes for GFS2 slowing down (such as faulty
routers), and each needs its own diagnosis and debugging, but these are
probably the top two.

Regards,

Bob Peterson
Red Hat File Systems