[Linux-cluster] sudden slow down of gfs2 volume

Tue Nov 10 10:23:57 UTC 2015

Hi,

On 10/11/15 04:56, Dil Lee wrote:
> Hi,
>
> I have a CentOS 6.5 cluster that is connected to a Fibre Channel SAN in star
> topology. All nodes/SAN_storages have single-pair fibre connection and
> no multipathing. Possibility of hardware issue had been eliminated
> because read/write between all other node/SAN_storage pairs works
> perfectly.
>
> Problem:
> Everything was running perfectly for years. Then node3 suddenly has
> very slow write to SAN_storage1, ~10KB/sec. Read speed seems to remain
> normal.
>
> Can anyone give me some pointers to debug the problem? Thank you.
>
> Dil
>

The usual things to look for are the load being asymmetric across the 
nodes, the filesystem becoming full (although if the other nodes are 
still working at higher speed, that is less likely), an increase in the 
amount of cross-node invalidation due to locking, or some reason for the 
communications being slower to that node (e.g. packet loss, or similar).
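
As a rough illustration of two of those checks, the sketch below reports 
how full a mounted filesystem is and pulls the error/drop counters out of 
/proc/net/dev, so the numbers can be compared between the fast and slow 
nodes. It assumes Python 3 is available on the node; the mount point 
/mnt/gfs2 is a hypothetical placeholder, and the helper names are made up 
for this example.

```python
import os


def used_fraction(mount_point):
    """Fraction of blocks in use on the filesystem mounted at mount_point."""
    st = os.statvfs(mount_point)
    if st.f_blocks == 0:
        return 0.0  # pseudo filesystems can report zero blocks
    return 1.0 - (st.f_bavail / st.f_blocks)


def parse_net_dev(text):
    """Extract per-interface error/drop counters from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16 or not fields[0].isdigit():
            continue
        values = [int(x) for x in fields[:16]]
        # Columns: 8 receive counters followed by 8 transmit counters.
        counters[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return counters


def report(mount_point="/mnt/gfs2"):  # hypothetical mount point
    """Print filesystem usage and any non-zero network error counters."""
    print("%s: %.1f%% used" % (mount_point, 100 * used_fraction(mount_point)))
    with open("/proc/net/dev") as f:
        for iface, c in sorted(parse_net_dev(f.read()).items()):
            if any(c.values()):
                print(iface, c)
```

Running report() on each node and comparing the output is only a first 
pass; counters that are non-zero on node3 alone would be worth a closer 
look.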

One way to help debug this would be to look at the block device with 
blktrace and see if there are any obvious differences in latency of 
reads/writes between the faster and slower nodes, and whether that is 
down to access pattern or not.
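
For example, the text output of blkparse could be saved on each node and 
compared with something like the sketch below. It pairs each dispatch (D) 
event with the matching completion (C) event for the same device and 
sector and collects the latency separately for reads and writes. It 
assumes the default blkparse line layout (device, cpu, sequence, 
timestamp, pid, action, rwbs, sector + length); the function name and the 
sample lines are invented for illustration.

```python
from collections import defaultdict


def dispatch_to_complete_latencies(lines):
    """Return {'R': [...], 'W': [...]} of D->C latencies in seconds."""
    pending = {}                     # (device, sector) -> dispatch time
    latencies = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Expected: dev cpu seq time pid action rwbs sector + nblocks ...
        if len(fields) < 10 or fields[8] != "+":
            continue                 # summary lines and other events
        dev, action, rwbs, sector = fields[0], fields[5], fields[6], fields[7]
        try:
            timestamp = float(fields[3])
        except ValueError:
            continue
        key = (dev, sector)
        if action == "D":
            pending[key] = timestamp
        elif action == "C" and key in pending:
            start = pending.pop(key)
            if "W" in rwbs:
                latencies["W"].append(timestamp - start)
            elif "R" in rwbs:
                latencies["R"].append(timestamp - start)
    return latencies


# Invented sample lines in the assumed format, one write and one read.
SAMPLE = [
    "  8,16   1        1     0.000000000  1234  Q   W 1000 + 8 [dd]",
    "  8,16   1        2     0.000010000  1234  D   W 1000 + 8 [dd]",
    "  8,16   0        3     0.250010000     0  C   W 1000 + 8 [0]",
    "  8,16   1        4     0.300000000  1234  D   R 2000 + 8 [cat]",
    "  8,16   0        5     0.300500000     0  C   R 2000 + 8 [0]",
]

for kind, values in sorted(dispatch_to_complete_latencies(SAMPLE).items()):
    print(kind, "n=%d" % len(values), "max=%.6fs" % max(values))
```

If the write latencies on node3 are much larger than on the other nodes 
for a similar access pattern, that points at the path to the storage 
rather than at the filesystem.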

So there are several possible things to follow up on to help narrow the 
issue down,

Steve.