[Linux-cluster] sudden slow down of gfs2 volume

Thu Nov 12 13:21:16 UTC 2015

----- Original Message -----
> Hi,
> Here is the blktrace btt report for a 200m write to an IBM DS5020 SAN partition:
> 
> ==================== Per Process ====================
>           Q2Cdm           MIN           AVG           MAX           N
> glock_workqueu    0.000165159   0.000165159   0.000165159           1  (normal node)
> glock_workqueu    0.000229284   0.378688659   0.595789779         377  (affected node)
> 
> ==================== Per Device ====================
>           Q2Cdm           MIN           AVG           MAX           N
> (253,  8)         0.000165159   0.000165159   0.000165159           1  (normal node)
> (253,  2)         0.000229284   0.370873383   0.595789779         385  (affected node)
> 
> Any insight into what could cause the long time in
> "glock_workqueu" for the affected node? Thank you,
> Regards,
> Dil

Hi Dil,

It's hard to say what's causing this, but I would start by running the
"glocktop" tool while recreating the slowdown. Perhaps it will give you
some insight. The source code can be downloaded here:
http://people.redhat.com/rpeterso/Tools/glocktop.c
Instructions for compiling and running it are in the comments at the top
of the file.

Are you saying that the file system runs fast from one node and slow
from another node in the cluster? If that's the case, I would
do this experiment:

1. Unmount the file system from both nodes in the cluster.
2. Mount the file system from the "slow" node.
3. Mount the file system from the "fast" node.
4. Perform the speed test. Is the fast node now slow, and the slow node fast?

If the slow node is still slow even when it is the first node to mount,
the problem may be due to multipath not using the "primary" path for IO.
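
To make the speed test in step 4 repeatable, a small timed write run the
same way on each node may help. This is only a sketch, not a proper
benchmark: the 200 MiB default is my reading of the "200m write" in the
report above, the function name timed_write is made up for illustration,
and the target path is a placeholder for a scratch file on the GFS2 mount.

```python
import os
import sys
import time


def timed_write(path, total_bytes=200 * 1024 * 1024, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path, fsync, and return elapsed seconds.

    The default size (200 MiB) is an assumption based on the "200m write"
    mentioned in the original report.
    """
    chunk = b"\0" * chunk_bytes
    remaining = total_bytes
    start = time.monotonic()
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(chunk[:n])
            remaining -= n
        # Flush Python's buffer and ask the kernel to push the data to
        # storage, so the timing is not just the page cache.
        f.flush()
        os.fsync(f.fileno())
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is a placeholder: a scratch file on the GFS2 mount.
    target = sys.argv[1]
    elapsed = timed_write(target)
    print(f"wrote 200 MiB to {target} in {elapsed:.3f} s")
    os.remove(target)
```

Run it on the "slow" node and then on the "fast" node, using a different
scratch file name on each, and compare the two times before and after
swapping the mount order.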

Regards,

Bob Peterson
Red Hat File Systems