[Linux-cluster] sudden slow down of gfs2 volume
Dil Lee
dillee1 at gmail.com
Thu Nov 12 08:09:42 UTC 2015
Hi,
blktrace btt report for a 200MB write to an IBM DS5020 SAN partition:
==================== Per Process ====================
          Q2Cdm            MIN            AVG            MAX      N
 glock_workqueu    0.000165159    0.000165159    0.000165159      1   <- normal node
 glock_workqueu    0.000229284    0.378688659    0.595789779    377   <- affected node

==================== Per Device ====================
          Q2Cdm            MIN            AVG            MAX      N
       (253, 8)    0.000165159    0.000165159    0.000165159      1   <- normal node
       (253, 2)    0.000229284    0.370873383    0.595789779    385   <- affected node
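For reference, the report above is blktrace/btt output; a rough outline of how
such a report can be collected is below (the device path, output names and
trace duration are illustrative assumptions, not copied from the actual session):

  # trace the dm device backing the GFS2 mount for 60s while copying the test file
  blktrace -d /dev/mapper/SAN5020-order -o order_trace -w 60
  # merge the per-CPU trace files into a single binary stream
  blkparse -i order_trace -d order_trace.bin
  # summarise latencies (including Q2C) per process and per device
  btt -i order_trace.bin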
Any insight into what could be causing the long times in
"glock_workqueu" on the affected node? Thank you,
Regards,
Dil
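PS: in case it is useful for diagnosis, the glock state on the affected node
can also be inspected through debugfs; a minimal sketch, assuming debugfs is
mounted and using a placeholder lockspace name (the real directory follows
the clustername:fsname pattern):

  # mount debugfs if it is not already mounted
  mount -t debugfs none /sys/kernel/debug
  # dump the current glocks for the filesystem; "mycluster:SAN5020-order" is a placeholder
  cat /sys/kernel/debug/gfs2/mycluster:SAN5020-order/glocks | head -50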
On 11/11/15, Dil Lee <dillee1 at gmail.com> wrote:
> Thank you for your reply. Below is some extra info.
> GFS2 partition in question:
> [root at apps03 ~]# df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/SAN5020-order 800G 435G 366G 55%
> /sanstorage/data0/images/order <- the one that suffered the slowdown
> /dev/mapper/SAN5020-SNPimage 3.0T 484G 2.6T 16%
> /sanstorage/data0/ShotNPrint/SNP_image
> /dev/mapper/SAN5020-data0 1.3T 828G 475G 64%
> /sanstorage/data0/images/album
>
> normal nodes, copying a 200MB file to the affected GFS2 partition
> [root at apps01 ~]# time cp 200m.test /sanstorage/data0/images/order/
> real 0m1.073s
> user 0m0.008s
> sys 0m0.996s
>
> [root at apps08 ~]# time cp 200m.test /sanstorage/data0/images/order/
> real 0m5.046s
> user 0m0.004s
> sys 0m1.398s
>
> affected node, currently about 1/30th of the normal throughput.
> I cannot reproduce the extreme slowdown on demand, as the problem
> recurs in a stochastic manner, but it has not been rare this week.
> [root at apps03 ~]# time cp 200m.test /sanstorage/data0/images/order
> real 0m30.885s
> user 0m0.006s
> sys 0m6.348s
>
> affected node, writing to a different export from the same SAN
> storage (an IBM DS5020)
> [root at apps03 ~]# time cp 200m.test /sanstorage/data0/ShotNPrint/SNP_image
> real 0m2.353s
> user 0m0.006s
> sys 0m2.033s
>
> [root at apps03 ~]# time cp 200m.test /sanstorage/data0/images/album
> real 0m2.319s
> user 0m0.010s
> sys 0m1.798s
>
> The connection topology is a single fibre pair connected to a central
> SAN switch in a star topology, without any redundant path. If there
> were a hardware issue, SAN5020-SNPimage/SAN5020-data0 should be
> affected too, but that is not the case.
>