[Linux-cluster] sudden slow down of gfs2 volume
Dil Lee
dillee1 at gmail.com
Thu Nov 12 08:09:42 UTC 2015
Hi,
blktrace btt report for a 200MB write to an IBM DS5020 SAN partition:
==================== Per Process ====================
          Q2Cdm            MIN            AVG            MAX      N
 glock_workqueu    0.000165159    0.000165159    0.000165159      1   <- normal node
 glock_workqueu    0.000229284    0.378688659    0.595789779    377   <- affected node

==================== Per Device ====================
          Q2Cdm            MIN            AVG            MAX      N
       (253, 8)    0.000165159    0.000165159    0.000165159      1   <- normal node
       (253, 2)    0.000229284    0.370873383    0.595789779    385   <- affected node
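For reference, the report above is blktrace/btt output; a rough outline of how
such a report can be collected is below (the device path, output names and
trace duration are illustrative assumptions, not copied from the actual session):

  # trace the dm device backing the GFS2 mount for 60s while copying the test file
  blktrace -d /dev/mapper/SAN5020-order -o order_trace -w 60
  # merge the per-CPU trace files into a single binary stream
  blkparse -i order_trace -d order_trace.bin
  # summarise latencies (including Q2C) per process and per device
  btt -i order_trace.bin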
Any insight into what could be causing the long times in
"glock_workqueu" on the affected node? Thank you,
Regards,
Dil
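PS: in case it is useful for diagnosis, the glock state on the affected node
can also be inspected through debugfs; a minimal sketch, assuming debugfs is
mounted and using a placeholder lockspace name (the real directory follows
the clustername:fsname pattern):

  # mount debugfs if it is not already mounted
  mount -t debugfs none /sys/kernel/debug
  # dump the current glocks for the filesystem; "mycluster:SAN5020-order" is a placeholder
  cat /sys/kernel/debug/gfs2/mycluster:SAN5020-order/glocks | head -50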
On 11/11/15, Dil Lee <dillee1 at gmail.com> wrote:
> Thank you for your reply. Below is some extra info.
> GFS2 partition in question:
> [root at apps03 ~]# df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/SAN5020-order 800G 435G 366G 55%
> /sanstorage/data0/images/order <- the one that suffered the slowdown
> /dev/mapper/SAN5020-SNPimage 3.0T 484G 2.6T 16%
> /sanstorage/data0/ShotNPrint/SNP_image
> /dev/mapper/SAN5020-data0 1.3T 828G 475G 64%
> /sanstorage/data0/images/album
>
> normal nodes, copying a 200MB file to the affected GFS2 partition
> [root at apps01 ~]# time cp 200m.test /sanstorage/data0/images/order/
> real 0m1.073s
> user 0m0.008s
> sys 0m0.996s
>
> [root at apps08 ~]# time cp 200m.test /sanstorage/data0/images/order/
> real 0m5.046s
> user 0m0.004s
> sys 0m1.398s
>
> affected node, currently about 1/30th of the normal throughput.
> I cannot reproduce the extreme slowdown on demand, as the problem
> recurs in a stochastic manner, but it has not been rare this week.
> [root at apps03 ~]# time cp 200m.test /sanstorage/data0/images/order
> real 0m30.885s
> user 0m0.006s
> sys 0m6.348s
>
> affected node, writing to a different export from the same SAN
> storage (an IBM DS5020)
> [root at apps03 ~]# time cp 200m.test /sanstorage/data0/ShotNPrint/SNP_image
> real 0m2.353s
> user 0m0.006s
> sys 0m2.033s
>
> [root at apps03 ~]# time cp 200m.test /sanstorage/data0/images/album
> real 0m2.319s
> user 0m0.010s
> sys 0m1.798s
>
> The connection topology is a single fibre pair connected to a central
> SAN switch in a star topology, without any redundant path. If there
> were a hardware issue, SAN5020-SNPimage/SAN5020-data0 should be
> affected too, but that is not the case.
>