[Cluster-devel] [GFS2 PATCH 2/2] GFS2: Split gfs2_rgrp_congested into inter-node and intra-node cases
Steven Whitehouse
swhiteho at redhat.com
Thu Jan 25 11:47:04 UTC 2018
Hi,
Some further thoughts...
Whenever we find a problem related to a lock, it is a good plan to
understand where the problem actually lies: in other words, whether the
locking itself is slow, or whether it is some action being performed
under the lock that is the issue. We have the ability to
easily create histograms of DLM lock times, and almost as easily create
histograms of the glock times (gfs2_glock_queue -> gfs2_promote). We can
easily filter on glock type (rgrp) and the lock transitions that we
care about (any -> EX) too. So it would be interesting to look at this
in order to get more of an insight into what is really going on.
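
As a rough illustration of the glock-time histogram described above, here is a minimal Python sketch. It assumes the trace output has already been parsed into (timestamp_ns, event_name, request_key) tuples in time order. That tuple layout and the request_key field are hypothetical, not the actual tracepoint format, and the function name is made up for this example. Each gfs2_glock_queue event is paired with the next gfs2_promote for the same key, and the latency is counted in power-of-two buckets.

```python
from collections import Counter


def glock_latency_histogram(events):
    """Build a log2 histogram of queue -> promote latencies.

    events: iterable of (timestamp_ns, event_name, request_key) tuples in
    time order. This layout is an assumption for illustration only; it is
    not the real tracepoint output format.

    Returns a Counter mapping bucket index b to a count, where bucket b
    covers latencies in the range [2**b, 2**(b + 1)) nanoseconds.
    """
    pending = {}      # request_key -> timestamp of the queue event
    hist = Counter()
    for ts, name, key in events:
        if name == "gfs2_glock_queue":
            # Keep the earliest outstanding queue time for this key.
            pending.setdefault(key, ts)
        elif name == "gfs2_promote" and key in pending:
            latency = ts - pending.pop(key)
            # bit_length() - 1 is floor(log2(latency)) for latency >= 1.
            hist[max(latency, 1).bit_length() - 1] += 1
    return hist
```

Filtering on glock type (rgrp) and on the any -> EX transition would be done before the events reach this function. Keying on a single value is a simplification, since several holders can be queued on the same glock at once.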
Taking the raw histogram and multiplying the count by the centre of each
bucket gives us the total time taken at each lock latency. Then it is
easy to see which latencies are the ones causing the most delay.
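
The weighting step above can be sketched in a few lines of Python. This is only an illustration: the (low, high, count) bucket layout and the function name are assumptions, not the output of any existing tool.

```python
def total_time_per_bucket(histogram):
    """Weight each histogram bucket by its centre.

    histogram: list of (low, high, count) tuples, with low and high being
    the bucket bounds in a single time unit (assumed layout).

    Returns a list of (low, high, total_time) tuples, sorted so that the
    bucket contributing the most total time comes first.
    """
    totals = []
    for low, high, count in histogram:
        centre = (low + high) / 2
        totals.append((low, high, count * centre))
    return sorted(totals, key=lambda t: t[2], reverse=True)
```

With made-up buckets of (0, 10, 1000), (10, 100, 200) and (100, 1000, 10), the middle bucket comes out on top, even though the first bucket has by far the highest count.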
It would also be interesting to know how long it takes to allocate and
deallocate a block. What are the operations that take the most time?
Unfortunately our block allocation tracepoint doesn't give us that info,
but it is probably not that tricky to alter it, so that it does.
That would give us a much more detailed picture of what is going on I think,
Steve.