[sos-devel] New plugin "gfs2.py" for collecting lockdumps for DLM, GFS2

Bryn M. Reeves bmr at redhat.com
Thu Oct 31 11:16:30 UTC 2013


On Mon, 26 Aug 2013 15:44:38, Shane Bradley wrote:
> This plugin "gfs2.py" will capture GFS2 and DLM lockdumps. The cluster.py plugin already had code for GFS1, but it only captures one set of lockdumps. When debugging DLM, GFS1, or GFS2 locking, capturing a single lockdump is usually not enough. You really need to capture multiple lockdumps for DLM, GFS1, or GFS2 to see what has changed in the lockdumps or threads. Capturing multiple lockdump iterations helps to diagnose lock contention. A filesystem might appear hung when in fact it is just moving very slowly, which you would not be able to tell from a single instance of the lockdumps.
> 
> The plugin gfs2.py adds two options that were previously not in cluster.py. 
> * lockdump_count: number of iterations/runs of capturing lockdump data
> * lockdump_wait: wait time between capturing each lockdump
> 
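
To make the two options concrete, here is a minimal sketch of the
capture loop they imply. It is not the code from the proposed plugin and
does not use the sos plugin API: the function capture_lockdumps, the
capture callable and the sleep parameter are hypothetical names used
only for illustration. Only lockdump_count and lockdump_wait come from
the description above.

```python
import time


def capture_lockdumps(capture, lockdump_count=3, lockdump_wait=1.0,
                      sleep=time.sleep):
    """Call capture(run) lockdump_count times, pausing lockdump_wait
    seconds between consecutive runs.

    capture is a hypothetical callable that collects one set of
    lockdumps and returns whatever it gathered for that run.
    """
    results = []
    for run in range(lockdump_count):
        results.append(capture(run))
        # There is nothing to wait for after the final run.
        if run < lockdump_count - 1:
            sleep(lockdump_wait)
    return results


if __name__ == "__main__":
    # Stand-in capture function; a real one would copy the lockdump files.
    dumps = capture_lockdumps(lambda run: "lockdump run %d" % run,
                              lockdump_count=3, lockdump_wait=0.1)
    print(dumps)
```

Comparing the output of consecutive runs is what would let a reader
tell a slow filesystem from a hung one.
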
> This gives sosreport the ability to capture lockdumps in a more useful manner. We have been using a script that we developed called gfs2_lockcapture, but I believe that adding the same ability to sosreport is a better fit and provides a standard way of capturing this data.
> - https://git.fedorahosted.org/cgit/gfs2-utils.git/tree/gfs2/scripts/gfs2_lockcapture

Is this shipped in any package that we'd expect to be installed on a
regular gfs2 installation?

Our approach to date when components have complex needs for data
collection (or require multi-step actions that need to be kept in sync
with the requirements of the other component) has been to lean on
externally provided scripts and to use those to collect this data.

Could that be an option here?

> When enabling the gfs2 plugin, it is best to run only a small set of plugins that will not interfere with the GFS2 filesystems. For example, capturing filesys data could cause problems with GFS2 in surprising ways. Here is the command that I have found captures all the information needed when reviewing GFS2 lockdumps:

This is a bit tricky; ideally we want all options to be usable in any
combination. Special purpose options that require the user to know that
they shouldn't allow certain plugins to run can be confusing and lead to
false-alarm bug reports.

If the plugin was enabled by default and collected a minimal amount of
general data (that's normally expected to be low-risk and to not have
any significant side-effects) then I think I'd be more comfortable
having this sort of functionality as an option.

I'd like to have sos be able to gather something that's useful for
general debugging on systems using gfs2 without any special options - is
this feasible?

I think it might be a good approach to get something basic that deals
with simple cases first and to then add more complex debugging
assistance like lockdumps at a later time - especially as this seems
like it may need some changes to core sos to make it practical (we're
not really set up for long-running data collection tasks right now).

Regards,
Bryn.



