From sbradley at redhat.com Mon Aug 26 19:44:38 2013
From: sbradley at redhat.com (Shane Bradley)
Date: Mon, 26 Aug 2013 15:44:38 -0400
Subject: [sos-devel] New plugin "gfs2.py" for collecting lockdumps for DLM, GFS2
Message-ID: <0A3C6645-C277-4FA8-9E7E-FFD2996AE68E@redhat.com>

This plugin, "gfs2.py", captures GFS2 and DLM lockdumps. The cluster.py plugin already had code for GFS1, but it only captured a single set of lockdumps. When debugging lockdumps for DLM, GFS1, or GFS2, a single lockdump is usually not enough: you need to capture multiple instances to see what has changed in the lockdumps or threads between captures. Capturing multiple lockdump iterations helps diagnose lock contention. A filesystem might appear hung when in fact it is just moving very slowly, which you cannot tell from a single lockdump.

The gfs2.py plugin adds two options that were previously not in cluster.py:

* lockdump_count: number of iterations/runs of capturing lockdump data
* lockdump_wait: wait time in seconds between capturing each lockdump

This gives sosreport the ability to capture lockdumps in a more useful manner. We have been using a script that we developed called gfs2_lockcapture, but I believe that adding the same ability to sosreport is a better fit and provides a standard way of capturing this data.

- https://git.fedorahosted.org/cgit/gfs2-utils.git/tree/gfs2/scripts/gfs2_lockcapture

When enabling the gfs2 plugin it is best to run only a small set of plugins that will not interfere with the GFS2 filesystems. For example, capturing filesys data could cause problems with GFS2 in surprising ways.
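To illustrate, the lockdump_count/lockdump_wait options described above boil down to a capture loop along these lines (a minimal sketch only, not the actual plugin code; the real gfs2.py collects the files through the sos Plugin API, and the function names here are hypothetical):

```python
import time


def capture_lockdumps(capture_fn, lockdump_count=3, lockdump_wait=60,
                      sleep_fn=time.sleep):
    """Run capture_fn() lockdump_count times, sleeping lockdump_wait
    seconds between iterations (no sleep after the final capture).

    Hypothetical sketch of the iteration logic behind the
    gfs2.lockdump_count / gfs2.lockdump_wait options.
    """
    dumps = []
    for i in range(lockdump_count):
        # One "iteration" would gather the DLM/GFS2 lockdump files
        # plus per-process data for that point in time.
        dumps.append(capture_fn())
        if i < lockdump_count - 1:
            sleep_fn(lockdump_wait)
    return dumps
```

Comparing the successive snapshots this loop produces is what lets you tell a truly hung filesystem from one that is merely slow.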
Here is the command that I have found captures all the information needed when reviewing GFS2 lockdumps:

# sosreport --batch --tmp-dir=/tmp -o general,kernel,process,startup,logs,cluster,gfs2 -k gfs2.gfs2lockdump=on -k gfs2.dlmlockdump=on -k gfs2.lockdump_count=3 -k gfs2.lockdump_wait=60

This command captures all the GFS2/DLM lockdumps 3 times and waits 60 seconds between each iteration. Each iteration also collects process data: ps output and the /proc/<pid>/{status,cmdline,stack} files, which are written to a parseable file.

In addition, the gfs lockdump option should be removed from cluster.py since gfs2.py will capture the lockdumps. The DLM part should stay in cluster.py since it is useful for analyzing issues such as CLVM.

----
Shane Bradley
Senior Software Maintenance Engineer (Cluster HA, GFS, GFS2)
Red Hat Global Support Services
VC3 Raleigh, NC

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfs2.py
Type: text/x-python-script
Size: 6153 bytes
Desc: not available
URL: