From bmr at redhat.com  Fri Oct 25 12:56:22 2013
From: bmr at redhat.com (Bryn M. Reeves)
Date: Fri, 25 Oct 2013 13:56:22 +0100
Subject: [sos-devel] [PATCH 0/3] Series: Adding Power support to sosreport
In-Reply-To: <20131025063139.19600.47299.stgit@localhost.localdomain>
References: <20131025063139.19600.47299.stgit@localhost.localdomain>
Message-ID: <526A6A76.5070706@redhat.com>

On 10/25/2013 07:32 AM, Bharani C.V wrote:
> The following series adds a Power Systems plugin to sosreport.
> We add a check in each policy - RedHat, Ubuntu and Debian - to see
> whether sosreport is being run on a Power platform by checking the
> output of "uname -m". If the check passes, i.e. it is indeed a Power
> system, then the PowerPlugin class is listed as a valid subclass for
> that distribution. We define the PowerPlugin class, and a new PowerPC
> plugin has been added which runs and collects generic Power logs.
> Based on further platform checks, IBM Power System specific logs and
> commands will be collected. This would help IBM Power system users
> collect system data in one shot by running sosreport.

Thanks very much for submitting this. The PowerPC plugin itself looks
good. I'm not sure that tagging classes are the best approach for
controlling plug-in activation, however.

There is already a get_arch() method defined in the policy classes that
returns the collection host's architecture formatted as a string. This
is used e.g. in the s390 plug-in's check_enabled() method to determine
whether or not the plug-in should run. The processor plug-in uses the
same method to decide whether or not to run the x86info program.

If we're going to adopt tagging classes for architectures then I think
we should change this across the tree rather than leave several
implementations lying around, but I'm not convinced that they are a
major improvement over the simple method call. What do you see as the
advantage?

Regards,
Bryn.
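A minimal sketch of the check_enabled() pattern described above,
assuming the sos.plugins tagging classes and the policy() accessor from
the sos tree of the period; the class body and the "ppc64" substring
test are illustrative rather than taken from the patch series:

    from sos.plugins import Plugin, RedHatPlugin, UbuntuPlugin, DebianPlugin

    class PowerPC(Plugin, RedHatPlugin, UbuntuPlugin, DebianPlugin):
        """Generic PowerPC collection (sketch)"""

        plugin_name = 'powerpc'

        def check_enabled(self):
            # Ask the active policy for the host architecture string and
            # run only on Power, as the s390 plug-in does with get_arch().
            return "ppc64" in self.policy().get_arch()

With this approach a single plugin works unchanged across the RedHat,
Ubuntu and Debian policies, which is the cross-distribution behaviour
the cover letter describes.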
From bmr at redhat.com  Wed Oct 30 13:58:49 2013
From: bmr at redhat.com (Bryn M. Reeves)
Date: Wed, 30 Oct 2013 13:58:49 +0000
Subject: [sos-devel] [PATCH V2] Adding PowerPC Plugin
In-Reply-To: <20131028152643.17064.49181.stgit@localhost.localdomain>
References: <20131028152643.17064.49181.stgit@localhost.localdomain>
Message-ID: <52711099.1070701@redhat.com>

On 10/28/2013 03:29 PM, Bharani C.V wrote:
> This patch defines a new PowerPC plugin to collect generic Power logs.
> Based on further platform checks, IBM Power System specific logs and
> commands will be collected. This would help IBM Power system users
> collect system data in one shot by running sosreport.
>
> Changes from V1:
> Moved the Power architecture check from the policy to the plugin class.

Applied - thanks again. Please test with the version now on the master
branch of the sosreport repo on github:

https://github.com/sosreport/sosreport/commits/master

Regards,
Bryn.

From bmr at redhat.com  Thu Oct 31 11:16:30 2013
From: bmr at redhat.com (Bryn M. Reeves)
Date: Thu, 31 Oct 2013 11:16:30 +0000
Subject: [sos-devel] New plugin "gfs2.py" for collecting lockdumps for DLM, GFS2
Message-ID: <52723C0E.1050901@redhat.com>

On Mon, 26 Aug 2013 15:44:38, Shane Bradley wrote:
> This plugin "gfs2.py" will capture GFS2 and DLM lockdumps. The
> cluster.py plugin already had code for GFS1, but it only captures one
> set of lockdumps. When debugging lockdumps for DLM, GFS1, or GFS2,
> capturing a single lockdump is usually not enough. You really need to
> capture multiple instances of the DLM, GFS1, or GFS2 state to see what
> has changed in the lockdumps or threads. Capturing multiple lockdump
> iterations helps to diagnose lock contention: a filesystem might
> appear hung when in fact it is just moving very slowly, which you
> cannot tell from a single instance of the lockdumps.
>
> The plugin gfs2.py adds two options that were previously not in
> cluster.py:
> * lockdump_count: number of iterations/runs of capturing lockdump data
> * lockdump_wait: wait time between capturing each lockdump
>
> This gives sosreport the ability to capture lockdumps in a more useful
> manner. We have been using a script that we developed called
> gfs2_lockcapture, but I believe that adding the same ability to
> sosreport is a better fit and provides a standard way of capturing
> this data.
> - https://git.fedorahosted.org/cgit/gfs2-utils.git/tree/gfs2/scripts/gfs2_lockcapture

Is this shipped in any package that we'd expect to be installed on a
regular gfs2 installation?

Our approach to date when components have complex needs for data
collection (or require multi-step actions that need to be kept in sync
with the requirements of the other component) has been to lean on
externally provided scripts and to use those to collect this data.

Could that be an option here?

> When enabling the gfs2 plugin it is best to only run a small set of
> plugins that will not interfere with the GFS2 filesystems. For
> example, capturing filesys data could cause problems with GFS2 in
> surprising ways. Here is the command that I have found captures all
> the information needed when reviewing GFS2 lockdumps:

This is a bit tricky; ideally we want all options to be usable in any
combination. Special-purpose options that require the user to know that
they shouldn't allow certain plugins to run can be confusing and lead
to false-alarm bug reports.

If the plugin was enabled by default and collected a minimal amount of
general data (that's normally expected to be low-risk and to not have
any significant side effects) then I think I'd be more comfortable
having this sort of functionality as an option.

I'd like sos to be able to gather something that's useful for general
debugging on systems using gfs2 without any special options - is this
feasible?

I think it might be a good approach to get something basic that deals
with simple cases first and then add more complex debugging assistance
like lockdumps at a later time - especially as this seems like it may
need some changes to core sos to make it practical (we're not really
set up for long-running data collection tasks right now).

Regards,
Bryn.
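For illustration, a rough sketch of how the two options described above
could be wired into a sos 3.x-style plugin. The option_list tuple
format, get_option() and get_cmd_output_now() follow the plugin API of
the period, but the debugfs paths, the defaults and the plugin body
itself are assumptions for this sketch, not Shane's actual patch:

    import time
    from sos.plugins import Plugin, RedHatPlugin

    class Gfs2(Plugin, RedHatPlugin):
        """GFS2 and DLM lockdump collection (sketch)"""

        plugin_name = 'gfs2'
        option_list = [
            ("lockdump_count", "number of lockdump capture iterations",
             "slow", 3),
            ("lockdump_wait", "seconds to wait between lockdump captures",
             "slow", 30)
        ]

        def setup(self):
            for i in range(self.get_option("lockdump_count")):
                # GFS2 glock and DLM lock state live under debugfs
                # (assuming it is mounted at /sys/kernel/debug); a
                # distinct suggested name keeps each iteration separate.
                self.get_cmd_output_now(
                    "cat /sys/kernel/debug/gfs2/*/glocks",
                    suggest_filename="glocks.%d" % i)
                self.get_cmd_output_now(
                    "cat /sys/kernel/debug/dlm/*_locks",
                    suggest_filename="dlm_locks.%d" % i)
                time.sleep(self.get_option("lockdump_wait"))

Because the captures must run back to back with real delays, setup()
here blocks for roughly lockdump_count * lockdump_wait seconds - the
long-running-collection problem Bryn raises in his reply above.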
From sbradley at redhat.com  Thu Oct 31 16:07:02 2013
From: sbradley at redhat.com (Shane Bradley)
Date: Thu, 31 Oct 2013 12:07:02 -0400
Subject: [sos-devel] New plugin "gfs2.py" for collecting lockdumps for DLM, GFS2
In-Reply-To: <52723C0E.1050901@redhat.com>
References: <52723C0E.1050901@redhat.com>
Message-ID: <5A26DF8C-30FC-4A4F-A0A9-23DACD7B1C1E@redhat.com>

Thanks for the response. I will take a look at this again; maybe a
better approach would be to just have the end user run multiple
iterations of sosreport:

# sosreport ...; sleep 30; sosreport ...; sleep 30; sosreport ...

I do agree with what you said; currently the plugin feels a bit hacky
because the libs I need do not exist. I will revisit this patch and see
what I can do based on your feedback.

----
Shane Bradley
Senior Software Maintenance Engineer (Cluster HA, GFS, GFS2)
Red Hat Global Support Services
Red Hat, Inc.

On Oct 31, 2013, at 7:16 AM, Bryn M. Reeves wrote:

> On Mon, 26 Aug 2013 15:44:38, Shane Bradley wrote:
>> This plugin "gfs2.py" will capture GFS2 and DLM lockdumps. The
>> cluster.py plugin already had code for GFS1, but it only captures one
>> set of lockdumps. When debugging lockdumps for DLM, GFS1, or GFS2,
>> capturing a single lockdump is usually not enough. You really need to
>> capture multiple instances of the DLM, GFS1, or GFS2 state to see
>> what has changed in the lockdumps or threads. Capturing multiple
>> lockdump iterations helps to diagnose lock contention: a filesystem
>> might appear hung when in fact it is just moving very slowly, which
>> you cannot tell from a single instance of the lockdumps.
>>
>> The plugin gfs2.py adds two options that were previously not in
>> cluster.py:
>> * lockdump_count: number of iterations/runs of capturing lockdump data
>> * lockdump_wait: wait time between capturing each lockdump
>>
>> This gives sosreport the ability to capture lockdumps in a more
>> useful manner. We have been using a script that we developed called
>> gfs2_lockcapture, but I believe that adding the same ability to
>> sosreport is a better fit and provides a standard way of capturing
>> this data.
>> - https://git.fedorahosted.org/cgit/gfs2-utils.git/tree/gfs2/scripts/gfs2_lockcapture
>
> Is this shipped in any package that we'd expect to be installed on a
> regular gfs2 installation?
>
> Our approach to date when components have complex needs for data
> collection (or require multi-step actions that need to be kept in sync
> with the requirements of the other component) has been to lean on
> externally provided scripts and to use those to collect this data.
>
> Could that be an option here?
>
>> When enabling the gfs2 plugin it is best to only run a small set of
>> plugins that will not interfere with the GFS2 filesystems. For
>> example, capturing filesys data could cause problems with GFS2 in
>> surprising ways. Here is the command that I have found captures all
>> the information needed when reviewing GFS2 lockdumps:
>
> This is a bit tricky; ideally we want all options to be usable in any
> combination. Special-purpose options that require the user to know
> that they shouldn't allow certain plugins to run can be confusing and
> lead to false-alarm bug reports.
>
> If the plugin was enabled by default and collected a minimal amount of
> general data (that's normally expected to be low-risk and to not have
> any significant side effects) then I think I'd be more comfortable
> having this sort of functionality as an option.
>
> I'd like sos to be able to gather something that's useful for general
> debugging on systems using gfs2 without any special options - is this
> feasible?
>
> I think it might be a good approach to get something basic that deals
> with simple cases first and then add more complex debugging assistance
> like lockdumps at a later time - especially as this seems like it may
> need some changes to core sos to make it practical (we're not really
> set up for long-running data collection tasks right now).
>
> Regards,
> Bryn.
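Spelled out as a shell loop, the interim approach Shane suggests above
might look like the following; the plugin names, iteration count and
sleep interval are illustrative rather than taken from the thread,
while --batch (non-interactive) and -o (run only the listed plugins)
are standard sosreport options:

    # Capture three reports 30 seconds apart, limited to the plugins
    # of interest (plugin names here are illustrative):
    for i in 1 2 3; do
        sosreport --batch -o gfs2,cluster
        sleep 30
    done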
From bmr at redhat.com  Thu Oct 31 16:29:30 2013
From: bmr at redhat.com (Bryn M. Reeves)
Date: Thu, 31 Oct 2013 16:29:30 +0000
Subject: [sos-devel] New plugin "gfs2.py" for collecting lockdumps for DLM, GFS2
In-Reply-To: <5A26DF8C-30FC-4A4F-A0A9-23DACD7B1C1E@redhat.com>
References: <52723C0E.1050901@redhat.com>
 <5A26DF8C-30FC-4A4F-A0A9-23DACD7B1C1E@redhat.com>
Message-ID: <5272856A.4070809@redhat.com>

On 10/31/2013 04:07 PM, Shane Bradley wrote:
> Thanks for the response. I will take a look at this again; maybe a
> better approach would be to just have the end user run multiple
> iterations of sosreport:
> # sosreport ...; sleep 30; sosreport ...; sleep 30; sosreport ...

This could be a good interim option.

One of the things we've discussed around sos to make it more useful for
this kind of targeted debugging is to allow "profiles" that specify
some set of plugins and options to use (e.g. a 'lockdump' profile in
this case). This might be a first step in that direction.

> I do agree with what you said; currently the plugin feels a bit hacky
> because the libs I need do not exist. I will revisit this patch and
> see what I can do based on your feedback.

Let me know if there are specific additions we can make that would help
to facilitate this kind of feature. I'd really like to be able to
support this, but I'd like us to do it in a manner that's generally
useful across multiple plugins rather than as a one-off for each use
case.

Regards,
Bryn.
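To make the "profiles" idea concrete - no such support existed in sos
at the time of this thread, so the flag and profile name below are
purely hypothetical - a lockdump profile could eventually reduce the
whole procedure to a single invocation:

    # Hypothetical future usage; neither this flag nor the profile
    # existed when this was written:
    sosreport --profile lockdump

A profile would bundle the plugin selection and plugin options (such as
lockdump_count and lockdump_wait) so users would not need to know which
plugins are safe to combine.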