[sos-devel] [RFC PATCH] sosreport: Check for rpm database corruption during initialization

Bryn M. Reeves bmr at redhat.com
Tue Oct 14 11:49:51 UTC 2014


On Fri, Oct 10, 2014 at 11:28:14AM +0530, Aruna Balakrishnaiah wrote:
> sosreport runs an rpm query to get the package list. If rpmdb is corrupted
> sosreport hangs for ever hence check for rpmdb consistency before running
> the rpm query.

There's quite a bit going on here; let me see if I can set out all the
issues.

First of all, blocking is not a definite sign of corruption: the rpm process
is attempting to acquire locks on the database, so a hang could simply mean
that another process is accessing the database.

> yum check
> error: rpmdb: BDB0113 Thread/process 43828/70366497037824 failed: BDB1507 Thread died in Berkeley DB library
> error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
> error: cannot open Packages index using db5 -  (-30973)
> error: cannot open Packages database in /var/lib/rpm
> CRITICAL:yum.main:

Generally I don't like the idea of using another (higher-level) tool to
check for a problem in a lower-level subsystem. It's a bit of a layering
violation and I think there is probably a cleaner way to do it.
 
> Error: rpmdb open failed

We would need to be absolutely certain that this is both an accurate test
that the db is 'corrupt' and also that it cannot block in the same way;
otherwise it is no better than what we have now.

Unfortunately, I don't think the second condition holds. You just need to
take out a read lock on the RPM DB in one process and the "yum check" will
hang forever:

  # rpm -Va
  .M.......    /
  ^Z
  [1]+  Stopped                 rpm -Va
  # yum check
  Loaded plugins: langpacks, product-id, subscription-manager
  [ ... hangs ... ]

The "yum check" operation also seems to be quite costly: it verifies the
RPM database for dependencies, duplicates, obsoletes, and provides. On
my little test VM this took 6m4s; that is not an acceptable delay if
this is going to take place for every sos run (most likely users would
assume it was stuck and kill it anyway).

It would be better to simply subject the initial rpm query to a timeout
and bail if it is exceeded.
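
As a rough sketch of that timeout approach (this is not code from the sos
tree: the helper name query_with_timeout and the 30-second limit in the usage
note are assumptions for illustration, and subprocess.run() needs Python 3.5
or later):

```python
import subprocess


def query_with_timeout(cmd, timeout_secs):
    """Run cmd and return its stdout as text.

    Returns None if the command cannot be started, exits non-zero,
    or does not finish within timeout_secs seconds.
    (Hypothetical helper, not part of sos.)
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_secs,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so a query
        # blocked on a database lock is not left behind.
        return None
    except OSError:
        # The command could not be executed at all.
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

The package query could then be something like
query_with_timeout(["rpm", "-qa"], 30), with sos bailing out or skipping the
package list when the result is None. Note that a None result only says the
query did not complete; it does not distinguish a corrupt database from one
that is merely locked by another process.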

> +               if 'rpmdb open failed' in info:
> +                   error("rpmdb corruption, rebuild rpmdb: rpmdb --rebuildb")

Aside from the other problems, this is unwise: we aren't certain of
the state of the database here, and suggesting --rebuilddb may not
be the correct course of action.

Regards,
Bryn.



