[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Libguestfs] LVM flakey failures



On Mon, Apr 05, 2021 at 02:47:51PM +0300, Sam Eiderman wrote:
> Hi,
> 
> We are experiencing very rare LVM failures - 2 failures so far, in
> different OSs, in different libguestfs functions.
> 
> The first failure is inspect_os() not finding the root operating
> system on rhel7.4.
> LVM volumes are returned by lvm command but files under /dev do not exist (yet?)
> 
> Second failure is in is_lv() - is_lv() successfully enumerates all lvm
> volumes but then internal stat() command fails again on /dev file
> since it does not exist (yet?) (rhel8.0)
> 
> All of our tests run in parallel, 1 guestfs instance per core on a 32
> core machine and they run on GCP (nested virtualization).
> 
> What we think that is happening here is that libguestfs' appliance is
> booting very somewhat slower than usual and that the links to some
> devices do not appear yet (even after multiple seconds).
> We found this old issue that might be connected to this behavior (in
> some way): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616689

I wonder if "udevadm settle" is not working?  The daemon will use this
command at various times in order to ensure that all preceeding udev
messages have been processed and all /dev changes have been made.

It is called once at appliance boot:

  https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/appliance/init#L109

And throughout the daemon code:

  https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/daemon/utils.c#L732

  $ git grep 'udev_settle ()' -- daemon
  daemon/blockdev.c:  udev_settle ();
  daemon/cryptsetup.ml:  udev_settle ()
  daemon/cryptsetup.ml:  udev_settle ()
  daemon/file.c:    udev_settle ();
  daemon/guestfsd.c:  udev_settle ();
  daemon/hotplug.c:    udev_settle ();
  [etc etc]

It could be that udev_settle is not being called at the right points,
or is not working in the way we understand.

...
> Short second failure logs (is_lv() only) - notice that is_lv() is
> invoked on /dev/vg_myvg/lv_var but it fails due to a problem in
> /dev/rhel/swap not existing)
> 
> 2021-03-07 10:58:53 T libguestfs - 0 - enter - is_lv
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: =>
> aug_get (0x13) took 0.00 secs
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: <= is_lv
> (0x108) request length 64 bytes
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - commandrvf:
> stdout=n stderr=y flags=0x0
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - commandrvf: udevadm
> --debug settle -E /dev/vg_myvg/lv_var
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - command: lvm 'lvs'
> '-o' 'vg_name,lv_name' '-S' 'lv_role=public &&
> lv_skip_activation!=yes' '--noheadings' '--separator' '/'
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - command: lvm returned 0
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - command: lvm: stdout:
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - rhel/root
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - rhel/swap
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - vg_myvg/lv_var
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: error:
> stat: /dev/rhel/swap: No such file or directory

You might want to look earlier in this log to see if udevadm settle
was called between the LVs being activated and this API function.  If
it was not being called then possibly we need to insert a call after
activation.  If it was being called then perhaps udev settle is not
working the way we understand it.

Collecting the full logs is the right approach to diagnosing this.

The only other issue I can think of is the change in kernel PCI device
enumeration code (starting in Linux 5.6,
https://bugzilla.redhat.com/show_bug.cgi?id=1804207).  I suppose in
theory the underlying devices might not be ready at all before we run
udev settle in the appliance.  However I have not seen this actually
happen.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]