[Libguestfs] LVM flakey failures

Richard W.M. Jones rjones at redhat.com
Mon Apr 5 18:31:30 UTC 2021


On Mon, Apr 05, 2021 at 05:49:17PM +0300, Sam Eiderman wrote:
> Yes, attached below (it's quite long)

Thanks - just looking at the lines where the /dev/rhel/swap
device appears or its various synonyms:

> 2021-03-07 10:58:46 T libguestfs - 0 - appliance - drwxr-xr-x 2 0 0 60 Mar  7 10:58 rhel
[...]
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - /dev/mapper:
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - total 0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - crw------- 1 0 0 10, 236 Mar
>  7 10:58 control
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - lrwxrwxrwx 1 0 0       7 Mar
>  7 10:58 rhel-root -> ../dm-2
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - lrwxrwxrwx 1 0 0       7 Mar
>  7 10:58 rhel-swap -> ../dm-1

This is fairly early on during appliance boot where the init script is
listing out /dev/ recursively
(https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/appliance/init#L159).
This is also after we did udevadm settle in the init script.  It's
interesting that at this point /dev/mapper/rhel-swap exists correctly,
but /dev/rhel/ is an empty directory.

That's very much not expected ...

A bit later in the init script we list out PVs/VGs/LVs and everything
looks normal:

> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - + lvm pvs
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   PV         VG      Fmt
>  Attr PSize   PFree
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   /dev/sda2  rhel    lvm2
> a--  <15.00g    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   /dev/sdb   vg_myvg lvm2
> a--  496.00m    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - + lvm vgs
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   VG      #PV #LV #SN Attr  
> VSize   VFree
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   rhel      1   2   0 wz--n-
> <15.00g    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   vg_myvg   1   1   0 wz--n-
> 496.00m    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - + lvm lvs
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   LV     VG      Attr      
> LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   root   rhel    -wi-a-----
>  13.39g
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   swap   rhel    -wi-a-----
>   1.60g
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   lv_var vg_myvg -wi-a-----
> 496.00m

A little bit further on while we're executing the guestfs_inspect_os()
API call, we have this snippet:

> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: lvm 'lvs' '-o'
> 'vg_name,lv_name' '-S' 'lv_role=public && lv_skip_activation!=yes'
> '--noheadings' '--separator' '/'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: lvm returned 0
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: lvm: stdout:
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance -   rhel/root
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance -   rhel/swap
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance -   vg_myvg/lv_var
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid '-c' '/dev/
> null' '-o' 'value' '-s' 'TYPE' '/dev/rhel/root'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid returned 0
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid: stdout:
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - xfs
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid '-c' '/dev/
> null' '-o' 'value' '-s' 'TYPE' '/dev/rhel/swap'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid returned 2
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid '-c' '/dev/
> null' '-o' 'value' '-s' 'TYPE' '/dev/vg_myvg/lv_var'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid returned 0
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid: stdout:
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - xfs

If you look closely it's clear that /dev/rhel/root has been created,
/dev/vg_myvg/lv_var has been created (that was also an empty directory
earlier on), but /dev/rhel/swap has *not* been created--notice how the
blkid command exits with code 2.

And obviously by the time we get down here ...

> 2021-03-07 10:58:53 T libguestfs - 0 - appliance -   rhel/root
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance -   rhel/swap
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance -   vg_myvg/lv_var
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: error: stat: /dev/
> rhel/swap: No such file or directory

... it still doesn't exist.

We ran a udevadm --debug settle command before this, so if it wasn't
going to be created because of a delayed udev event then I'd expect it
should have done.

I'm not much closer to understanding why the /dev/VG/LV nodes are
created so much later/delayed than the /dev/mapper/VG-LV nodes.  As
far as I can tell they should be created by udev
(/usr/lib/udev/rules.d/11-dm-lvm.rules in Fedora).

Would suggest enabling udevadm debugging (uncomment #--debug in
appliance/init).

It would also be good to have some more concrete reproducer: you might
have appliance/init kill PID 1 if /dev/rhel/swap does not exist, then
repeatedly run g.launch() to see if you can make it fail, all the time
while trying to capture the log of the failure with udev debugging
turned on.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




More information about the Libguestfs mailing list