issues with vm after upgrade

daggs daggs at gmx.com
Fri Aug 20 16:07:54 UTC 2021


Greetings Laine,

> Sent: Monday, August 16, 2021 at 12:57 AM
> From: "Laine Stump" <laine at redhat.com>
> To: "daggs" <daggs at gmx.com>
> Cc: "Martin Kletzander" <mkletzan at redhat.com>, libvirt-users at redhat.com
> Subject: Re: issues with vm after upgrade
>
>
>
> On 8/14/21 6:05 AM, daggs wrote:
> > Greetings Martin,
> >
> >> Sent: Thursday, August 12, 2021 at 2:07 PM
> >> From: "daggs" <daggs at gmx.com>
> >> To: "Martin Kletzander" <mkletzan at redhat.com>
> >> Cc: dan at berrange.com, libvirt-users at redhat.com
> >> Subject: Re: issues with vm after upgrade
> >>
> >>> Sent: Thursday, August 12, 2021 at 11:49 AM
> >>> From: "Martin Kletzander" <mkletzan at redhat.com>
> >>> To: "daggs" <daggs at gmx.com>
> >>> Cc: dan at berrange.com, libvirt-users at redhat.com
> >>> Subject: Re: issues with vm after upgrade
> >>>
> >>> On Wed, Aug 11, 2021 at 08:53:10PM +0200, daggs wrote:
> >>>> Greetings Martin,
> >>>>
> >>>>
> >>>>> Sent: Wednesday, August 11, 2021 at 6:08 PM
> >>>>> From: "daggs" <daggs at gmx.com>
> >>>>> To: "Martin Kletzander" <mkletzan at redhat.com>
> >>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
> >>>>> Subject: Re: issues with vm after upgrade
> >>>>>
> >>>>> Greetings Martin,
> >>>>>
> >>>>>> Sent: Wednesday, August 11, 2021 at 4:13 PM
> >>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
> >>>>>> To: "daggs" <daggs at gmx.com>
> >>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
> >>>>>> Subject: Re: issues with vm after upgrade
> >>>>>>
> >>>>>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
> >>>>>>> Greetings Martin,
> >>>>>>>
> >>>>>>>> Sent: Wednesday, August 11, 2021 at 10:14 AM
> >>>>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
> >>>>>>>> To: "daggs" <daggs at gmx.com>
> >>>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
> >>>>>>>> Subject: Re: issues with vm after upgrade
> >>>>>>>>
> >>>
> >>> [...]
> >>>
> >>>>>>>>
> >>>>>>>> 2) To your issue with starting the domain it would be good to know what
> >>>>>>>>      is the error you get from virsh (or however you are starting the
> >>>>>>>>      domain) and the debug logs of libvirtd, ideally just for the part of
> >>>>>>>>      the domain starting.
> >>>>>>> that is the issue, there wasn't any error. the vm just didn't boot.
> >>>>>>
> >>>>>> Oh, so I misunderstood.  What was the state of the VM in libvirt?
> >>>>>> "paused" or "running"?  Was there serial console working?
> >>>>> it was marked as running and there was no serial
> >>>>>
> >>>
> >>> That's a pity we could not examine what was actually happening.
> >>>
> >>>>>>
> >>>>>>> I can diff the original xml with the new one to see the diffs and post them here if you wish
> >>>>>>>
> >>>>>>
> >>>>>> Would be nice to see if there are any differences.  The newly created
> >>>>>> one works then?
> >>>>>>
> >>>>>
> >>>>> I'll send it later today
> >>>>>
> >>>>
> >>>> here: https://dpaste.com/5VBUU8Z9W
> >>>>
> >>>
> >>> Unfortunately there are many differences there.  The machine type
> >>> changes _something_ in qemu, there is different PCI(e) topology, and I
> >>> do not think I will be able to figure this out without the non-working
> >>> machine.
> >>>
> >>> So if your current setup works for you right now I'd leave figuring out
> >>> the previous issue to others, if there is anyone wanting to figure out
> >>> if there is some libvirt issue.
> >>>
> >>> Have a nice day
> >>>
> >>
> >> my current setup works besides the hdmi audio, which I still need to investigate.
> >>
> >> thanks for your help.
> >>
> >> Dagg
> >>
> >
> > just to update, I've solved the sound issue. frankly, I don't understand how the guest showed a soundcard in the first place.
> > from what I gather, libvirt sets the -nodefaults flag to prepare the vm's properties from scratch.
> > in this situation, the sound card is a function in the host machine's pci tree.
> > when libvirt created the pci tree for the guest, it placed the card as a function of a device as well, in my case 02:00.2
> > however it didn't create a device at 02:00.0.
>
> Are you basing this claim on the libvirt XML? Or on what you see with
> lspci in the guest?
>
> When libvirt is assigning PCI addresses to devices in a guest, it will
> never auto-assign a non-0 function. This will only happen if the user
> explicitly requests it (and even then, iirc, libvirt should generate an
> error if function 0 of the same slot has no device - something to the
> effect of "no device on function 0 of a multifunction device").
>
> Anyway, when I looked back at the XML diff you posted earlier (see
> below), I didn't see any hostdev device assigned to 02:00.2. What I
> *did* see was that in both the old and the new version of the diff, the
> hostdev devices were assigned to function 0 of different *slots* on a
> dmi-to-pci-bridge controller, which should cause no problems (unless
> there is a bug in QEMU's dmi-to-pci-bridge). (The important thing,
> though, is that there is no hostdev device on a non-0 function, and when
> it is on a non-0 slot, that's because it's on a dmi-to-pci-bridge (which
> has 32 slots).)
I saw it in the guest. I'd assume that if libvirt defines a device at a specific BDF, the guest will not change it.
In fact, over the last 10 years I've booted thousands of systems, both bare metal and virtualized, and never encountered such a scenario.
That said, it might be a bug in qemu.

What I did see is that in the old VM's guest, after the upgrade, the sound card was defined as a function of the scsi virtblk controller, while the new VM placed
it as a function of a non-existent device.
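
One way to check a device listing for this situation is to group `lspci` lines by slot and look for slots that have functions but no function 0. The following is a minimal Python sketch, assuming each line starts with the usual `bus:slot.function` address (optionally preceded by a 4-digit domain); the names `parse_bdf` and `slots_missing_function0` are made up for illustration.

```python
import re
from collections import defaultdict

# Matches "02:00.2" or "0000:02:00.2" at the start of a line (hex fields).
BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def parse_bdf(text):
    """Return (domain, bus, slot, function) as integers."""
    m = BDF_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, slot, func = m.groups()
    # A missing domain is treated as domain 0.
    return (int(domain or "0", 16), int(bus, 16), int(slot, 16), int(func, 16))


def slots_missing_function0(lines):
    """Return sorted (domain, bus, slot) tuples that lack function 0."""
    funcs_by_slot = defaultdict(set)
    for line in lines:
        domain, bus, slot, func = parse_bdf(line)
        funcs_by_slot[(domain, bus, slot)].add(func)
    return sorted(key for key, funcs in funcs_by_slot.items() if 0 not in funcs)
```

Passing it the lines of an `lspci` listing returns the slots where only non-0 functions are populated, such as a lone device at 02:00.2.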

>
>
> On the topic of having a dmi-to-pci-bridge show up in your XML: I don't
> remember what versions the changes were in (it was at least a year or
> two ago), but only a fairly old version of libvirt would do that - 1)
> recent libvirt will assume that any hostdev PCI device is a PCIe device,
> so it will add a pcie-root-port and assign the hostdev device to slot 0
> of that root-port, and even before that 2) we switched from using
> dmi-to-pci-bridge to using pcie-to-pci-bridge quite some time ago as well.
As stated in the original mail, the issue started after a major version upgrade of both libvirt and qemu.
I'm currently using the latest stable, afaik.

>
> So if you're generating new XML based on config that doesn't have pci
> controllers already in it, and you're seeing hostdevs (or any other PCI
> devices) assigned to an automatically-added dmi-to-pci-bridge, then your
> libvirt version is severely out of date.
Here are the versions I'm using:
# emerge --search app-emulation/libvirt app-emulation/qemu

[ Results for search key : app-emulation/libvirt ]
Searching...

*  app-emulation/libvirt
      Latest version available: 7.5.0
      Latest version installed: 7.5.0
      Size of files: 9749 KiB
      Homepage:      https://www.libvirt.org/ https://gitlab.com/libvirt/libvirt/
      Description:   C toolkit to manipulate virtual machines
      License:       LGPL-2.1

[ Applications found : 1 ]


[ Results for search key : app-emulation/qemu ]
Searching...

*  app-emulation/qemu
      Latest version available: 6.0.0-r52
      Latest version installed: 6.0.0-r52
      Size of files: 22724 KiB
      Homepage:      http://www.qemu.org http://www.linux-kvm.org
      Description:   QEMU + Kernel-based Virtual Machine userland tools
      License:       GPL-2 LGPL-2 BSD-2

[ Applications found : 1 ]

>
>
> On 8/11/21 2:53 PM, daggs wrote:
>  >> From: "daggs" <daggs at gmx.com>
>  >>> From: "Martin Kletzander" <mkletzan at redhat.com>
>  >>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>  >>>> I can diff the original xml with the new one to see the diffs and post them here if you wish
>  >>>>
>  >>>
>  >>> Would be nice to see if there are any differences.  The newly created
>  >>> one works then?
>  >>
>  >> I'll send it later today
>  >>
>  >
>  > here: https://dpaste.com/5VBUU8Z9W
>
>
> > my fix was to move the device to 00:1f.4 in the guest.
>
> That's an interesting choice :-). You could have just put it on function
> 0 of some other unused slot (or a non-0 function of the slot the GPU is
> assigned to). 00:1f is used for integrated devices on the Q35 chipset -
> it's nice that QEMU's emulation code was written to allow adding more
> devices on that slot, but I wouldn't have been surprised if it had
> caused problems...
10 years of working in a virtualization company have taught me that sometimes keeping the PCI structure as close as possible
to the original is the best way to go.
That is why I placed it as a function: it is a function on the host machine.

>
>
> > I wouldn't be surprised if this was the reason the vm didn't boot after the upgrade with the old xml.
>
> Well, if your XML had a device assigned to a non-0 function of a slot
> and no device in function 0 of that slot, it would have failed to work
> previously as well (my recollection is that in this case it's more a
> problem of the guest OS not probing non-0 functions when there is
> nothing on function 0, and not with anything done by QEMU).
>
>
Here is the XML of the machine after I recreated it; it worked, but with no sound: https://dpaste.com/BB9EDY6BK
I used virt-manager. Note that the sound card passthrough is placed as a function on bus 0x8, which doesn't exist.
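
The same check can be run against the domain XML itself. Below is a minimal Python sketch, assuming the XML is in the form printed by `virsh dumpxml` (PCI controllers listed explicitly, and the `bus` attribute of a guest-side `<address type='pci'>` referring to a PCI controller `index`). The function name `check_pci_layout` is hypothetical. It only inspects the guest-side address of each device, not the host address inside `<source>`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def _num(value):
    """Parse '0x08' style hex or plain decimal attribute values."""
    value = value.strip()
    return int(value, 16) if value.lower().startswith("0x") else int(value)


def check_pci_layout(domain_xml):
    """Return (orphan_slots, missing_buses) for a libvirt domain XML string.

    orphan_slots:  (bus, slot) pairs that have devices but none on function 0.
    missing_buses: bus numbers used by devices with no PCI controller index.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        return [], []

    controller_indexes = set()
    funcs_by_slot = defaultdict(set)
    for dev in devices:
        if dev.tag == "controller" and dev.get("type") == "pci":
            controller_indexes.add(_num(dev.get("index", "0")))
        # find() only looks at direct children, so <source><address/> is skipped.
        addr = dev.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        bus = _num(addr.get("bus", "0"))
        slot = _num(addr.get("slot", "0"))
        func = _num(addr.get("function", "0"))
        funcs_by_slot[(bus, slot)].add(func)

    orphan_slots = sorted(k for k, f in funcs_by_slot.items() if 0 not in f)
    used_buses = {bus for bus, _slot in funcs_by_slot}
    missing_buses = sorted(used_buses - controller_indexes)
    return orphan_slots, missing_buses
```

For a hostdev placed on function 0x2 of bus 0x08 with no controller of index 8 defined, this would report both the slot lacking function 0 and the bus lacking a controller.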

Dagg.




