issues with vm after upgrade

Laine Stump laine at redhat.com
Sun Aug 15 21:57:55 UTC 2021



On 8/14/21 6:05 AM, daggs wrote:
> Greetings Martin,
> 
>> Sent: Thursday, August 12, 2021 at 2:07 PM
>> From: "daggs" <daggs at gmx.com>
>> To: "Martin Kletzander" <mkletzan at redhat.com>
>> Cc: dan at berrange.com, libvirt-users at redhat.com
>> Subject: Re: issues with vm after upgrade
>>
>>> Sent: Thursday, August 12, 2021 at 11:49 AM
>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>> To: "daggs" <daggs at gmx.com>
>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>> Subject: Re: issues with vm after upgrade
>>>
>>> On Wed, Aug 11, 2021 at 08:53:10PM +0200, daggs wrote:
>>>> Greetings Martin,
>>>>
>>>>
>>>>> Sent: Wednesday, August 11, 2021 at 6:08 PM
>>>>> From: "daggs" <daggs at gmx.com>
>>>>> To: "Martin Kletzander" <mkletzan at redhat.com>
>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>> Subject: Re: issues with vm after upgrade
>>>>>
>>>>> Greetings Martin,
>>>>>
>>>>>> Sent: Wednesday, August 11, 2021 at 4:13 PM
>>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>> To: "daggs" <daggs at gmx.com>
>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>
>>>>>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>>>>>> Greetings Martin,
>>>>>>>
>>>>>>>> Sent: Wednesday, August 11, 2021 at 10:14 AM
>>>>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>>>> To: "daggs" <daggs at gmx.com>
>>>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>>
>>>
>>> [...]
>>>
>>>>>>>>
>>>>>>>> 2) To your issue with starting the domain it would be good to know what
>>>>>>>>      is the error you get from virsh (or however you are starting the
>>>>>>>>      domain) and the debug logs of libvirtd, ideally just for the part of
>>>>>>>>      the domain starting.
>>>>>>> that is the issue, there wasn't any error. the vm just didn't booted.
>>>>>>
>>>>>> Oh, so I misunderstood.  What was the state of the VM in libvirt?
>>>>>> "paused" or "running"?  Was there serial console working?
>>>>> it was marked as running and there was no serial
>>>>>
>>>
>>> That's a pity we could not examine what was actually happening.
>>>
>>>>>>
>>>>>>> I can diff the original xml with the new one to see the diffs and post them here if you wish
>>>>>>>
>>>>>>
>>>>>> Would be nice to see if there are any differences.  The newly created
>>>>>> one works then?
>>>>>>
>>>>>
>>>>> I'll sent it later today
>>>>>
>>>>
>>>> here: https://dpaste.com/5VBUU8Z9W
>>>>
>>>
>>> Unfortunately there are many differences there.  The machine type
>>> changes _something_ in qemu, there is different PCI(e) topology, and I
>>> do not think I will be able to figure this out without the non-working
>>> machine.
>>>
>>> So if your current setup works for you right now I'd leave figuring out
>>> the previous issue to others, if there is anyone wanting to figure out
>>> if there is some libvirt issue.
>>>
>>> Have a nice day
>>>
>>
>> my current setup works beside the hdmi audio, this I still need to investigate.
>>
>> thanks for your help.
>>
>> Dagg
>>
> 
> just to update, I've solved the sound issue, frankly, I don't understand how the guest showed a soundcard in the first place.
> from what I gather, libvirt sets the -nodefaults flag to prepare the vm's properties from scratch.
> in this situation, the sound card is a function in the host machine's pci tree.
> when libvirt created the pci tree for the guest, it placed the card as a function of a device as well, in my case 02:00.2
> however it didn't created a device at 02:00.0.

Are you basing this claim on the libvirt XML? Or on what you see with 
lspci in the guest?

When libvirt is assigning PCI addresses to devices in a guest, it will 
never auto-assign a non-0 function. This will only happen if the user 
explicitly requests it (and even then, iirc, libvirt should generate an 
error if function 0 of the same slot has no device - something to the 
effect of "no device on function 0 of a multifunction device").
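
As a rough way to check a config for that situation, here is a minimal
Python sketch (not libvirt's own validation code). It scans the
guest-side <address type='pci' .../> elements of a domain XML and
reports any slot that has a device on a non-0 function but nothing on
function 0. The function name and the sample XML are made up for
illustration, and it assumes the address values are hex strings as
"virsh dumpxml" prints them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical domain XML: one hostdev on guest address 02:00.2 with
# nothing on 02:00.0, and one interface on 01:00.0.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x2'/>
    </hostdev>
    <interface type='network'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
  </devices>
</domain>
"""


def orphaned_functions(domain_xml):
    """Return (domain, bus, slot) tuples that lack a device on function 0."""
    root = ET.fromstring(domain_xml)
    functions_by_slot = defaultdict(set)
    for addr in root.iter("address"):
        # Host-side <source><address/> elements carry no type attribute,
        # so this filter keeps only guest-side PCI addresses.
        if addr.get("type") != "pci":
            continue
        # Assumption: values are hex strings such as '0x02'.
        slot_key = tuple(
            int(addr.get(name, "0"), 16) for name in ("domain", "bus", "slot")
        )
        functions_by_slot[slot_key].add(int(addr.get("function", "0"), 16))
    return sorted(
        slot for slot, funcs in functions_by_slot.items() if 0 not in funcs
    )


if __name__ == "__main__":
    for domain, bus, slot in orphaned_functions(SAMPLE_XML):
        print(f"{domain:04x}:{bus:02x}:{slot:02x} has no device on function 0")
```

Running it against the output of "virsh dumpxml" for the guest would
show whether the XML really contains a device like 02:00.2 without a
matching 02:00.0.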

Anyway, when I looked back at the XML diff you posted earlier (see 
below), I didn't see any hostdev device assigned to 02:00.2. What I 
*did* see was that in both the old and the new side of the diff, the 
hostdev devices were assigned to function 0 of different *slots* on a 
dmi-to-pci-bridge controller, which should cause no problems (unless 
there is a bug in QEMU's dmi-to-pci-bridge). The important thing, 
though, is that there is no hostdev device on a non-0 function, and when 
one is on a non-0 slot, that's because it's on a dmi-to-pci-bridge (which 
has 32 slots).


On the topic of having a dmi-to-pci-bridge show up in your XML: I don't 
remember which versions the changes were in (it was at least a year or 
two ago), but only a fairly old version of libvirt would do that: 1) 
recent libvirt will assume that any hostdev PCI device is a PCIe device, 
so it will add a pcie-root-port and assign the hostdev device to slot 0 
of that root-port, and even before that 2) we switched from using 
dmi-to-pci-bridge to using pcie-to-pci-bridge quite some time ago as well.

So if you're generating new XML based on config that doesn't have pci 
controllers already in it, and you're seeing hostdevs (or any other PCI 
devices) assigned to an automatically-added dmi-to-pci-bridge, then your 
libvirt version is severely out of date.
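
To see whether a given config is in that state, a small Python sketch
along these lines could list the hostdevs that sit on a
dmi-to-pci-bridge. It relies on the bus number in a guest PCI address
matching the index of the PCI controller the device is plugged into.
The function name and the sample XML are hypothetical, and the sketch
assumes hex address values and a decimal controller index, as
"virsh dumpxml" prints them.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML: two hostdevs on function 0 of different
# slots of a dmi-to-pci-bridge (controller index 1, i.e. guest bus 1).
BRIDGE_SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x02' function='0x0'/>
    </hostdev>
  </devices>
</domain>
"""


def hostdevs_on_dmi_bridge(domain_xml):
    """Return guest (bus, slot, function) of hostdevs on a dmi-to-pci-bridge."""
    root = ET.fromstring(domain_xml)
    # A device's guest bus number equals the index of its PCI controller.
    bridge_buses = {
        int(ctrl.get("index"), 10)
        for ctrl in root.iter("controller")
        if ctrl.get("type") == "pci"
        and ctrl.get("model") == "dmi-to-pci-bridge"
    }
    found = []
    for hostdev in root.iter("hostdev"):
        # The direct child <address> is the guest-side address; the
        # host-side one is nested inside <source> and is not matched here.
        addr = hostdev.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        bus, slot, func = (
            int(addr.get(name, "0"), 16) for name in ("bus", "slot", "function")
        )
        if bus in bridge_buses:
            found.append((bus, slot, func))
    return sorted(found)


if __name__ == "__main__":
    for bus, slot, func in hostdevs_on_dmi_bridge(BRIDGE_SAMPLE_XML):
        print(f"hostdev at {bus:02x}:{slot:02x}.{func:x} is on a dmi-to-pci-bridge")
```

An empty result for XML freshly generated by a current libvirt is what
I would expect, since the hostdevs should each land on their own
pcie-root-port instead.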


On 8/11/21 2:53 PM, daggs wrote:
 >> From: "daggs" <daggs at gmx.com>
 >>> From: "Martin Kletzander" <mkletzan at redhat.com>
 >>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
 >>>> I can diff the original xml with the new one to see the diffs and
 >>>> post them here if you wish
 >>>>
 >>>
 >>> Would be nice to see if there are any differences.  The newly created
 >>> one works then?
 >>
 >> I'll sent it later today
 >>
 >
 > here: https://dpaste.com/5VBUU8Z9W


> my fix was to move the device to 00:1f.4 in the guest.

That's an interesting choice :-). You could have just put it on function 
0 of some other unused slot (or a non-0 function of the slot the GPU is 
assigned to). 00:1f is used for integrated devices on the Q35 chipset - 
it's nice that QEMU's emulation code was written to allow adding more 
devices on that slot, but I wouldn't have been surprised if it had 
caused problems...


> I won't be surprised this was the issue why the vm didn't booted after the upgrade with the old xml.

Well, if your XML had a device assigned to a non-0 function of a slot 
and no device in function 0 of that slot, it would have failed to work 
previously as well (my recollection is that in this case it's more a 
problem of the guest OS not probing non-0 functions when there is 
nothing on function 0, and not with anything done by QEMU).



