[libvirt] [PATCH 0/6] Use more PCIe, less legacy PCI slots

Laine Stump laine at laine.org
Wed Aug 10 16:10:40 UTC 2016

On 08/10/2016 07:13 AM, Andrea Bolognani wrote:
> On Wed, 2016-08-10 at 00:32 -0400, Laine Stump wrote:
>> On 08/09/2016 09:11 AM, Andrea Bolognani wrote:
>>> On Mon, 2016-08-08 at 04:56 -0400, Laine Stump wrote:
>>>> These patches use three methods to get more of the PCI devices onto PCIe
>>>> slots on Q35 and aarch64/virt machinetypes:
>>>> 1) When virtio devices can present themselves as PCIe if they're
>>>>        plugged into a PCIe controller, do that. (This capability is
>>>>        detected by looking for presence of the "disable-modern" option on
>>>>        the virtio-net device. If you have a better idea for how to detect
>>>>        it, please let me know.)
>>> How does this play along with Ján's series[1] implementing a
>>> way to control the protocol revision for virtio devices?
>> It doesn't, because those patches haven't been pushed yet. But it
>> doesn't conflict either (except that Jan and I are both checking for
>> essentially the same epoch difference in virtio by looking at different
>> options - he looks for "disable-legacy" in any virtio device, while I
>> look for "disable-modern" in virtio-net).
>> Note that setting disable-modern/disable-legacy does *not* necessarily
>> do what you think, and we don't want to always have to set virtio
>> revision='1.0' in order to get a PCIe device, nor do we want all PCIe
>> devices to be revision 1.0-only. There is a case for having
>> disable-modern=off,disable-legacy=off even for a device installed on a
>> PCIe slot (it's compatible with old guest drivers, but new guest drivers
>> can take advantage of virtio 1.0, and it is legacy-PCI-free. On the
>> other hand, it requires the PCIe controller to reserve IO port space,
>> which we try to avoid.)
> Sheesh, you're right, I got it backwards - it's not that you
> need 1.0-only for PCIe, you *can't* have PCIe unless you also
> enable 1.0. Thanks for pointing out the mistake.

No, I think all the double negatives are confusing you :-) The only way 
to *not* get PCIe on a pci-*-port connected device is if you specify 

with disable-modern=off,disable-legacy=off, you'll get:
pcie-root: pci/0.9+1.0
pcie-*-port: pcie/0.9+1.0
pci-bridge: pci/0.9+1.0

with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>):
pcie-root: pci/0.9
pcie-*-port: pci/0.9    <-- we want to avoid this one
pci-bridge: pci/0.9

with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>):
pcie-root: pci/1.0
pcie-*-port: pcie/1.0
pci-bridge: pci/1.0

These are the implications that I can see of setting 0.9-only or 1.0-only:

0.9-only - PCI ID 1af4:1000, the device is always PCI, never PCIe, uses 
1.0-only - PCI ID 1af4:1041, the device is PCIe when on a pcie-*-port, 
otherwise PCI, doesn't use ioport
0.9+1.0 - PCI ID 1af4:1000, same rules for PCIe vs. PCI as 1.0-only, 
uses ioport

(Note that in qemu 2.6.0, the default was 0.9+1.0 everywhere. in qemu 
2.7.0, the default changes depending on what the device is plugged into 
- in particular, when plugged into a pcie-*-port, the default is 1.0-only.)

>> For example, I just tried every combination of:
>> disable-modern=off,disable-legacy=off (i.e. "all versions")
>> disable-modern=on,disable-legacy=off ("0.9")
>> disable-modern=off,disable-legacy=on ("1.0")
>> with these three types of slots:
>>       pcie-root  (PCIe root complex "integrated" device)
>>       pcie-root-port (pcie)
>>       pci-bridge  (legacy PCI)
>> I found that setting "1.0" doesn't force the device to be PCIe - if it's
>> on either pci-bridge or pcie-root, it will still be properly advertised
>> as a PCI device (the "Express Device" capability has been removed from
>> the PCI capabilities data). The difference is that it will no longer
>> request/require IO port space, and the device ID changes from 1af4:1000
>> to 1af4:1041. So there is no problem with putting a device with
>> revision='1.0' into a legacy PCI slot.
> The pcie-root bit is weird - I know legacy PCI devices can be
> assigned directly to it, but I'd expect a device that can do
> both PCI and PCIe to present itself as PCIe when plugged into
> a PCIe-capable slot.

pcie-root *is* a bit weird. Alex pointed out to me yesterday that on our 
Lenovo laptops, we have an ethernet device at 00:19.0 that shows itself 
as a PCI device, but is using the e1000e driver (e1000e is a PCIe 
device). On the other hand, the same system has an audio device at 
00:1b.0 that shows up as a PCIe device (you can see this by running 
"lspci -v" as root and looking for a "Capabilities" line that says 
"Express ....".)

> Does this mean we have to be careful *not* to assign virtio
> devices to pcie-root unless they're 0.9-only, in order to
> chase our goal of having PCI-free guests?

No. That's just the way pcie-root works. I think the important thing is 
to avoid plugging legacy-only devices into pcie-*-port, and to avoid 
dmi-to-pci-bridge and pci-bridge if at all possible. (And actually I 
don't see any upside to making virtio devices 0.9-only for any reason 
except to protect against a bug in virtio-1.0 - virtio devices will 
always properly "downgrade" themselves to legacy PCI when their 
connection warrants it).

>> Setting 0.9 *does* force the device to be a PCI device even if it's
>> plugged into a PCIe slot - the "Express is removed from the PCI
>> capabilities. So definitely if 0.9 is set in order to disable
>> virtio-1.0, we should not put it in a PCIe slot. This leads to the
>> conclusion that any guest OS on Q35 that doesn't have virtio-1.0 drivers
>> will require one of the following:
>> 1) require that we can force both disables to OFF (that's the only way
>> to get a true PCIe device that supports virtio-0.9)
> That sounds like a good idea, and Ján's series already covers
> that use case IIRC: you'd just have to add
>    <virtio revision='0.9'/>
>    <virtio revision='1.0'/>
> to the device. That gives you full control.

Right. I misunderstood that when I went through his patches before. I 
had assumed it could be specified only once (and was thus confused about 
why he stored it as a bitmap rather than a single enum value).

>> (NB: starting with
>> qemu-2.7.0, the default for virtio devices plugged into a pcie-*-port
>> slot will be "disable-legacy=on" (in order to conserve IO port space),
>> and the only way for us to set disable-legacy=off will be to force
>> virtio-0.9, i.e. there will be *no method* of getting a PCIe device that
>> supports old virtio.)
> Even though disable-legacy will default to on, I assume we
> would still be able to set it to off explicitly? Or would
> that be prevented altogether?

Yes, qemu allows it to be explicitly set to off. It's just the default 
that is changed.

>> 2) live with having a dmi-to-pci-bridge and pci-bridge so that we'll
>> have a proper place to plug in an 0.9-only device.
>> 3) provide a simple way to force the 0.9-only virtio device onto
>> pcie-root (pcie.0) (but of course that doesn't support hotplug...)
>> Anyway, back to the topic - we *don't* want to be using the virtio
>> revision setting to force PCI vs. PCIe (or think that we have to set it
>> in order to get one or the other); the only information we can take from
>> that setting to aid us in the PCI/PCIe decision is that when 0.9 is set,
>> we definitely can't put the device on a PCIe slot; otherwise, it can be
>> put on either type of slot and it will work correctly.
> Indeed.
>>> At the very least, the current approaches don't mix - if both
>>> your series and Ján's were to be merged now, a device that has
>>> been configured to use virtio 0.9 would end up assigned to a
>>> PCIe slot.
>> The only reason they don't appear to mix is because Jan's patches don't
>> account for mine, and my patches don't account for his, because none of
>> them are pushed yet. There's no intrinsic reason why they couldn't
>> though.
> Just to be clear, I didn't mean to suggest that you were doing
> anything wrong: I just noticed both you and Ján were working on
> partially overlapping features that don't necessarily work well
> together (see example above). Discussing how to make them work
> well together was the whole point of me bringing that up :)
>>>> 2) Any devices that aren't hotpluggable anyway will no longer request
>>>>        a hotplug-capable slot. This, along with a change to auto-assign
>>>>        legacy PCI devices to pcie-root (as long as they don't require
>>>>        hotplug) means the devices will now be assigned to pcie-root.
>>>> 3) Also using the fact that devices that won't be hotplugged can be
>>>>        assigned to pcie-root, the devices that *do* support hotplug have a
>>>>        new optional subelement "<hotplug require='no'/>" (it defaults to
>>>>        "yes" for historical reasons). If hotplug require is set to 'no',
>>>>        even a PCI device can be auto-assigned to pcie-root.
>>>> I haven't yet removed the dmi-to-pci-bridge that is added by default,
>>>> and we still will only auto-add pci-bridge (so if you're adding a
>>>> virtio device without <hostplug require='no'/> then you will also need
>>>> to add a <controller type='pci' model='pcie-root-port'/> to plug it
>>>> into).
>>> I don't get any dmi-to-pci-bridge on aarch64 virt guests,
>>> which is good because we don't want it at the moment :)
>> Yes, because you've already removed it, in the way that I want to remove
>> it from Q35. (but because Q35 guests are more likely to need a PCI-only
>> device than aarch64 guests, I can't really do that until I've updated
>> the "auto-create more slots" code to create a dmi-to-pci-bridge when
>> necessary.
>> (Hmm, but here's a problem - if there aren't currently any legacy PCI
>> devices in a config (and thus no dmi-to-pci-bridge or pci-bridge), and
>> someone decides to hotplug a legacy-PCI device, *then* what do you do? I
>> don't want to have these legacy controllers in *every damn config in the
>> world* just in case some moro... er "well meaning user" thinks they want
>> to hotplug an rtl_8139 emulated ethernet...)
> I don't really have an answer for that, I'm afraid.
> What I can say is: there are many areas in libvirt where we
> choose the setup that ensures the widest compatibility by
> default, so that users who don't want to care too much about
> the details can just stick to that and have very high chances
> of never running into trouble.
> I think that's a sensible course of action, as long as you
> also allow users who know what they're doing and want to
> build a very precise setup to do so.
> With that in mind, auto-adding a dmi-to-pci-bridge/pci-bridge
> combo by default is probably not too bad *assuming* you have
> a way to opt out of it.

Right now the only way to opt out is to put some other PCI controller at 
index 1 (e.g. a pci-root-port). For some other default devices (e.g. usb 
controllers and memory balloon) we take care of this with 
"model='none'", e.g.:

      <controller type='usb' model='none'/>

I've actually never really liked that, and it doesn't directly translate 
to this use case anyway - it's not that we don't want *any* PCI 
controllers, it's that we don't want a controller of a particular model.

Maybe we just need to think about the whole "reserve a spot for a 
potential future hotplug device" in a different way for PCIe - on 
systems with pci-root, we never really made a conscious decision that we 
had to have open slots available for hotplug - it just automatically 
happened that way because for legacy PCI, *all* slots (including on the 
root bus) support hotplug, and each PCI bus starts out with 31 open 
slots, so you're almost 100% assured that any given config would have an 
empty slot available for potential hotplug. PCIe doesn't work that way 
though - a basic machine has exactly 0 slots that support hotplug and 
(except for adding legacy-PCI bridges) any PCI controller that you add 
has only a single slot, so the chance of just coincidentally having a 
slot open for hotplug is exactly 0%. So are we just taking something 
that happened by chance with pci-based machines and turning it into a 
requirement for pcie? Instead, maybe we could just tell people if they 
might want hotplug, to add a few empty pci-*-port devices. (after all, 
if we were going to add them automatically, how many should we add? 1? 
32? No matter what you pick, it's too many for some, and not enough for 

More information about the libvir-list mailing list