[libvirt] [PATCH] Allow a per-PCI passthrough device permissive attribute

Wed Jan 27 18:49:43 UTC 2010

On Wed, Jan 27, 2010 at 09:23:52AM -0500, Chris Lalancette wrote:
> Currently there is a global tag to let the administrator
> turn off system-wide ACS checking when doing PCI device
> passthrough.  However, this is too coarse-grained of an
> attribute, since it doesn't allow setups where certain
> guests are trusted while other ones are untrusted.  Allow
> more complicated setups by making the device checking
> a per-device setting.
> 
> The more detailed explanation of why this is necessary
> delves deep into PCIe internals.  Ideally we'd like
> to be able to probe devices and figure out whether it
> is safe to assign them.  In practice, this isn't possible
> because PCIe allows devices to have "hidden" bridges
> that software can't discover.  If you were to have
> two devices assigned to two different domains behind
> one of these hidden bridges, they could do P2P traffic
> and bypass all of the VT-d/IOMMU checks.
> 
> The next thing we could try to do is to have a whitelist
> of devices that we know to be safe.  For instance, instead
> of a "hidden" bridge, PCI devices can multiplex functions
> instead, which causes all traffic to head to an upstream
> bridge before P2P can take place.  Additionally, some
> "hidden" PCI bridges may have ACS on-board.  In both of
> these cases it's safe to passthrough the device(s), since
> they can't P2P without the IOMMU getting involved.
> 
> However, even if we did have a whitelist, I think we still
> need a permissive attribute.  For one thing, the whitelist
> will always be out of date with respect to new hardware,
> so we'd need to allow administrators to temporarily
> override the whitelist restriction until a new version of
> the whitelist came out.  Also, we want to support the case
> where the administrator knows it is safe to assign possibly
> unsafe devices to a domain he trusts.

A domain is only trusted until its guest OS gets exploited, at which point
this proposed change may let it escape into the host. If you don't have
any IOMMU on your host, you can't use PCI device assignment with KVM at
all, because it would not be safe in the event of guest exploit or
misbehavior. The same is true of device assignment in this non-ACS + hidden
bridge case.

Thus I don't see why we should introduce a special "permissive" flag
solely for the non-ACS edge case, while at the same time not allowing
the same permissiveness for the far more common non-IOMMU case. NB, I'm
not suggesting we allow skipping of the checks for the non-IOMMU case
either.

Keeping a whitelist of devices up to date with respect to new hardware
launches is no more troublesome than the existing problem of updating the
PCI-IDs database or actually providing updated kernel releases with new
drivers.

If we use the permissive attribute, then every admin with a device that
is known to be safe has the pain of setting the permissive attribute,
every time, on every machine with this hardware. If we have a whitelist,
then 99% of the time everything will just work because the device will
already be known to the whitelist. If the whitelist were an external
datafile, the admin could even extend it on the rare occasion when a new
device was not yet known. The choice is between making everyone solve the
problem over & over again themselves, or solving it once for everybody.
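
To make the external datafile idea a bit more concrete, here is a rough
sketch in Python. Everything in it is hypothetical: the file format (one
"vendor:device" pair of hex PCI IDs per line, '#' for comments), the
function names, and the split between a shipped list and an admin-local
list are assumptions for illustration, not anything libvirt implements.
The IDs used are placeholders, not devices known to be safe.

```python
# Sketch only: hypothetical whitelist format and helper names.
# Each non-empty line is "vendor:device" in hex; '#' starts a comment.

def parse_whitelist(text):
    """Parse whitelist text into a set of (vendor, device) integer pairs."""
    allowed = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        vendor, device = line.split(":")
        allowed.add((int(vendor, 16), int(device, 16)))
    return allowed


def is_whitelisted(vendor, device, shipped, local=frozenset()):
    """True if the device is in the shipped list or the admin's local list."""
    return (vendor, device) in shipped or (vendor, device) in local


# Placeholder IDs: the shipped list knows one device, the admin adds another.
shipped = parse_whitelist("# shipped with the distro\n1234:5678\n")
local = parse_whitelist("abcd:0001  # added by the admin\n")

print(is_whitelisted(0x1234, 0x5678, shipped))         # known to shipped list
print(is_whitelisted(0xabcd, 0x0001, shipped))         # not yet known
print(is_whitelisted(0xabcd, 0x0001, shipped, local))  # admin extended it
```

The point of the two-list split is that the common case needs no per-guest
configuration at all, and the uncommon case is fixed once per host rather
than once per guest.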

I don't really like the idea of a whitelist, but I like it more than just
pushing the problem onto admins via per guest flags. For that matter I
don't like the host level flag we have either and would rather we removed
it. If only there were a 3rd way that was neither flags nor whitelists ...

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|