[libvirt] [PATCH 0/8] Hostdev-hybrid patches

Daniel P. Berrange berrange at redhat.com
Thu Sep 13 12:01:31 UTC 2012


On Wed, Sep 12, 2012 at 03:01:08PM -0400, Laine Stump wrote:
> On 09/12/2012 05:59 AM, Daniel P. Berrange wrote:
> >  then I have to wonder why we need to
> > add all this code for a new "hybrid" device type. It seems to me like
> > we can do all this already simply by listing one virtio device and one
> > hostdev device in the guest XML.
> 
> Aside from detaching/re-attaching the hostdev, the other thing that
> these patches bring is automatic derivation of the <source> of the
> virtio-net device from the hostdev. The hostdev device will be grabbed
> from a pool of VFs in a <network>, then a "reverse lookup" is done in
> PCI space to determine the PF for that VF - that's where the virtio-net
> device is connected.
> 
> I suppose this could be handled by 1) putting only the VFs of a single
> PF in any network definition's device pool, and 2) always having two
> parallel network definitions like this:
> 
>     <network>
>       <name>net-x-vfs-hostdev</name>
>       <forward mode='hostdev' ephemeral='yes'>
>         <pf dev='eth3'/> <!-- makes a list of all VFs for PF 'eth3' -->
>       </forward>
>     </network>
> 
>     <network>
>       <name>net-x-pf-macvtap</name>
>       <forward mode='bridge'>
>         <interface dev='eth3'/>
>       </forward>
>     </network>

Eww, that's a bit of a nasty duplication.

> Then each guest would have:
> 
>    <interface type='network'>
>      <mac address='x:x:x:x:x:x'/>
>      <source network='net-x-vfs-hostdev'/>
>    </interface>
>    <interface type='network'>
>      <mac address='x:x:x:x:x:x'/>
>      <source network='net-x-pf-macvtap'/>
>      <model type='virtio'/>
>    </interface>
> 
> The problem with this is that then you can't have a pool that uses more
> than a single PF-worth of VFs. For example, I have an Intel 82576 card
> that has 2 PFs and 7 VFs per PF. This would mean that I can only have 7
> VFs in a network. If I have 10 guests and want to migrate them
> back and forth between two hosts, I would have to make some arbitrary
> decision that some would use "net-x-vfs-hostdev+net-x-pf-macvtap" and
> some others would use "net-y-vfs-hostdev+net-y-pf-macvtap". Even worse
> would be if I had > 14 guests - there would be artificial limits (beyond
> simply "no more than 14 guests/host") on which guests could be moved to
> which machine at any given time (I would have to oversubscribe the
> 7-guest limit for one pair of networks, and no more than 7 of that
> subset of guests could be on the same host at the same time).
> 
> If, instead, the PF used for the virtio-net device is derived from the
> particular VF currently assigned to the same guest's hostdev, I can have
> a single network definition with VFs from multiple PFs, and they all
> become one big pool of resources. In that case, my only limit is the far
> simpler "no more than 14 guests/host"; no worries about *which* of the
> guests those 14 are. tl;dr - the two-in-one hostdev-hybrid device
> simplifies administrative decisions when you have/need multiple PFs.
> 
> (another minor annoyance is that the dual device allows both to use the
> same auto-generated MAC address, but if we just use two individual
> devices, the MAC must be manually specified for each when the device is
> originally defined (so that they will match)).

Why not just define a new element to put inside the <interface> tag
to indicate two related devices, eg <paired/>, and look up the pairing
based on the MAC address?
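
As a rough sketch of the first idea (the <paired/> element is hypothetical
and does not exist in libvirt, and the MAC address is a made-up example),
the guest would carry two ordinary devices that share a MAC:

```xml
<!-- Hypothetical: <paired/> is a proposed marker, not an existing libvirt element -->
<interface type='network'>
  <mac address='52:54:00:12:34:56'/>
  <source network='net-x-vfs-hostdev'/>
  <paired/>
</interface>
<interface type='network'>
  <mac address='52:54:00:12:34:56'/>
  <source network='net-x-pf-macvtap'/>
  <model type='virtio'/>
  <paired/>
</interface>
```

libvirt would treat two <paired/> interfaces with the same MAC as one
logical pair. How the virtio device's PF then gets derived from the
hostdev's VF is left open in this sketch.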

Alternatively, you could define a new source type for the associated
device, eg <interface type='paired'>, and again associate based on
the MAC address.
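
Again only as a sketch: type='paired' is a hypothetical value that
libvirt does not implement, and the MAC address is a made-up example.
The second device would name no source of its own:

```xml
<!-- Hypothetical: type='paired' is a proposed value, not an existing libvirt interface type -->
<interface type='network'>
  <mac address='52:54:00:12:34:56'/>
  <source network='net-x-vfs-hostdev'/>
</interface>
<interface type='paired'>
  <!-- no <source> here; the PF would be derived from the hostdev with the same MAC -->
  <mac address='52:54:00:12:34:56'/>
  <model type='virtio'/>
</interface>
```

This keeps two separate device entries, so each still has an obvious
place for its own PCI address.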

> >  All that's required is to add support
> > for the 'ephemeral' attribute on hostdevs, so they are automagically
> > unplugged. Technically we don't even need that, since a mgmt app can
> > already just use regular hotunplug APIs before issuing the migrate
> > API calls.
> 
> I like the idea of having that capability at libvirt's level, so that
> you can easily try things out with virsh (is the ephemeral flag
> implemented so that it also works for virsh save/restore? That would be
> a double plus.) A lot of us don't really use anything higher level than
> virsh or virt-manager, especially for testing.
> 
> (I actually think there's merit to adding the ephemeral flag (can anyone
> think of a better name? When I hear ephemeral, I think of that TV chef -
> Emeril) for hostdevs in general - it would provide a method of easily
> allowing save/restore/migration for guests that have hostdevs that could
> be temporarily detached without ill consequences. I think proper
> operation would require that qemu notify libvirt when it's *really*
> finished detaching a device though (I don't have it at hand right now,
> but there's an open BZ requesting that from qemu).)

True, we can't safely do migration until QEMU has truly removed the
PCI device from the guest, and must prevent migration if that doesn't
happen. This is something that must be addressed regardless.

As for the name, we already use 'ephemeral' and 'transient' in
libvirt - either one of those would be reasonable choices.

> >   These patches seem to add a lot of complexity for mere
> > syntactic sugar over existing capabilities.
> 
> I agree that the two-in-one device adds a lot of complexity. If we could
> find a way to derive the PF used for the virtio-net device from the VF
> used for the hostdev without having a combined two-in-one device entry
> (and being able to use a common auto-generated mac address would be nice
> too), then I would agree that it should be left as two separate device
> entries (if nothing else, this gives us an obvious place to put the PCI
> address of the 2nd device). I'm not sure how to do that without limiting
> pools to a single PF though. (I know, I know - the solution is for a
> higher level management application to modify the guest's config during
> migration according to what's in use. But if we're going to do that
> anyway, we may as well not have network definitions defining pools of
> interfaces in the first place.)



Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



