[libvirt] Re: Supporting vhost-net and macvtap in libvirt for QEMU

Thu Dec 17 22:47:43 UTC 2009

Chris Wright wrote:
>> I don't want to get bogged down in a qemu-devel discussion on
>> libvirt-devel :-)
>>     
>
> The reason I brought it up here is in case libvirt would be doing both.
> /dev/vhost takes an fd for a tap device or raw socket.  So libvirt would
> need to open both, and then becomes a question of whether libvirt only
> passes the single vhost fd (after setting it up completely) or passes
> both the vhost fd and connecting fd for qemu to put the two together.
> I didn't recall migration (if qemu would need tap fd again).
>   

I'm heavily leaning towards taking a /dev/vhost fd but we'll see what 
Michael posts.

>> But from a libvirt perspective, I assume that it wants to open up
>> /dev/vhost in order to not have to grant the qemu instance
>> privileges which means that it needs to hand qemu the file
>> descriptor to it.
>>
>> Given a file descriptor, I don't think qemu can easily tell whether
>> it's a tun/tap fd or whether it's a vhost fd.  Since they have
>> different interfaces, we need libvirt to tell us which one it is.
>> Whether that's -net tap,vhost or -net vhost, we can figure that part
>> out on qemu-devel :-)
>>     
>
> Yeah, I agree, just thinking of the workflow as it impacts libvirt.
>   

I really prefer -net vhost,fd=X  where X is the fd of an open /dev/vhost.
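
A minimal sketch of the fd hand-off described above, assuming a privileged manager opens the device and the unprivileged child only inherits the descriptor. The "-net vhost,fd=X" string is the syntax proposed in this mail, not a confirmed qemu option. A temporary file stands in for /dev/vhost and a Python one-liner stands in for qemu; spawn_with_fd and build_net_arg are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

def build_net_arg(fd):
    # Syntax proposed in this thread; hypothetical, not a confirmed qemu option.
    return "-net vhost,fd=%d" % fd

def spawn_with_fd(path):
    """Open `path` in the parent and let a child inherit only the fd."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Stand-in for qemu: the child uses the inherited descriptor
        # without ever opening the path itself.
        child_code = (
            "import os, sys; fd = int(sys.argv[1]); "
            "os.lseek(fd, 0, os.SEEK_SET); "
            "sys.stdout.write(os.read(fd, 64).decode())"
        )
        out = subprocess.run(
            [sys.executable, "-c", child_code, str(fd)],
            pass_fds=(fd,),  # keep this fd open (same number) across exec
            stdout=subprocess.PIPE,
            check=True,
        ).stdout.decode()
        return build_net_arg(fd), out
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:  # stand-in for /dev/vhost
        tmp.write(b"hello from parent")
        tmp.flush()
        arg, seen = spawn_with_fd(tmp.name)
        print(arg, "| child read:", seen)
```

The point of the sketch is that the child needs no permission on the device node; it only needs to be told which fd number to use and what kind of fd it is.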

When invoking qemu directly, as a first pass I'd expect -net 
vhost,dev=eth0 for a raw device and -net vhost,mode=tap,tap-arguments 
for a tap device.

Long term, there are so many possible ways to layer things, that I'd 
really like to see:

-net vepa,dev=eth0

Which ends up invoking /usr/libexec/qemu-net-helper-vepa --arg-dev=eth0 
--socketpair=X --try-vhost.

qemu-net-helper-vepa would do all of the fancy stuff of creating a 
macvtap device, trying to hook that up with vhost, sending us an fd over 
the socketpair telling us which interface it's using and what features 
were enabled.
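
A rough sketch of the helper-to-qemu hand-off described above, assuming the fd travels as SCM_RIGHTS ancillary data over a Unix socketpair and the interface type and features ride along as a small JSON message. The message layout and the names send_fd/recv_fd are assumptions for illustration, not an agreed protocol; a pipe stands in for the macvtap or vhost fd.

```python
import array
import json
import os
import socket

def send_fd(sock, fd, info):
    """Send one fd plus a JSON description as SCM_RIGHTS ancillary data."""
    payload = json.dumps(info).encode()
    anc = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
            array.array("i", [fd]).tobytes())]
    sock.sendmsg([payload], anc)

def recv_fd(sock, bufsize=4096):
    """Receive the JSON description and the single fd sent with it."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial int before decoding.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0], json.loads(data.decode())

if __name__ == "__main__":
    helper_end, qemu_end = socket.socketpair()  # AF_UNIX by default
    r, w = os.pipe()                            # stand-in for the real device fd
    # Hypothetical message fields: which interface, which features.
    send_fd(helper_end, r, {"interface": "vhost", "features": ["mergeable-rx"]})
    fd, info = recv_fd(qemu_end)
    os.write(w, b"ping")
    print(info, os.read(fd, 4))
```

The receiver learns both the descriptor and how to drive it from the same message, which is what lets qemu stay ignorant of how the helper built the device.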

That lets people infinitely extend qemu's networking support while 
allowing us to focus on just implementing backends for the interfaces 
we're exposed to.  AFAICT, that's just /dev/vhost, /dev/net/tun, and a 
normal socket.  The latter two can be reduced to a single read/write 
interface honestly.
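
To illustrate the "single read/write interface" point, here is a small sketch assuming each read returns one whole frame, which holds for a tap fd and for a datagram socket (a stream socket would need its own framing). FdBackend is a hypothetical name, and a datagram socketpair stands in for the real device.

```python
import os
import socket

class FdBackend:
    """Frame I/O over any descriptor that supports read(2)/write(2)."""

    def __init__(self, fd, kind):
        self.fd = fd
        self.kind = kind  # label only, e.g. "tap" or "socket"

    def send(self, frame):
        # One write is one frame on tap fds and datagram sockets.
        return os.write(self.fd, frame)

    def recv(self, max_len=65536):
        # One read returns at most one frame.
        return os.read(self.fd, max_len)

    def close(self):
        os.close(self.fd)

if __name__ == "__main__":
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    left = FdBackend(a.detach(), "socket")
    right = FdBackend(b.detach(), "socket")
    left.send(b"frame-1")
    left.send(b"frame-2")
    print(right.recv(), right.recv())
    left.close()
    right.close()
```

Nothing in the class depends on where the descriptor came from, so the same code path could serve a tap fd handed in by a helper.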

>> In general, I think it's best to avoid as much network configuration
>> in qemu as humanly possible so I'd rather see libvirt configure the
>> vhost device ahead of time and pass us an fd that we can start
>> using.
>>     
>
> Hard to disagree, but will make qemu not work w/out libvirt?
>   

No, net/ would essentially become a series of helper programs.  What's 
nice about this approach is that libvirt could potentially use helpers 
too which would allow people to run qemu directly based on the output of 
ps -ef.  Would certainly make debugging easier.

>> But we still need to support the notion of backing a VNIC to a NIC,
>> no?  If this just happens to also work with a naive usage of SR-IOV,
>> is that so bad? :-)
>>     
>
> Nope, not at all ;-)
>
> We do need to know if a VF is available or not (and if a PF has any of
> its VFs used).

"We need to know" or "it would be nice to know"?

You can make the same argument about a physical network interface.

>   Needed on migration ("can I hook up to a VF on target?"),
> and for assignment ("can I give this PCI device to a guest?  wait, it's
> a PF and VF's are in use." Although, I don't think libvirt actually goes
> beyond, "wait it's a PF").
>   

Migration's definitely tough because the ethX device might carry a 
different name on a different node.  I'm not sure how libvirt handles 
this today.  Is it possible to do a live migration with libvirt where 
the mount location of a common network file system changes?

For instance, if /mount/disk.img becomes /mnt/disk.img?

>> I think the notion of network pools as being somewhat opaque really
>> works well for this.  Ideally you would create a network pool based
>> on the requirements you had, and the management tool would figure
>> out what the best set of implementations to use was.
>>
>> VEPA is really a unique use-case in my mind.  It's when someone
>> wants to use an external switch for their network management.
>>     
>
> It's an enterprise thing, sure, but we need to be able to manage.
> Ditto for a VN-Tag approach.  They all require some basic setup.
>   

Clearly I want to punt network setup out of qemu because it's awfully 
complex.  It makes me wonder if the same should be true for libvirt?  To 
what extent is libvirt going to do network management over time?  Should 
I expect to be able to use libvirt to create arbitrarily complex network 
pools using custom iptables rules?

I think libvirt punting the setup of these things to something else 
isn't such a bad idea.

-- 
Regards,

Anthony Liguori