[libvirt] high outage times for qemu virtio network links during live migration, trying to debug
Paolo Bonzini
pbonzini at redhat.com
Tue Jan 26 17:31:56 UTC 2016
On 26/01/2016 18:21, Chris Friesen wrote:
>>>
>>> My question is, why doesn't qemu continue processing virtio packets
>>> while the dirty page scanning and memory transfer over the network is
>>> proceeding?
>>
>> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd
>> have no delay---only dropped packets. Or am I missing something?
>
> I have separate timestamps embedded in the packet for when it was sent
> and when it was echoed back by the target (which is the one being
> migrated). What I'm seeing is that packets to the guest are being sent
> every msec, but they get delayed somewhere for over a second on the way
> to the destination VM while the migration is in progress. Once the
> migration is over, a bunch of packets get delivered to the app in the
> guest and are then processed all at once and echoed back to the sender
> in a big burst (and a bunch of packets are dropped, presumably due to a
> buffer overflowing somewhere).
That doesn't exclude a bug somewhere in the net/ code, and it doesn't
pinpoint the problem to QEMU or vhost-net either.
In any case, what I would do is trace packet rx and tx at all levels
(guest kernel, QEMU, host kernel) and find out at which layer the
hiccup appears.
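
Once the per-layer traces are collected, something along these lines
could help spot the layer. This is only a rough sketch: the layer names
and the (packet id, layer, timestamp) record format are hypothetical,
not the output of any particular tracer, and it assumes the timestamps
from guest and host have already been put on a common clock.

```python
from collections import defaultdict

# Hypothetical layer names, ordered along the rx path toward the guest app.
LAYERS = ["host_kernel_rx", "qemu_or_vhost_rx", "guest_kernel_rx", "guest_app_rx"]

def worst_hop(events, layers=LAYERS):
    """Return ((from_layer, to_layer), delay) for the slowest hop seen.

    events: iterable of (packet_id, layer, timestamp_seconds), assumed to
    be on a common clock. Hops with a missing timestamp are skipped.
    """
    by_packet = defaultdict(dict)
    for pkt, layer, ts in events:
        by_packet[pkt][layer] = ts

    worst = (None, 0.0)
    for stamps in by_packet.values():
        # Compare each pair of adjacent layers for this packet.
        for a, b in zip(layers, layers[1:]):
            if a in stamps and b in stamps:
                delay = stamps[b] - stamps[a]
                if delay > worst[1]:
                    worst = ((a, b), delay)
    return worst
```

If the slowest hop is consistently the same pair of layers while the
migration is running, that is where I would look first.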
Paolo
> For comparison, we have a DPDK-based fastpath NIC type that we added
> (sort of like vhost-net), and it continues to process packets while the
> dirty page scanning is going on. Only the actual cutover affects it.