Migration with "--p2p --tunnelled" hanging in v6.9.0

Daniel P. Berrangé berrange at redhat.com
Wed Nov 25 17:00:20 UTC 2020


On Wed, Nov 25, 2020 at 04:36:39PM +0000, Daniel P. Berrangé wrote:
> On Wed, Nov 25, 2020 at 04:49:14PM +0100, Christian Ehrhardt wrote:
> > Thanks for the hint Daniel, it is indeed not migration specific - it
> > seems that virt-ssh-helper is just very slow.
> > 
> > rm testfile; virsh -c
> > qemu+ssh://testkvm-hirsute-to/system?proxy=netcat vol-download --pool
> > uvtool h-migr-test.qcow testfile & for i in $(seq 1 20); do sleep 1s;
> > ll -laFh testfile; done
> > [1] 42285
> > -rw-r--r-- 1 root root 24M Nov 25 15:20 testfile
> > -rw-r--r-- 1 root root 220M Nov 25 15:20 testfile
> > -rw-r--r-- 1 root root 396M Nov 25 15:20 testfile
> > -rw-r--r-- 1 root root 558M Nov 25 15:20 testfile
> > -rw-r--r-- 1 root root 756M Nov 25 15:20 testfile
> > -rw-r--r-- 1 root root 868M Nov 25 15:20 testfile
> > [1]+  Done                    virsh -c
> > qemu+ssh://testkvm-hirsute-to/system?proxy=netcat vol-download --pool
> > uvtool h-migr-test.qcow testfile
> > 
> > rm testfile; virsh -c
> > qemu+ssh://testkvm-hirsute-to/system?proxy=native vol-download --pool
> > uvtool h-migr-test.qcow testfile & for i in $(seq 1 20); do sleep 1s;
> > ll -laFh testfile; done
> > [1] 42307
> > -rw-r--r-- 1 root root 1.8M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 6.8M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 9.8M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 13M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 15M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 16M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 18M Nov 25 15:21 testfile
> > -rw-r--r-- 1 root root 19M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 21M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 22M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 23M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 25M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 26M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 27M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 28M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 29M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 30M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 31M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 32M Nov 25 15:22 testfile
> > -rw-r--r-- 1 root root 32M Nov 25 15:22 testfile
> > 
> > That is ~150-200 MB/s vs 1-5 MB/s and as seen it seems to start slow
> > AND degrades further.
> > I'm now at 90MB overall and down to ~150 KB/s
> > 
> > > and we'll probably want to collect debug
> > > level logs from src+dst hosts.
> > 
> > I already had debug level logs of the migration [1] attached to the
> > launchpad bug I use to take my notes on this.
> > Taken with these configs:
> > log_filters="1:qemu 1:libvirt 3:object 3:json 3:event 1:util"
> > log_outputs="1:file:/var/log/libvirtd.log"
> > 
> > You can fetch the logs (of a migration), but I'm happy to generate you
> > logs of any other command (or with other log settings) as you'd prefer
> > them.
> > 
> > The network used in this case is a bridge between two containers, but
> > we can cut out more components.
> > I found that the same vol-download vs 127.0.0.1 gives the same results.
> > That in turn makes it easier to gather results as we only need one system.
> 
> Yep, that's useful, I'm able to reproduce this problem myself too
> now. Will do some local tests and report back...

Sigh, the problem is way too many reallocs, repeatedly growing and shrinking
the buffer we use for I/O.

I guess we never noticed this awfulness in the virsh console code it was
copied from, as the data volumes there are much lower.

Switching to a fixed size buffer makes it massively faster. I'll prep a
patch asap.
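
To illustrate the idea (this is only a rough sketch, not the actual patch):
a copy loop that reuses one fixed-size buffer for every read/write cycle,
so there is no realloc in the hot path. The names copy_fixed, write_all and
COPY_BUF_SIZE, and the 64 KiB size, are assumptions made up for this
example. It also uses plain blocking I/O for brevity, which is not
necessarily how the helper drives its file descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical buffer size chosen for illustration only. */
#define COPY_BUF_SIZE (64 * 1024)

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Copy in_fd to out_fd until EOF using a single fixed-size buffer.
 * The buffer is never grown or shrunk, so each iteration costs one
 * read() and one or more write() calls and no allocation at all.
 * Returns the number of bytes copied, or -1 on error.
 */
static long long copy_fixed(int in_fd, int out_fd)
{
    static char buf[COPY_BUF_SIZE];
    long long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            return total;          /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

Calling copy_fixed(STDIN_FILENO, STDOUT_FILENO) would turn this into a
trivial netcat-style pipe. The trade-off is a constant chunk of memory held
for the lifetime of the process, in exchange for no allocator traffic per
I/O cycle.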


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



