Migration with "--p2p --tunnelled" hanging in v6.9.0

Christian Ehrhardt christian.ehrhardt at canonical.com
Wed Nov 25 12:28:09 UTC 2020


On Wed, Nov 25, 2020 at 10:55 AM Christian Ehrhardt
<christian.ehrhardt at canonical.com> wrote:
>
> On Tue, Nov 24, 2020 at 4:30 PM Peter Krempa <pkrempa at redhat.com> wrote:
> >
> > On Tue, Nov 24, 2020 at 16:05:53 +0100, Christian Ehrhardt wrote:
> > > Hi,
> >
> > [...]
>
> BTW, to narrow down the scope of what to think about: I have rebuilt 6.8 as
> well and it works.
> Thereby I can confirm that the offending change should be in between
> 6.8.0 -> 6.9.0.

I was able to narrow this down with git bisect builds between v6.8.0 and
v6.9.0.
I identified the following offending commit:
  7d959c30  rpc: Fix virt-ssh-helper detection
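
For anyone who wants to repeat such a bisect unattended, here is a minimal
sketch of a probe for "git bisect run". It is not what I ran. The function
name bisect_probe, the MIGRATE_TIMEOUT variable and the script name in the
usage comment are made up for illustration; the only things it relies on are
the coreutils "timeout" exit status (124 on timeout) and the git-bisect exit
code convention (0 = good, 125 = skip, other codes below 128 = bad). Treating
every non-timeout failure as "skip" is an assumption, not a rule.

```shell
#!/bin/sh
# Hypothetical probe for `git bisect run`.
# Runs the given command under a time limit and maps the outcome to the
# exit codes git-bisect understands.
bisect_probe() {
    # MIGRATE_TIMEOUT (seconds) is an assumed knob; default is 5 minutes.
    timeout "${MIGRATE_TIMEOUT:-300}" "$@"
    probe_rc=$?
    case "$probe_rc" in
        0)   return 0 ;;    # command finished in time: mark commit good
        124) return 1 ;;    # timeout fired: treat as the hang, mark bad
        *)   return 125 ;;  # any other failure: ask git to skip the commit
    esac
}

# Possible usage (build-install-migrate.sh is a placeholder for whatever
# builds, installs and test-migrates on your setup):
#   git bisect start v6.9.0 v6.8.0
#   git bisect run sh -c '. ./probe.sh; bisect_probe ./build-install-migrate.sh'
```

The time limit has to be comfortably longer than a working migration,
otherwise good commits get marked bad.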

OK, that makes a bit of sense. First, in 6.8 we got
  f8ec7c84 rpc: use new virt-ssh-helper binary for remote tunnelling
which relates it to tunnelling and matches our broken use case.

The identified commit "7d959c30 rpc: Fix virt-ssh-helper detection" might
be what finally enables the new helper for real, and the helper itself is
then what is broken?

With that knowledge I was able to confirm that it really is the native mode
that hangs, while netcat still works:

$ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test \
    qemu+ssh://testkvm-hirsute-to/system?proxy=netcat
<works>
$ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test \
    qemu+ssh://testkvm-hirsute-to/system?proxy=native
<hangs>

I recently discussed with Andrea whether we'd need apparmor rules for
virt-ssh-helper, but there are neither denials nor libvirt log entries
related to it.
We also should not need such rules, since it is spawned from the ssh login
and not under libvirtd itself.

PS output of the hanging receiving virt-ssh-helper (looks not too unhappy):
Source:
4     0   41305       1  20   0 1627796 23360 poll_s Ssl ?   0:05 /usr/sbin/libvirtd
0     0   41523   41305  20   0   9272  4984 poll_s S    ?   0:02  \_ ssh -T -e none -- testkvm-hirsute-to sh -c 'virt-ssh-helper 'qemu:///system''
Target:
4     0     213       1  20   0  13276  4132 poll_s Ss   ?   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 250-500 startups
4     0   35148     213  20   0  19048 11320 poll_s Ss   ?   0:02  \_ sshd: root@notty
4     0   35206   35148  20   0   2584   544 do_wai Ss   ?   0:00      \_ sh -c virt-ssh-helper qemu:///system
0     0   35207   35206  20   0  81348 26684 -      R    ?   0:34          \_ virt-ssh-helper qemu:///system

I've looked at it with strace [1] and gdb for backtraces [2] - it is
not dead or stuck and keeps working.
Could it be just so slow that it appears to hang until it times out?
Or is the event mechanism having issues and it wakes up too rarely?
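
One way to tell "very slow" apart from "not moving at all" would be to sample
how many bytes the helper has written over a few seconds. The following is
only a sketch under assumptions: it is Linux-only, it reads the wchar counter
from /proc/<pid>/io (which needs sufficient privileges for another user's
process), and the function name io_delta is made up here.

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes a process wrote (wchar from
# /proc/<pid>/io) during a sampling interval.
# Usage: io_delta <pid> [interval-seconds]
io_delta() {
    pid=$1
    interval=${2:-5}
    # First sample; fails if the process is gone or the file is unreadable.
    before=$(awk '/^wchar:/ {print $2}' "/proc/$pid/io" 2>/dev/null) || return 1
    sleep "$interval"
    # Second sample after the interval.
    after=$(awk '/^wchar:/ {print $2}' "/proc/$pid/io" 2>/dev/null) || return 1
    echo $((after - before))
}

# Possible usage on the target, with the PID taken from the ps output:
#   io_delta 35207 10
```

A result of 0 over several intervals would point at a stall, a small but
steady number at a throughput problem.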

Also, has anyone else seen the same with >= v6.9.0?

[1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1904584/comments/12
[2]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1904584/comments/13


> > > In git/news I only found these changes which sounded to be relevant:
> > >   f51cbe92c0 qemu: Allow migration over UNIX socket
> > >   c69915ccaf peer2peer migration: allow connecting to local sockets
> > > But I'm not using unix: and in the logs the only unix: mentions are for the
> > > qemu monitor and qemu-guest-agent.
> >
> > One very important part of '--tunnelled' migration is the use of
> > virStream APIs to transport the migration data. Perhaps something there
> > is broken since it doesn't reproduce when not using the tunnel
>
> Thanks for the hint Peter.
> I have now looked there as well, but other than the switch to g_new0
> there is neither a change containing the word "stream" nor one that
> affects the files that implement virStream*.
>
> --
> Christian Ehrhardt
> Staff Engineer, Ubuntu Server
> Canonical Ltd



-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd



