[PATCH] Fixed missing VM vport when batch start or migration partially failed
Laine Stump
laine at redhat.com
Tue Jun 16 23:49:24 UTC 2020
To complete the circle, here is my response to a *different* patch
trying to fix this same problem. I did a bit more investigating during
my reply, so there is better / more complete information:
https://www.redhat.com/archives/libvir-list/2020-June/msg00681.html
On 6/15/20 11:10 PM, Wei Gong wrote:
> Environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062,
> CentOS 7, openvswitch-2.3.1
> VM network XML:
> <interface type='bridge'>
>   <mac address='52:54:00:46:45:95'/>
>   <source bridge='ovsbr-mgt'/>
>   <vlan>
>     <tag id='0'/>
>   </vlan>
>   <virtualport type='openvswitch'>
>     <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
>   </virtualport>
>   <target dev='vnet0'/>
>   <model type='virtio'/>
>   <link state='up'/>
>   <alias name='net0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> </interface>
>
> qemuProcessStart in qemu_process.c fails to start the guest.
> During cleanup the qemu process is stopped first (at this point the
> kernel releases the tap device, and the device name can be claimed by
> another virtual machine), and only afterwards is the OVS port removed.
> qemuProcessStart and qemuProcessStop can run concurrently for different
> guests, so when the openvswitch virtualport type is in use, the port
> removal in qemuProcessStop may remove a port that belongs to another
> virtual machine.
>
> For example:
> When vm1 fails to start, its tap device vnet0 is released first. Before
> libvirt removes the OVS port vnet0 on behalf of vm1, vm2 starts, is
> given the same tap device name vnet0, and adds an OVS port vnet0.
> The port removal for vm1 then deletes the port that now belongs to vm2,
> and vm2 can no longer reach the network through vnet0.
>
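
To make the interleaving described above concrete, here is a minimal
sketch in C. It is a toy model, not libvirt or Open vSwitch code: the
port table, add_port(), del_port_by_name(), del_port_if_owner() and
simulate_race() are all hypothetical names invented for illustration.
The owner check is only an assumption about what a safer removal could
look like (modelled on the external-ids:vm-id value visible in the
ovs-vsctl log); it is not something proposed in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8

/* Toy stand-in for the OVS port table: one entry per port name. */
struct port {
    char ifname[16];  /* tap device name, e.g. "vnet0" */
    char vm_id[16];   /* owner, modelled on external-ids:vm-id */
    int in_use;
};

static struct port table[MAX_PORTS];

static struct port *find_port(const char *ifname)
{
    for (int i = 0; i < MAX_PORTS; i++)
        if (table[i].in_use && strcmp(table[i].ifname, ifname) == 0)
            return &table[i];
    return NULL;
}

/* Like "--if-exists del-port X -- add-port ... X": any existing entry
 * with the same name is replaced by the new owner's entry. */
static void add_port(const char *ifname, const char *vm_id)
{
    struct port *p = find_port(ifname);
    for (int i = 0; i < MAX_PORTS && p == NULL; i++)
        if (!table[i].in_use)
            p = &table[i];
    assert(p != NULL);  /* the toy table is large enough for this demo */
    snprintf(p->ifname, sizeof(p->ifname), "%s", ifname);
    snprintf(p->vm_id, sizeof(p->vm_id), "%s", vm_id);
    p->in_use = 1;
}

/* Removal keyed on the name only, like "--if-exists del-port X". */
static void del_port_by_name(const char *ifname)
{
    struct port *p = find_port(ifname);
    if (p != NULL)
        p->in_use = 0;
}

/* Hypothetical alternative: remove only if the caller still owns it. */
static void del_port_if_owner(const char *ifname, const char *vm_id)
{
    struct port *p = find_port(ifname);
    if (p != NULL && strcmp(p->vm_id, vm_id) == 0)
        p->in_use = 0;
}

/* Replays the reported interleaving. Returns 1 if vm2 still has its
 * port afterwards, 0 if the port was lost. */
int simulate_race(int check_owner)
{
    memset(table, 0, sizeof(table));
    add_port("vnet0", "vm1");  /* vm1 starts: tap created, port added */
    /* vm1 fails to start; qemu exits and the name vnet0 becomes free */
    add_port("vnet0", "vm2");  /* vm2 starts and is given vnet0 */
    if (check_owner)
        del_port_if_owner("vnet0", "vm1");  /* vm1's late cleanup */
    else
        del_port_by_name("vnet0");          /* vm1's late cleanup */
    struct port *p = find_port("vnet0");
    return p != NULL && strcmp(p->vm_id, "vm2") == 0;
}
```

In this model simulate_race(0) returns 0 (the name-only removal takes
vm2's port with it) and simulate_race(1) returns 1. A real fix would
also have to make the check and the removal atomic with respect to
other ovs-vsctl callers, which this sketch does not attempt.
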
> To reproduce:
> Batch start or migrate 10 virtual machines to the same node so that one
> of them fails to start.
> The failure can be unreachable storage or any other error (when we
> reproduced this internally, one of the virtual machines was attached to
> invalid storage so that it would fail deliberately).
>
> Impact:
> After a batch migration, the network of one virtual machine is
> unreachable and the service running in that virtual machine is
> interrupted.
>
> Logs of the ovs-vsctl calls made by libvirt:
> Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called
> as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port
> ovsbr-mgt vnet4 tag=0 -- set Interface vnet4
> "external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface
> vnet4 "external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\""
> -- set Interface vnet4
> "external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set
> Interface vnet4 external-ids:iface-status=active
> Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous
> mode
> Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
> Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called
> as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port
> ovsbr-mgt vnet4 tag=0 -- set Interface vnet4
> "external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface
> vnet4 "external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\""
> -- set Interface vnet4
> "external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set
> Interface vnet4 external-ids:iface-status=active
> Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous
> mode
> Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called
> as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4
>
>
> Thanks
>
> Laine Stump <laine at redhat.com> wrote on Tue, Jun 16, 2020
> at 10:01 AM:
>
> On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
> >
> >
> > On 6/12/20 3:18 AM, gongwei at smartx.com wrote:
> >> From: gongwei <gongwei at smartx.com>
> >>
> >> start to failed will not remove the openvswitch port,
> >> the port recycling in this case lets openvswitch handle it by itself
> >>
> >> Signed-off-by: gongwei <gongwei at smartx.com>
> >> ---
> >
> > Can you please elaborate on the commit message? By the commit title and
> > the code, I'm assuming that you're saying that we shouldn't remove the
> > openvswitch port if the QEMU process failed to start, for any other
> > reason aside from SHUTOFF_FAILED.
>
>
> More importantly, what "port recycling" will take effect depending on
> how the qemu process is stopped (which I would think wouldn't make any
> difference to OVS), and why is it necessary for libvirt to not do it?
>
>
> Up until now, what I have known is that ports will not be removed from
> an OVS switch unless they are explicitly removed with ovs-vsctl, and
> this attachment will persist across reboots of the host system. As a
> matter of fact I've had cases during development where libvirt didn't
> remove the OVS port for a tap device when a guest was terminated, and
> then many *days* (and several reboots) later the same tap device name
> was used for a different guest that was using a Linux host bridge, and
> the tap device failed to attach to the Linux host bridge because it had
> already been auto-attached back to the OVS switch as soon as it
> was created.
>
>
> Can you describe how to reproduce the situation where libvirt removes
> the OVS port when it shouldn't, and what is the bad outcome of that
> happening?
>
>
>
> >
> > The code itself looks ok.
> >
> >
> >
> >> src/qemu/qemu_process.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> >> index d36088ba98..439bd5b396 100644
> >> --- a/src/qemu/qemu_process.c
> >> +++ b/src/qemu/qemu_process.c
> >> @@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
> >>          if (vport) {
> >>              if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
> >>                  ignore_value(virNetDevMidonetUnbindPort(vport));
> >> -            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
> >> +            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
> >> +                       reason != VIR_DOMAIN_SHUTOFF_FAILED) {
> >>                  ignore_value(virNetDevOpenvswitchRemovePort(
> >>                                   virDomainNetGetActualBridgeName(net),
> >>                                   net->ifname));
> >>
> >
>
>
>
> --
>
> 龚伟
>
>
> Mobile: 18883262137
>