[PATCH] Fixed missing VM vport when batch start or migration partially failed

Laine Stump laine at redhat.com
Tue Jun 16 23:49:24 UTC 2020


To complete the circle, here is my response to a *different* patch 
trying to fix this same problem. I did a bit more investigating during 
my reply, so there is better / more complete information:

https://www.redhat.com/archives/libvir-list/2020-June/msg00681.html

On 6/15/20 11:10 PM, Wei Gong wrote:
>   environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062,
> centos7, openvswitch-2.3.1
>  vm network xml:
> <interface type='bridge'>
>   <mac address='52:54:00:46:45:95'/>
>   <source bridge='ovsbr-mgt'/>
>   <vlan>
>     <tag id='0'/>
>   </vlan>
>   <virtualport type='openvswitch'>
>     <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
>   </virtualport>
>   <target dev='vnet0'/>
>   <model type='virtio'/>
>   <link state='up'/>
>   <alias name='net0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' 
> function='0x0'/>
> </interface>
>
> When qemuProcessStart in qemu_process.c fails, qemuProcessStop runs.
> First the qemu process is stopped; at that point the kernel reclaims the
> tap device, and the tap device may immediately be handed to another
> virtual machine. Only afterwards does libvirt do the ovs removevport.
> Since qemuProcessStart and qemuProcessStop can run concurrently for
> different domains, qemuProcessStop (ovs removevport) may remove the port
> of another virtual machine that is using an openvswitch virtualport.
>
> For example: vm1 fails to start, so its tap device vnet0 is reclaimed
> first. In that window vm2 starts, is allocated the same tap device name
> vnet0, and adds port vnet0 to OVS. Then vm1's cleanup removes port
> vnet0, which at this point belongs to vm2. After that, vm2 can no
> longer reach the network through vnet0.
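
[Editor's note: the interleaving described above can be sketched as the
underlying ovs-vsctl calls. Port and bridge names are taken from the
example; the timeline is an illustration, not literal libvirt output.]

```shell
# vm1's qemuProcessStart fails; the kernel reclaims tap device vnet0.
# vm2 starts, is handed the freed name vnet0, and adds its port:
ovs-vsctl --timeout=5 -- --if-exists del-port vnet0 \
    -- add-port ovsbr-mgt vnet0 tag=0

# Only now does vm1's delayed qemuProcessStop cleanup run; it deletes
# the port that by this point belongs to vm2:
ovs-vsctl --timeout=5 -- --if-exists del-port vnet0
```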
>
> To reproduce:
> Batch start or migrate 10 virtual machines to the same node, with one
> of them failing to start. The failure can be anything, e.g. unreachable
> storage (when we reproduced this internally, one of the virtual
> machines was pointed at invalid storage so that it would deliberately
> fail to start).
>
> This problem causes:
> After a batch migration, one virtual machine's network is unreachable
> and its service is interrupted.
>
> libvirt handles ovs logs:
> Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called 
> as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port 
> ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 
> "external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface 
> vnet4 "external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\"" 
> -- set Interface vnet4 
> "external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set 
> Interface vnet4 external-ids:iface-status=active
> Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous 
> mode
> Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
> Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called 
> as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port 
> ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 
> "external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface 
> vnet4 "external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\"" 
> -- set Interface vnet4 
> "external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set 
> Interface vnet4 external-ids:iface-status=active
> Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous 
> mode
> Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called 
> as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4
>
>
> Thanks
>
> Laine Stump <laine at redhat.com <mailto:laine at redhat.com>> 
> wrote on Tue, Jun 16, 2020 at 10:01 AM:
>
>     On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
>     >
>     >
>     > On 6/12/20 3:18 AM, gongwei at smartx.com
>     <mailto:gongwei at smartx.com> wrote:
>     >> From: gongwei <gongwei at smartx.com <mailto:gongwei at smartx.com>>
>     >>
>     >> start to failed will not remove the openvswitch port,
>     >> the port recycling in this case lets openvswitch handle it by
>     itself
>     >>
>     >> Signed-off-by: gongwei <gongwei at smartx.com
>     <mailto:gongwei at smartx.com>>
>     >> ---
>     >
>     > Can you please elaborate on the commit message? By the commit
>     title and
>     > the code, I'm assuming that you're saying that we shouldn't
>     remove the
>     > openvswitch port if the QEMU process failed to start, for any other
>     > reason aside from SHUTOFF_FAILED.
>
>
>     More importantly, how does the "port recycling" that will take effect
>     depend on how the qemu process is stopped (which I would think
>     wouldn't make any difference to OVS), and why is it necessary for
>     libvirt to skip it?
>
>
>     Up until now, what I have known is that ports will not be removed
>     from
>     an OVS switch unless they are explicitly removed with ovs-vsctl, and
>     this attachment will persist across reboots of the host system. As a
>     matter of fact I've had cases during development where libvirt didn't
>     remove the OVS port for a tap device when a guest was terminated, and
>     then many *days* (and several reboots) later the same tap device name
>     was used for a different guest that was using a Linux host bridge,
>     and
>     the tap device failed to attach to the Linux host bridge because
>     it had
>     already been auto-attached back to the OVS switch as soon as it
>     was created.
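
    [Editor's note: a leftover attachment like the one described can be
    found and cleaned up by hand. A minimal sketch using standard
    ovs-vsctl commands; the bridge and port names here are examples.]

```shell
# Show which ports OVS still considers attached to the bridge:
ovs-vsctl list-ports ovsbr0

# Compare against the tap devices that actually exist on the host:
ip link show

# Remove a port whose tap device no longer belongs to any running guest:
ovs-vsctl --if-exists del-port ovsbr0 vnet0
```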
>
>
>     Can you describe how to reproduce the situation where libvirt
>     removes
>     the OVS port when it shouldn't, and what is the bad outcome of that
>     happening?
>
>
>
>     >
>     > The code itself looks ok.
>     >
>     >
>     >
>     >>   src/qemu/qemu_process.c | 3 ++-
>     >>   1 file changed, 2 insertions(+), 1 deletion(-)
>     >>
>     >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>     >> index d36088ba98..439bd5b396 100644
>     >> --- a/src/qemu/qemu_process.c
>     >> +++ b/src/qemu/qemu_process.c
>     >> @@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>     >>          if (vport) {
>     >>              if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
>     >>                  ignore_value(virNetDevMidonetUnbindPort(vport));
>     >> -            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
>     >> +            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
>     >> +                       reason != VIR_DOMAIN_SHUTOFF_FAILED) {
>     >>                  ignore_value(virNetDevOpenvswitchRemovePort(virDomainNetGetActualBridgeName(net),
>     >>                                                              net->ifname));
>     >>
>     >
>
>
>
> -- 
>
> Gong Wei
>
>
> Mobile: 18883262137
>



More information about the libvir-list mailing list