[libvirt-users] Host loses network connectivity when starting containers

Peter Steele pwsteele at gmail.com
Sat Jun 4 04:56:47 UTC 2016


I have hit a problem running libvirt-based containers on a CentOS 7
based host, with the extra wrinkle that my host is an EC2 instance in
AWS. Ultimately everything works as advertised, and I can launch 
instances that host multiple libvirt lxc containers without problems, 
with one caveat: About one time in ten when the containers are started, 
the instance loses all network connectivity. This can even happen on 
instances that host only a single container, so it's not related to the 
number of containers that are being run. Once this happens, the only way 
to fix the problem is to reboot the instance and hope that on the 
reboot, the containers will start successfully this time.

While troubleshooting, I've found that the problem
does not occur with containers defined with the linuxcontainers.org 
flavor of lxc. I've also discovered that if I configure my containers to 
use libvirt's default isolated bridge device virbr0, this loss of 
network connectivity does not appear to happen when the containers are 
started. Specifically, I ran a test that repeatedly started/stopped a 
single container configured with virbr0 and the test ran for a long time 
without an issue. When I switched the container to use my fully
configured bridge device br0, the start/stop test usually hung within
a few iterations, meaning that my ssh session into the instance would
hang and I'd be unable to reconnect to the instance. Unfortunately AWS
does not provide a console back door into an instance, so
troubleshooting has been difficult. I have checked the system logs
after one of these hangs, though, and there are no errors reported
anywhere that I can see.
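
For anyone who wants to try reproducing this, the start/stop test
described above could be sketched roughly as follows. This is a
hypothetical reconstruction, not the script actually used: the function
names, the settle delay, and the ping target (for example the subnet
gateway) are all assumptions, and it needs Python 3.7 or later, which is
not the stock Python on CentOS 7. It uses only the standard "virsh
start" and "virsh destroy" commands against the lxc:/// driver URI.

```python
import subprocess
import time

LXC_URI = "lxc:///"  # libvirt's LXC driver URI


def virsh(*args, runner=subprocess.run):
    """Run a virsh command against the LXC driver and return its exit code."""
    cmd = ["virsh", "-c", LXC_URI, *args]
    return runner(cmd, capture_output=True, text=True).returncode


def host_reachable(target, runner=subprocess.run):
    """Return True if a single ping to target succeeds within 2 seconds."""
    cmd = ["ping", "-c", "1", "-W", "2", target]
    return runner(cmd, capture_output=True, text=True).returncode == 0


def stress(name, target, iterations=100, settle=5.0,
           runner=subprocess.run, sleep=time.sleep):
    """Start and stop the container repeatedly (hypothetical helper).

    Returns the 1-based iteration at which the ping check first failed,
    or None if every iteration passed.
    """
    for i in range(1, iterations + 1):
        virsh("start", name, runner=runner)
        sleep(settle)  # give the container's network time to come up
        if not host_reachable(target, runner=runner):
            return i
        virsh("destroy", name, runner=runner)  # hard stop of the container
        sleep(settle)
    return None
```

Run on the instance itself, something like stress("mycontainer",
"10.0.0.1") (both values are placeholders) would report the iteration at
which the host stopped reaching the target. Writing that result to a
file on disk is advisable, since the ssh session itself is expected to
hang when the failure occurs.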

I do not see this problem when running on real hardware. I also do not 
see the problem when running under other virtual environments, including 
VMware, KVM, and VirtualBox. My guess is that it is a bug in libvirt,
since containers defined with the linuxcontainers.org lxc framework do
not cause this issue. There seems to be an AWS component to the problem
as well, though, since I've only seen this happen on EC2-based instances.

Is anyone familiar with this problem? Does anyone have any suggestions
on how I might resolve it? Note that I have tried a recent version of
libvirt (1.3.2-1) and it behaves the same as the stock CentOS version of 
libvirt (1.2.17-13).

Peter



