[libvirt-users] Zombie processes being created when console buffer is full

Thu Mar 10 14:36:22 UTC 2016

On 03/10/2016 12:34 AM, Martin Kletzander wrote:
> I'm not able to reproduce your issue, but that might be because I'm not
> running systemd neither in the container nor in the host.  If we can
> reproduce it without systemd, though, that would be very helpful for
> finding out the cause of all this.

We're seeing this on CentOS 7.1, which is systemd based. We were able to 
determine that the problem is caused by a container's console buffer 
filling up. In a container (or VM) the console is of course not a real 
physical device; it's a pseudo tty. With a physical console, when 
some process writes something to /dev/console, it appears on the 
physical console and if no one is there to see the text it eventually 
scrolls off the screen and is lost. There is no limit to how much text 
can be sent to the console.

In the case of a container and its pseudo console, there is a buffer 
associated with the console device and this buffer has a size limit. If 
there is an active console session open for a container, any text sent 
to the container's console (e.g. by systemd) is consumed by that 
session. However, if there is no active console session, the buffer 
associated with this pseudo console fills up as processes continue to 
write to the container's console device. When this happens, any process 
that attempts to write to the container's console blocks and stays 
blocked until a console session is started. These hung processes were 
the source of our zombie processes.
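
The blocking behaviour described above can be illustrated with an 
ordinary pseudo terminal pair, independent of libvirt. The following is 
a minimal sketch, assuming Linux and Python 3; the helper names 
(fill_until_blocked, drain, demo) are made up for this illustration and 
are not part of our setup. It writes to the slave side in non-blocking 
mode until the kernel refuses more data, which is the point where an 
ordinary blocking writer would hang, then reads the master side to show 
that writes are accepted again once something consumes the output.

```python
import os
import select


def fill_until_blocked(fd, chunk=b"x" * 1024, limit=16 * 1024 * 1024):
    """Write to fd in non-blocking mode until the kernel refuses more data.

    Returns the number of bytes that were queued. The limit is only a
    safety net in case the platform never pushes back.
    """
    os.set_blocking(fd, False)
    total = 0
    try:
        while total < limit:
            written = os.write(fd, chunk)
            if written == 0:
                break
            total += written
    except BlockingIOError:
        # Buffer is full: a blocking writer would hang at this point.
        pass
    return total


def drain(fd, idle_timeout=0.2):
    """Read from fd until no data arrives for idle_timeout seconds."""
    total = 0
    while select.select([fd], [], [], idle_timeout)[0]:
        data = os.read(fd, 65536)
        if not data:
            break
        total += len(data)
    return total


def demo():
    # master stands in for the reader (the console session),
    # slave for the device that processes write to.
    master, slave = os.openpty()
    try:
        queued = fill_until_blocked(slave)   # nobody is reading: fills up
        drained = drain(master)              # a reader shows up
        accepted = os.write(slave, b"y")     # writing works again
        return queued, drained, accepted
    finally:
        os.close(slave)
        os.close(master)


if __name__ == "__main__":
    queued, drained, accepted = demo()
    print("queued before blocking:", queued)
    print("drained by reader:", drained)
    print("bytes accepted after drain:", accepted)
```

The number of bytes queued before the write is refused depends on the 
kernel, so the sketch prints it rather than assuming a value.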

We solved the problem by writing a console monitor service that runs on 
the hypervisor hosting the containers. It continually monitors the 
console devices of all containers and if there is an open console 
session for a given container, it does nothing. If however there is no 
active console session, it opens the console device for the container 
and drains it using the following Python code:

             import os
             import termios
             # Discard console data that is queued but has not been read.
             fd = os.open(console, os.O_RDWR | os.O_NOCTTY)
             termios.tcflush(fd, termios.TCIFLUSH)
             os.close(fd)

For expediency, we do not save the text; the flush simply discards it. 
This is ultimately similar to text scrolling off the top of a physical 
console.
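
For illustration, the monitoring loop described above might be 
structured roughly as follows. This is only a sketch under stated 
assumptions, not the actual service: the function names are invented 
here, and has_active_session is a hypothetical callback supplied by the 
caller, since how an open console session is detected is not covered in 
this message. Only the flush itself (os.open plus termios.tcflush with 
TCIFLUSH) is taken from the snippet above.

```python
import os
import termios
import time


def flush_console(path):
    """Discard unread data queued on the console device at path."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    try:
        termios.tcflush(fd, termios.TCIFLUSH)
    finally:
        os.close(fd)


def drain_idle_consoles(console_paths, has_active_session):
    """Flush every console that has no active session.

    has_active_session is a hypothetical callback: it takes a console
    path and returns True if someone is attached to that console.
    Returns the list of paths that were flushed.
    """
    flushed = []
    for path in console_paths:
        if has_active_session(path):
            continue  # someone is reading this console: leave it alone
        try:
            flush_console(path)
        except (OSError, termios.error):
            continue  # device vanished or is not a tty: skip it
        flushed.append(path)
    return flushed


def monitor(console_paths, has_active_session, interval=1.0):
    """Poll forever, flushing idle consoles every interval seconds."""
    while True:
        drain_idle_consoles(console_paths, has_active_session)
        time.sleep(interval)
```

A caller would pass the console devices taken from the container 
definitions (e.g. /dev/pts/3) together with its own session check. The 
polling interval is a trade-off: if it is too long, a chatty container 
can still fill its buffer between two passes.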

So, although this monitor service has solved our issue with zombie 
processes, I'm not convinced this is really the right solution. I'd like 
to think that if a container is set up correctly, its console device should 
not fill up and block processes that attempt to write to it. I would 
think this would be a big problem for anyone running containers under 
libvirt_lxc. The problem is easy to reproduce in our environment: Open a 
console session to a container and run "cat" with no arguments. Leave it 
running and disconnect the console session (control-]). Determine the 
container's console device from its xml definition, e.g. /dev/pts/3, and 
then copy some large file to it, e.g.

          # cp /var/log/messages /dev/pts/3

Assuming the file is larger than the console's backing buffer, this cp 
command should hang. If you then open a console session to this 
container from another window, you'll see the contents of 
/var/log/messages appear on the screen and the cp command in the other 
window will exit.

If you are unable to reproduce it in your setup following this 
procedure, then either something is wrong with my container 
configuration or there is something more insidious going on. I'd 
appreciate it if you could run a test with this procedure and let me 
know the results.

Thanks.

Peter