[libvirt-users] Zombie processes being created when console buffer is full

Martin Kletzander mkletzan at redhat.com
Thu Mar 10 08:34:17 UTC 2016


On Fri, Jan 29, 2016 at 10:25:08AM -0800, Peter Steele wrote:
>On 01/29/2016 05:08 AM, Peter Steele wrote:
>> We have been researching stuck zombie processes in our libvirt lxc
>> containers.  What we found was:
>>
>> 1) Each zombie’s parent was pid 1, i.e. init, which is a symlink to systemd.
>> 2) In some cases, the zombies were launched by systemd, in others the
>> zombie was inherited.
>> 3) While the child is in the zombie state, the parent process
>> (systemd) /proc/1/status shows no pending signals.
>> 4) Attaching gdb to systemd, there was 1 thread and it was waiting in
>> write() and the file being written was /dev/console.
>>
>> This write() to the console never returns.  We operated under the
>> assumption that systemd's SIGCHLD handler sets a bit and a foreground
>> thread (the only thread) would see that child processes needed
>> reaping.   While the single thread is stuck in write(), the reaping
>> never takes place.
>>
>> So why is write() blocking?  The answer seems to be that there is
>> nothing draining the console and eventually it blocks write() when its
>> buffers become full.  When we attach to the container's console, the
>> buffer is drained, allowing systemd’s write() to return. The zombies
>> are then reaped and everything goes back to normal.
>>
>> Our “solution” was more of a workaround.  systemd was altered to log
>> errors/warnings/etc to /dev/null instead of /dev/console. This
>> prevented the problem, only in that the console buffer was unlikely to
>> get filled up since systemd generally is the only thing that writes to
>> it. This is definitely a hack though.
>>
>> This may be a bug in the libvirt container library (you can't expect
>> something to periodically connect to a container's console to empty it
>> out). We suspect there may also be a configuration issue in our
>> containers with regards to the console.
>>
>> Has anyone else observed this problem?
>>

Unfortunately I did not.  How would I go about reproducing it?
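
One way to look at the mechanism in isolation, without libvirt or
systemd, might be the sketch below.  It assumes Linux and Python 3; the
function name fill_then_drain and the chunk size are made up for
illustration.  It opens a pty pair, writes to the slave side (the role
/dev/console plays in the container) while nothing reads the master
side, and uses non-blocking mode so that the point where a blocking
write() would hang shows up as EAGAIN instead.

```python
import os
import select


def fill_then_drain(chunk_size=1024, cap=64 * 1024 * 1024):
    """Fill a pty that nobody reads, then drain it and write again.

    Returns (bytes_accepted_before_full, bytes_drained, write_ok_after_drain).
    """
    master, slave = os.openpty()
    try:
        # Non-blocking, so a full buffer raises BlockingIOError (EAGAIN)
        # where a blocking write(), like the one seen in gdb, would hang.
        os.set_blocking(slave, False)
        chunk = b"x" * chunk_size

        accepted = 0
        while accepted < cap:  # cap is only a safety net for the sketch
            try:
                accepted += os.write(slave, chunk)
            except BlockingIOError:
                break  # buffer full: nothing is draining the master side

        # Play the part of a client attaching to the console: read the
        # master side until no more data shows up.
        drained = 0
        while select.select([master], [], [], 0.5)[0]:
            drained += len(os.read(master, 65536))

        # With the buffer drained, the writer can make progress again.
        writable = select.select([], [slave], [], 2.0)[1]
        write_ok = bool(writable) and os.write(slave, b"after drain\n") > 0
        return accepted, drained, write_ok
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    accepted, drained, write_ok = fill_then_drain()
    print("accepted before full:", accepted)
    print("drained by reader:", drained)
    print("write succeeds after drain:", write_ok)
```

If this matches what happens in the container, the number of bytes
accepted before the buffer fills would give a rough idea of how much
console output it takes to wedge pid 1.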

>As I mentioned here, I think this may have to do with incorrect
>container configuration with regards to the console. Much of the process
>though is automated by libvirt itself so I'm not sure what I might be
>missing. When a container is created, the xml config has this entry defined:
>
>     <console type='pty'>
>       <target type='lxc' port='0'/>
>     </console>
>
>After starting the container, the console config in the xml changes, e.g.:
>
>     <console type='pty' tty='/dev/pts/2'>
>       <source path='/dev/pts/2'/>
>       <target type='lxc' port='0'/>
>       <alias name='console0'/>
>     </console>
>

This all looks fine to me.

>In addition to these changes, a new entry is created under /dev/pts:
>
># ll /dev/pts
>crw--w---- 1 root tty  136, 0 Jan 29 08:27 0
>crw--w---- 1 root tty  136, 1 Jan 29 08:26 1
>crw--w---- 1 root tty  136, 2 Jan 29 09:19 2  <---
>crw--w---- 1 root tty  136, 6 Jan 29 09:22 6
>c--------- 1 root root   5, 2 Jan 29 07:52 ptmx
>
>In the libvirt_lxc process that is spawned, a link is created for /dev/console:
>
># ll /dev/console
>lrwxrwxrwx 1 root root      10 Jan 29 09:53 console -> /dev/pts/0
>
>and /dev/pts/0 is also created:
>
># ll /dev/pts/0
>crw--w---- 1 root tty 136, 0 Jan 29 10:05 /dev/pts/0
>
>I'm surprised that the major/minor number for this isn't the same as
>/dev/pts/2 in the host. I'm also surprised that no agetty process is
>launched for the container. I'd expect to see something like this
>running in the container:
>

The minor number should not be the same, I believe.  That's because of
namespaces: it's in the container, so it has its own numbering.
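
To compare the numbers on both sides, a small sketch like the following
could be run once on the host and once inside the container.  It
assumes Python 3 on Linux; pty_device_numbers is a name made up for
this example.  It allocates a fresh pty and reports the slave's path
together with its major and minor numbers as seen from wherever it is
run.

```python
import os
import stat


def pty_device_numbers():
    """Allocate a pty and return (slave_path, major, minor) of its slave side.

    slave_path is None if the slave has no resolvable name in this
    environment.
    """
    master, slave = os.openpty()
    try:
        try:
            path = os.ttyname(slave)
        except OSError:
            path = None  # no matching /dev/pts entry visible from here
        st = os.fstat(slave)
        if not stat.S_ISCHR(st.st_mode):
            raise RuntimeError("pty slave is not a character device")
        return path, os.major(st.st_rdev), os.minor(st.st_rdev)
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(pty_device_numbers())
```

On Linux the major number for these devices is 136, which is the
"136, N" column in the listings quoted above; the minor number is the
part that is expected to differ between host and container.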

># ps aux|grep agetty
>root     25577  0.0  0.0   6424   792 pts/2    Ss+  10:13   0:00
>/sbin/agetty --noclear --keep-baud console 115200 38400 9600
>

This should depend on the configuration of your guest.  If you use some
systemd system, it should have its configuration for each pty.  If
there is none, it will probably not start any.

>I guess libvirt does some magic I'm not aware of to handle the consoles
>for the containers. The question is why are we hitting this issue with
>zombie processes that are caused by the console buffer filling up?
>

I'm not able to reproduce your issue, but that might be because I'm
running systemd neither in the container nor on the host.  If we can
reproduce it without systemd, though, that would be very helpful for
finding out the cause of all this.
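
As a possible starting point for a systemd-free reproduction, here is a
minimal sketch (assuming Linux, since it reads /proc/<pid>/stat, and
Python 3; the helper names are invented for illustration).  It shows
that an exited child stays a zombie for as long as its parent is busy
doing something other than wait().  In the reported case the
"something else" would be the blocked write() to /dev/console; here the
parent simply delays the waitpid() call.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field follows the closing paren of the command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_until_reaped(timeout=5.0):
    """Fork a child that exits at once; report its state before/after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: do not call waitpid() yet, as if stuck in another syscall.
    deadline = time.monotonic() + timeout
    state_before = proc_state(pid)
    while state_before != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state_before = proc_state(pid)

    os.waitpid(pid, 0)  # the reaping step that a blocked parent never reaches
    state_after = proc_state(pid)
    return state_before, state_after


if __name__ == "__main__":
    before, after = zombie_until_reaped()
    print("state before waitpid():", before)
    print("state after waitpid():", after)
```

Running something like this as the container's init, with the delay
replaced by writes to /dev/console, might show whether the problem is
specific to systemd or not.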

>Peter
>