[libvirt] [PATCH 0/4]

Daniel P. Berrange berrange at redhat.com
Tue Aug 17 16:00:22 UTC 2010


For 

  https://bugzilla.redhat.com/show_bug.cgi?id=620847

We have had sporadic reports of

  # virsh capabilities
  error: failed to get capabilities
  error: server closed connection:

This normally means that libvirtd has crashed and closed the connection,
but in this case libvirtd had always remained running. It turns out
that the capabilities XML was too large for the remote RPC message
size, so XDR serialization of the reply failed, and libvirtd then
closed the client connection immediately. The XML was so large because
the code was not handling an edge case in libnuma, which returns a CPU
mask of all-1s to indicate a non-existent node.
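To illustrate the edge case, here is a minimal sketch of how a caller might recognise the all-1s mask before treating it as a real node's CPU list. It is not libvirt's actual code. `cpu_mask_is_all_ones` and `cpu_mask_count` are hypothetical helpers, and the mask is assumed to be an array of `unsigned long` words. The only fact taken from the report is that libnuma signals a non-existent node with an all-1s mask.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not libvirt code): true when every bit of the
 * mask is set. Per the report, libnuma uses such a mask to mean
 * "node does not exist", so it should not be read as "this node
 * owns every CPU". */
static bool cpu_mask_is_all_ones(const unsigned long *mask, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (mask[i] != ULONG_MAX)
            return false;
    }
    return nwords > 0;
}

/* Hypothetical helper: number of CPUs a mask claims (set bits). */
static size_t cpu_mask_count(const unsigned long *mask, size_t nwords)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        unsigned long w = mask[i];
        while (w) {
            w &= w - 1;   /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}
```

If an all-1s mask were taken at face value, every bit would become one CPU entry for that cell, and a wide mask could claim far more CPUs than the host actually has. That is one plausible way the document could grow past the RPC limit. Checking for the sentinel first lets the cell be skipped instead.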

Machines that exhibit this problem show the following symptom in
the logs:

 # grep NUMA /var/log/messages 
 Aug 16 10:30:34 sgi-xe270-01 libvirtd: 10:30:34.933: warning : 
 nodeCapsInitNUMA:388 : NUMA topology for cell 1 of 2 not available, ignoring

They also have a sparse NUMA topology (i.e. empty nodes).

This series does several things:

 - Adds explicit warnings in places where XDR serialization fails,
   so there is an indication of the problem in /var/log/messages
 - Tries to send a real remote_error back to the client, instead of
   closing its connection
 - Adds logging of the capabilities XML in libvirt.c, so we can
   identify the too-large document in libvirtd
 - Adds a fix to cope with the all-1s node mask
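The first two items above amount to "warn and reply with an error instead of silently dropping the client". A minimal sketch of that pattern follows. It is not libvirt's RPC code. `struct reply`, `encode_string`, `build_reply`, the size cap `SKETCH_MAX_PAYLOAD` and the error text are all hypothetical stand-ins, and the real XDR encoding and protocol limits are not reproduced.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical payload cap; NOT libvirt's real RPC message limit. */
#define SKETCH_MAX_PAYLOAD 64

enum reply_status { REPLY_OK = 0, REPLY_ERROR = 1 };

struct reply {
    enum reply_status status;
    char payload[SKETCH_MAX_PAYLOAD];
};

/* Hypothetical encoder standing in for an XDR encode step that
 * fails when the string does not fit in the message. */
static int encode_string(struct reply *r, const char *s)
{
    size_t len = strlen(s);
    if (len >= sizeof(r->payload))
        return -1;
    memcpy(r->payload, s, len + 1);
    r->status = REPLY_OK;
    return 0;
}

/* On encode failure, log a warning and fall back to a short error
 * reply, rather than closing the client connection. */
static void build_reply(struct reply *r, const char *result)
{
    if (encode_string(r, result) == 0)
        return;

    fprintf(stderr, "warning : failed to serialize %zu-byte reply\n",
            strlen(result));
    r->status = REPLY_ERROR;
    /* The fallback text is short enough to always fit. */
    snprintf(r->payload, sizeof(r->payload), "reply too large to send");
}
```

The design point is that the fallback reply must be small enough that its own serialization cannot fail. Otherwise the server is back to having nothing it can send except a dropped connection.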

This may also fix some other unexplained bug reports we've had with
'server closed connection' messages, or at least make it possible
to diagnose them.

Daniel
