[libvirt] QMP fallback race in libvirt

Sat Jun 7 01:47:05 UTC 2014

Hi,

Eric asked me to move this here from #virt so it doesn't get forgotten.

I hit a weird bug in a new install of libvirt on Debian Jessie this week
where a VM could not be configured to use any CPU type except passthrough.

After much digging and head-scratching, the immediate cause turns out
to be one of the (three) files in the qemu/capabilities cache being
generated wrongly the first time libvirtd was started.  Instead of
being generated from the QMP queries, it appears to have fallen back to
the old method of scraping 'qemu -cpu help', and since the format of that
output changed with qemu 2.0 it leads to cache entries like:

<cpu name='SandyBridge  Intel Xeon E312xx (Sandy Bridge)                '/>

and hilarity then ensues when cpuModelIsAllowed() is called by x86Decode().
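
For anyone wanting to check whether their own cache is affected, here is
a rough sketch that flags <cpu> entries whose name contains whitespace,
which is the symptom shown above.  The cache directory path and the
function names are my assumptions for illustration, not anything taken
from libvirt itself, and real CPU model names are assumed never to
contain spaces.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed cache location; adjust for the local install.
CACHE_DIR = Path("/var/cache/libvirt/qemu/capabilities")


def suspicious_cpu_names(xml_text):
    """Return the name attributes of <cpu> elements that contain whitespace."""
    root = ET.fromstring(xml_text)
    return [
        name
        for name in (el.get("name", "") for el in root.iter("cpu"))
        if any(ch.isspace() for ch in name)
    ]


def scan_cache(cache_dir=CACHE_DIR):
    """Map each cache file with suspicious entries to the list of bad names."""
    hits = {}
    for path in sorted(Path(cache_dir).glob("*.xml")):
        try:
            bad = suspicious_cpu_names(path.read_text())
        except (OSError, ET.ParseError):
            continue  # unreadable or not well-formed XML; skipped in this sketch
        if bad:
            hits[str(path)] = bad
    return hits


if __name__ == "__main__":
    for path, names in scan_cache().items():
        print(path)
        for name in names:
            print("  %r" % name)
```

Any file this reports is a candidate for having been built from the
scraped help text rather than from QMP.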

Since only one of the cache files there was corrupted like this, it would
appear libvirt either didn't wait long enough, or didn't try hard enough,
to get a connection to the monitor for the QMP query on what was probably
also the very first time qemu had been started on this host machine.
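
To illustrate the kind of race I suspect (this is not libvirt's actual
code, which I have not traced this far), here is a minimal sketch of
waiting for a QMP monitor socket with a deadline rather than giving up
on the first failure.  The function names, the 30 second deadline and
the polling interval are all hypothetical; the only thing taken from
QMP itself is that the server sends a JSON greeting containing a "QMP"
key when a client connects.

```python
import json
import socket
import time


def read_qmp_greeting(sock_path, timeout=1.0):
    """Connect to a QMP UNIX socket and return the parsed greeting dict."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        # QMP sends one JSON object per line; the first one is the greeting.
        with s.makefile("r", encoding="utf-8") as f:
            line = f.readline()
    greeting = json.loads(line)  # raises ValueError on empty or bad input
    if "QMP" not in greeting:
        raise ValueError("not a QMP greeting: %r" % line)
    return greeting


def wait_for_qmp(sock_path, deadline_s=30.0, interval_s=0.1,
                 probe=read_qmp_greeting):
    """Call probe(sock_path) repeatedly until it succeeds or the deadline passes."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            return probe(sock_path)
        except (OSError, ValueError) as exc:
            # Socket missing, connection refused, timed out, or no greeting yet.
            if time.monotonic() >= give_up_at:
                raise TimeoutError(
                    "no QMP greeting from %s within %.1fs" % (sock_path, deadline_s)
                ) from exc
            time.sleep(interval_s)
```

The point is only that a slow first start of qemu should end in either
a valid greeting or a hard error, never in a silent fallback to parsing
help output.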

After nuking the cache files and restarting libvirtd they were then
correctly regenerated, and things began to work as expected.

This was all done on a new clean install of the host machine, so there
was nothing around from any earlier versions to get tangled up with,
and possibly means qemu had some first time init of its own to do which
took some time before it was ready to be queried.

I am also seeing bursts of several of these warnings in the logs:

libvirtd[26475]: This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous

I haven't confirmed whether these are related, but they don't seem to be
obviously unrelated either, and at worst this is a separate bug.

  Cheers,
  Ron