[libvirt] QEMU capabilities vs machine types

Michal Privoznik mprivozn at redhat.com
Fri Feb 13 08:24:20 UTC 2015


On 12.02.2015 20:25, Eduardo Habkost wrote:
> On Wed, Feb 11, 2015 at 05:09:01PM +0100, Michal Privoznik wrote:
>> On 11.02.2015 16:47, Daniel P. Berrange wrote:
>>> On Wed, Feb 11, 2015 at 04:31:53PM +0100, Michal Privoznik wrote:
>>>>
>>>
>>> There are two reasons why we query & check the supported capabilities
>>> from QEMU
>>>
>>>  1. There are multiple possible CLI args for the same feature and
>>>     we need to choose the "best" one to use
>>>
>>>  2. The feature is not supported and we want to give the caller a
>>>     better error message than they'd get from QEMU
>>>
>>> I'm unclear from the bug which scenario applies here.
>>>
>>> If it is scenario 2 though, I'd just mark it as CANTFIX or WONTFIX,
>>> as no matter what we do the user would get an error. It is not worth
>>> making our capability matrix a factor of 10+ bigger just to get a
>>> better error message.
>>>
>>> If it is scenario 1, I think the burden is on QEMU to solve. The
>>> memory-backend-{file,ram} CLI flags shouldn't be tied to guest
>>> machine types, as they are backend config setup options that should
>>> not impact guest ABI.
>>
>> It's somewhere in between 1 and 2. Back in RHEL-7.0 days libvirt would
>> have created a guest with:
>>
>> -numa node,...,mem=1337
>>
>> But if qemu reports it supports memory-backend-ram, libvirt tries to use it:
>>
>> -object memory-backend-ram,id=ram-node0,size=1337M,... \
>> -numa node,...,memdev=ram-node0
>>
>> This breaks migration to the newer qemu shipped in RHEL-7.1. If qemu
>> reported the correct value, we could generate the correct command line
>> and migration would succeed. However, it is also our fault: we are not
>> asking the right question in the first place.
> 
> I understand that RHEL-7.1 QEMU is not providing enough data for libvirt
> to detect this before it is too late. What I am missing here is: why
> wasn't commit f309db1f4d51009bad0d32e12efc75530b66836b enough to fix
> this specific case?

In libvirt, the NUMA pinning can be expressed this way:

<numatune>
  <memory mode='strict' nodeset='0-7'/>
  <memnode cellid='0' mode='preferred' nodeset='3'/>
  <memnode cellid='2' mode='strict' nodeset='1-2,5,7'/>
</numatune>

This says: pin guest node #0 onto host node #3, and guest node #2 onto
host nodes #1-2, 5 and 7. All the remaining guest NUMA nodes are placed
onto host nodes #0-7.
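For illustration, for the XML above libvirt would generate per-node
backend objects along these lines (the sizes here are made up for the
sketch; the <memnode/> mode maps onto the memory-backend-ram policy
property, the nodeset onto host-nodes):

-object memory-backend-ram,id=ram-node0,size=1024M,policy=preferred,host-nodes=3 \
-numa node,nodeid=0,memdev=ram-node0 \
-object memory-backend-ram,id=ram-node2,size=1024M,policy=bind,host-nodes=1-2,host-nodes=5,host-nodes=7 \
-numa node,nodeid=2,memdev=ram-node2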
As long as there is explicit pinning of a guest NUMA node onto host
nodes (the <memnode/> element), memory-backend-ram is required. However,
if <numatune/> has only the single <memory/> child, we can still
guarantee the requested configuration via CGroups and don't necessarily
need memory-backend-ram.
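For contrast, with pinning only at the <memory/> level (a minimal
sketch, nodeset assumed):

<numatune>
  <memory mode='strict' nodeset='0-7'/>
</numatune>

Here libvirt can restrict the whole qemu process to host nodes 0-7
through the cpuset CGroup controller (cpuset.mems) and keep using the
plain -numa node,...,mem= form.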
The patch you've referred to was incomplete in this respect. Moreover,
it was buggy: it allowed combining bare -numa with memory-backend-ram at
the same time (which is not allowed).
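To sketch the disallowed mixture (values made up): qemu refuses a
command line where some -numa nodes use mem= while others use memdev=,
roughly:

-numa node,nodeid=0,mem=512 \
-object memory-backend-ram,id=ram-node1,size=512M \
-numa node,nodeid=1,memdev=ram-node1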

Michal



