[libvirt] [PATCH v2 2/4] qemu: caps: add QEMU_CAPS_QCOW2_L2_CACHE_SIZE

Wed Dec 12 14:28:14 UTC 2018

On 12/12/18 5:59 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 11.12.2018 19:41, John Ferlan wrote:
>>
>>
>> On 12/11/18 6:04 AM, Nikolay Shirokovskiy wrote:
>>>
>>>
>>> On 10.12.2018 19:58, John Ferlan wrote:
>>>>
>>>>
>>>> On 11/8/18 8:02 AM, Nikolay Shirokovskiy wrote:
>>>>> For qemu capable of setting l2-cache-size for qcow2 images
>>>>> to INT64_MAX and semantics of upper limit on l2 cache
>>>>> size. We can only check this by qemu version (3.1.0) now.
>>>>>
>>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>>>> ---
>>>>>  src/qemu/qemu_capabilities.c | 5 +++++
>>>>>  src/qemu/qemu_capabilities.h | 1 +
>>>>>  2 files changed, 6 insertions(+)
>>>>>
>>>>
>>>> This patch needs to be updated to top of tree.
>>>>
>>>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>>>>> index 2ca5af3..49a3b60 100644
>>>>> --- a/src/qemu/qemu_capabilities.c
>>>>> +++ b/src/qemu/qemu_capabilities.c
>>>>> @@ -509,6 +509,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
>>>>>                "vfio-pci.display",
>>>>>                "blockdev",
>>>>>                "vfio-ap",
>>>>> +              "qcow2.l2-cache-size",
>>>>
>>>> When you do update, you will need to fix the comma-less entry for
>>>> "egl-headless.rendernode" as well, unless someone else gets to it first.
>>>>
>>>>>      );
>>>>>  
>>>>>  
>>>>> @@ -4117,6 +4118,10 @@ virQEMUCapsInitQMPMonitor(virQEMUCapsPtr qemuCaps,
>>>>>          virQEMUCapsSet(qemuCaps, QEMU_CAPS_MACHINE_PSERIES_MAX_CPU_COMPAT);
>>>>>      }
>>>>>  
>>>>> +    /* l2-cache-size before 3001000 does not accept INT64_MAX */
>>>>> +    if (qemuCaps->version >= 3001000)
>>>>> +        virQEMUCapsSet(qemuCaps, QEMU_CAPS_QCOW2_L2_CACHE_SIZE);
>>>>> +
>>>>
>>>> Not a fan of this.
>>>>
>>>> The 'l2-cache-size' was supported since QEMU 2.2. A "newer" algorithm
>>>> for cache adjustment has been supported since QEMU 2.12 (and there's
>>>> l2-cache-entry-size that "could be used" to know that). Then there's
>>>> some unreferenced commit indicating usage of INT64_MAX. Tracking that
>>>> down, I find qemu.git commit 6c1c8d5d from Max which enforces MAX_INT.
>>>
>>> This is a failure rather than enforcement. And AFAIU code that limit cache
>>> to appropriate maximum when INT64_MAX is given in read_cache_sizes:
>>>
>>> *l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
>>>
>>> only appeared after release of 3.0 in 
>>>
>>> commit b749562d9822d14ef69c9eaa5f85903010b86c30
>>> Author: Leonid Bloch <lbloch at janustech.com>
>>> Date:   Wed Sep 26 19:04:43 2018 +0300
>>>
>>>     qcow2: Assign the L2 cache relatively to the image size
>>>
>>>
>>> ie setting cache size to INT64_MAX before 3.1 will fail. In
>>> other words in earlier versions there were no value to specify that
>>> cache size need to be set to maximum. You can calculate this value
>>> yourself knowing disk size and disk cluster size and set it but
>>> this is not convinient.
>>>
>>
>> So prior to this patch the max could be 32 MB? And now it's calculate-able?
> 
> 32 MB is default value on Linux in 3.1 (the patch set it it to 1 MB but later
> patch 80668d0f increases to 32 MB). 
> 
> Before the patch one can set greater values but can not set INT64_MAX.
> The exact value that can be set depends on cluster size due to the check
> againts INT_MAX you already mentioned. So one can set up to INT_MAX * cluster_size,
> but cluster_size is unknown to us (?). Given minium cluster size is 512 bytes, we
> can set INT_MAX * 512 (1 TB) always. This will cover 64 TB disks for cluster
> size 512 bytes and 8192 PB for default cluster size. So we should refuse 
> to start with policy 'maximum' and disk size greater then 64 TB. 
> Or may be we can just check cluster size at start up. How do you think?

I'm not sure I'd go through the trouble of determining the size. If
we're not changing the default cluster_size, then using 512 "seems"
reasonable (whether others feel the same way who knows yet). I'm not
even sure if we already get that information - feel free to research and
add though.

I'm not sure I would "refuse" to start - think in terms of what is the
downside to not being able to have a large enough value - well that
seems to be I/O performance.  If we can get "better" I/O performance by
providing "something" more than "default" for those environments, then I
would think that's better than doing nothing or failing to start.

I'm not sure how much "effort" you want to put into the window of
support prior to 3.1 where "maximum" is true maximum.  It could be
simple enough to document that for prior to 3.1 the maximum value is
limited to XXX. After 3.1 it's essentially unlimited since libvirt can
provide the maximum integer value to QEMU and QEMU then decides the
maximum up to that value.  There's ways to wordsmith it - I'm pretty
sure it's been done before, but I don't have an example readily
available (search docs for size, maximum, or default)...

> 
> By the way the patch works only for -blockdev configuration which is
> available since QEMU 2.10 AFAIK. So we could purse to support 2.10, 2.11,
> 2.12 and 3.0 version in principle.

Ah true, the --blockdev option is required.

I've put the "artificial" point in time of 2.12 because that when I
recall Berto making adjustments to the cache size algorithms and when
the @l2-cache-entry-size appeared in qapi/block-core.json. I think prior
to those changes the algorithm was even more difficult to decipher.

> 
>>
>> Do you think it would be "worthwhile" to add logic that knows QEMU 2.12
>> and 3.0 could still support using l2_cache_size, but with a lower value
>> say 32MB? Then if we find 3.1 installed then we can supply INT_MAX for
>> the value?  Oh, and if the "maximum" attribute isn't set, then the
>> default of 0 is used. Thus being able to use "Z" instead of "I" for patch 3?
> 
> In default case l2-cache-size just does not get set.
> 

Yes on earlier support and I think using l2-cache-entry-size will give
us a point in time to support from. The max size chosen is up to you - I
think if we document what we set to prior to QEMU 3.1, then we'll be good.

Adding a second capability that would be (yuck) version based because we
know a bug was fixed is going to have to be acceptable. Then, the logic
to "set" the value would be something like:

    if (domain maximum provided) {
        if (some 3.1 capability is set)
            max = INT64_MAX;
        else if (at least 2.12 capability is set)
            max = "whatever smaller value you choose"
        else
            error this qemu doesn't support setting...
    }

>>
>> BTW:
>> One issue with providing a numerical QEMU version for something is that
>> someone could backport that patch (and it's dependencies) into some
>> maintenance branch or even downstream repo that won't report as 3.1, but
>> would have the patch. But since libvirt would only accept 3.1 or later,
>> it wouldn't allow usage.
> 
> Does not look critical. The functionality just won't work from start
> and when they start to investigate why in the result they can backport 
> appropriate qemu patches as well and fix libvirt caps code for example.

True, not critical, but it's one of the reasons numerical version
checking is disliked. Similarly providing in docs that something is
supported as of QEMU maj.min will get "dinged" with the reasoning of
what if someone backports...  and I have a similar response to yours for
that.

John

> 
> Nikolay
> 
>>
>>
>>>>
>>>> Still does that really matter? Let QEMU pick their own max and don't
>>>> have libvirt be the arbiter of that (like I noted in my 1/4 review). My
>>>> reasoning is that there's been quite a few "adjustments" to the
>>>> algorithms along the way. Those adjustments are one of the concerns
>>>> voiced in the past why making any "semi-permanent" change to the value
>>>> in some libvirt XML format has been NACKed historically. THus by placing
>>>> boundaries that are true today we limit ourselves for the future.
>>>
>>> IIUC setting INT64_MAX is ok in this respect. It is not real value for cache
>>> but rather order for QEMU to pick up one.
>>>
>>
>> Right providing some unrealistically large value would appear to work.
>> What that value "could be" for 2.12 and 3.0.0 which do support some
>> level of alteration and would be 'nice' to include in the mix, but not
>> required I suppose. And yes, I do have some downstream representation in
>> mind by thinking about this - it's in beta now ;-).
>>
>> John
>>
>>>>
>>>> BTW: If 3.1 was the "base" from which you want to work, then adjustments
>>>> to the tests/qemucapabilitiesdata/*3_1*{.replies|.xml } files would be
>>>> required as evidenced by your patch 4. The *.xml file would need to have
>>>> the correct <version>3001000</version> setting which should allow patch4
>>>> to be merged into patch3.
>>>
>>> Yeah, but 3.1 is not yet released and I need blockdev also as
>>> patch only makes difference in latter case.
>>>
>>> Nikolay
>>>
>>>>
>>>> John
>>>>
>>>>>      if (virQEMUCapsProbeQMPCommands(qemuCaps, mon) < 0)
>>>>>          goto cleanup;
>>>>>  
>>>>> diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
>>>>> index 6bb9a2c..bb2ac17 100644
>>>>> --- a/src/qemu/qemu_capabilities.h
>>>>> +++ b/src/qemu/qemu_capabilities.h
>>>>> @@ -493,6 +493,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
>>>>>      QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
>>>>>      QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
>>>>>      QEMU_CAPS_DEVICE_VFIO_AP, /* -device vfio-ap */
>>>>> +    QEMU_CAPS_QCOW2_L2_CACHE_SIZE, /* -blockdev supports l2-cache-size with INT64_MAX value */
>>>>>  
>>>>>      QEMU_CAPS_LAST /* this must always be the last item */
>>>>>  } virQEMUCapsFlags;
>>>>>