[libvirt] [RFC PATCH] cgroup: Use system reported "unlimited" value for comparison

Viktor Mihajlovski mihajlov at linux.vnet.ibm.com
Tue Dec 6 12:11:12 UTC 2016


On 30.11.2016 14:22, Viktor Mihajlovski wrote:
> On 30.11.2016 13:20, Martin Kletzander wrote:
>> On Mon, Nov 28, 2016 at 06:03:36PM +0100, Viktor Mihajlovski 
>> wrote:
>>> With kernel 3.18 (since commit 
>>> 3e32cb2e0a12b6915056ff04601cf1bb9b44f967) the "unlimited" value 
>>> for cgroup memory limits has changed once again as its byte
>>> value is now computed from a page counter. The new "unlimited"
>>> value reported by the cgroup fs is therefore 2**51-1 pages which
>>> is (VIR_DOMAIN_MEMORY_PARAM_UNLIMITED - 3072). This results e.g.
>>> in virsh memtune displaying 9007199254740988 instead of unlimited
>>>  for the limits.
>>>
>>> This patch uses the value of memory.limit_in_bytes from the 
>>> cgroup memory root which is the system's "real" unlimited value 
>>> for comparison.
>>>
>>> See also libvirt commit 231656bbeb9e4d3bedc44362784c35eee21cf0f4 
>>> for the history for kernel 3.12 and before.
>>>
>>> I've tested this on F24 with the following configurations: - no 
>>> memory cgroup controller mounted - memory cgroup controller 
>>> mounted but not configured for libvirt - memory cgroup
>>> controller mounted and configured The first two fail as expected
>>> (and as before), the third case works as expected.
>>>
>>> Testing on other kernel versions highly welcome!
>>>
>>> Not perfect yet in that we still provide a fallback to the old 
>>> value. We might consider failing right away if we can't get the 
>>> system value. I'd be inclined to do that, since we're probably 
>>> facing principal cgroup issues in this case.
>>>
>>
>> Since the code is called only after reading another value worked, 
>> it _should not_ fail =)  But I'm OK with both failing and falling 
>> back to the old value.  Mostly because I don't think it will make 
>> any (significant) difference.
>>
> OK, this tips me towards the no fallback.
>>> Further, it's not the most efficient implementation. Obviously, 
>>> the unlimited value can be read once and cached. However, I'd 
>>> like to see the question above resolved first.
>>>
>>
>> But I would really like to cache the value in a global variable. 
>> You can use VIR_ONCE_GLOBAL_INIT for that, but maybe it's too much,
>> especially if you init the value before any other thread could
>> access it.
>>
> Sure, even if the initialization was racy, I see no way the global
> long long value could be corrupted.
>>> Signed-off-by: Viktor Mihajlovski <mihajlov at linux.vnet.ibm.com> 
>>> --- src/util/vircgroup.c | 61 
>>> ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file 
>>> changed, 55 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 
>>> f151193..969dca5 100644 --- a/src/util/vircgroup.c +++ 
>>> b/src/util/vircgroup.c @@ -2452,6 +2452,40 @@ 
>>> virCgroupGetBlkioDeviceWeight(virCgroupPtr group, }
>>>
>>>
>>> +/* + * Retrieve the "memory.limit_in_bytes" value from the 
>>> memory controller + * root dir. This value cannot be modified by 
>>> userspace and therefore + * is the maximum limit value supported 
>>> by cgroups on the local system. + */ +static int 
>>> +virCgroupGetMemoryUnlimited(unsigned long long int * 
>>> mem_unlimited) +{ +    int ret = -1; +    virCgroupPtr group; +
>>
>> Also, all this ↓
>>
>>> +    if (VIR_ALLOC(group)) +        goto cleanup; + +    if 
>>> (virCgroupDetectMounts(group)) +        goto cleanup; + +    if 
>>> (!group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].mountPoint) + 
>>> goto cleanup; + +    if 
>>> (VIR_STRDUP(group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].placement,
>>>
>>>
>>>
> +                   "/.") < 0)
>>> +        goto cleanup; +
>>
>> ↑ would be cleaner this way:
>>
>> if (virCgroupNew(-1, "/", NULL, NULL, &group) < 0) return -1;
>>
>> if (!virCgroupHasController(cgroup, VIR_CGROUP_CONTROLLER_MEMORY))
>>  goto cleanup;
>>
>> I'm not passing VIR_CGROUP_CONTROLLER_MEMORY to virCgroupNew() so 
>> that it doesn't fail with a cryptic message.
>>
> Looks cleaner indeed ... I'll give it a try.
>>> +    ret = virCgroupGetValueU64(group, + 
>>> VIR_CGROUP_CONTROLLER_MEMORY, + "memory.limit_in_bytes", + 
>>> mem_unlimited); + cleanup: +    virCgroupFree(&group); + return
>>> ret; +} + + /** * virCgroupSetMemory: * @@ -2534,6 +2568,7 @@ int
>>> virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long
>>> long *kb) { long long unsigned int limit_in_bytes; +    long long
>>> unsigned int unlimited_in_bytes; int ret = -1;
>>>
>>> if (virCgroupGetValueU64(group, @@ -2541,9 +2576,13 @@ 
>>> virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long 
>>> long *kb) "memory.limit_in_bytes", &limit_in_bytes) < 0) goto 
>>> cleanup;
>>>
>>> -    *kb = limit_in_bytes >> 10; -    if (*kb > 
>>> VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) +    if 
>>> (virCgroupGetMemoryUnlimited(&unlimited_in_bytes) < 0) + 
>>> unlimited_in_bytes = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED << 10; +
>>> + if (limit_in_bytes == unlimited_in_bytes)
>>
>> I don't know why, but I would feel more comfortable with '>=' 
>> there, although it doesn't make any difference (or sense).
>>
>>> *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; +    else +        *kb
>>> = limit_in_bytes >> 10;
>>>
>>> ret = 0; cleanup:
>>
>> As a nit, helper function for these would be nice.
>>
>> Otherwise, I like it.
> Thanks for the feedback.
> 
FYI, I've posted new version that uses thread-safe initialization
https://www.redhat.com/archives/libvir-list/2016-December/msg00176.html

According to some documents I've read it seems that on ARM 64-bit memory
access may be interrupted, so I didn't want to take a chance.
The checking code is not really performance critical, as we have to walk
the cgroup filesyste anyway in order to get the process' limits.
-- 

Mit freundlichen Grüßen/Kind Regards
   Viktor Mihajlovski

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Köderitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294




More information about the libvir-list mailing list