[libvirt] [PATCH RESEND RFC v4 1/6] Introduce the function virCgroupForVcpu

Adam Litke agl at us.ibm.com
Thu Jul 21 15:25:45 UTC 2011



On 07/21/2011 09:29 AM, Daniel P. Berrange wrote:
> On Thu, Jul 21, 2011 at 08:49:28AM -0500, Adam Litke wrote:
>> On 07/21/2011 08:34 AM, Daniel P. Berrange wrote:
>>> On Thu, Jul 21, 2011 at 07:54:05AM -0500, Adam Litke wrote:
>>>> Added Anthony to give him the opportunity to address the finer points of
>>>> this one, especially with respect to the qemu IO thread(s).
>>>>
>>>> This feature is really about capping the compute performance of a VM
>>>> such that we get consistent top-end performance.  Yes, qemu has non-VCPU
>>>> threads that this patch set doesn't govern, but that's the point.  We
>>>> are not attempting to throttle IO or device emulation with this feature.
>>>>  It's true that an IO-intensive guest may consume more host resources
>>>> than a compute-intensive guest, but they should still have equal top-end
>>>> CPU performance when viewed from the guest's perspective.
>>>
>>> I could be misunderstanding what you're trying to achieve here,
>>> so perhaps we should consider an example.
>>
>> From your example, it's clear to me that you understand the use case well.
>>
>>>  - A machine has 4 physical CPUs
>>>  - There are 4 guests on the machine
>>>  - Each guest has 2 virtual CPUs
>>>
>>> So we've overcommitted the host CPU resources 2x here.
>>>
>>> Let's say that we want to use this feature to ensure consistent
>>> top-end performance for every guest, splitting the host pCPU
>>> resources evenly across all guests, so each guest is guaranteed
>>> 1 pCPU worth of CPU time overall.
>>>
>>> This patch lets you do this by assigning caps per VCPU. So
>>> in this example, each VCPU cgroup would have to be configured
>>> to cap the VCPUs at 50% of a single pCPU.
>>>
>>> This leaves the other QEMU threads uncapped / unaccounted
>>> for. If any one guest causes non-trivial compute load in
>>> a non-VCPU thread, this can/will impact the top-end compute
>>> performance of all the other guests on the machine.
>>>
>>> If we did caps per VM, then you could set the VM cgroup
>>> such that the VM as a whole had 100% of a single pCPU.
>>>
>>> If a guest is 100% compute bound, it can use its full
>>> allocation of 100% of a pCPU in vCPU threads. If any other
>>> guest is consuming CPU time in a non-VCPU thread, it cannot
>>> impact the top-end compute performance of VCPU threads in
>>> the other guests.
>>>
>>> A per-VM cap would, however, mean a guest with 2 vCPUs
>>> could have unequal scheduling, where one vCPU claimed 75%
>>> of the pCPU and the other vCPU got left with only 25%.
>>>
>>> So AFAICT, per-VM cgroups are better for ensuring top-end
>>> compute performance of a guest as a whole, but
>>> per-VCPU cgroups can ensure consistent top-end performance
>>> across vCPUs within a guest.
>>>
>>> IMHO, per-VM cgroups are the more useful because they are the
>>> only way to stop guests impacting each other, but there
>>> could be additional benefits of *also* having per-VCPU cgroups
>>> if you want to ensure fairness of top-end performance across
>>> vCPUs inside a single VM.
>>
>> What this says to me is that per-VM cgroups _in_addition_to_ per-vcpu
>> cgroups is the _most_ useful situation.  Since I can't think of any
>> cases where someone would want per-VM and not per-vCPU, how about we
>> always do both when supported?  We can still use one pair of tunables
>> (<period> and <quota>) and try to do the right thing.  For example:
>>
>> <vcpus>2</vcpus>
>> <cputune>
>>   <period>500000</period>
>>   <quota>250000</quota>
>> </cputune>
>>
>> Would have the following behavior for qemu-kvm (vcpu threads)
>>
>> Global VM cgroup: cfs_period:500000 cfs_quota:500000
>> Each vcpu cgroup: cfs_period:500000 cfs_quota:250000
>>
>> and this behavior for qemu with no vcpu threads
> 
> So, whatever quota value is in the XML, you would multiply that
> by the number of vCPUs and use it to set the VM quota value?

Yep.

> I'm trying to think if there is ever a case where you don't want
> the VM to be a plain multiple of the VCPU value, but I can't
> think of one.
> 
> So the only real discussion point here is whether the quota
> value in the XML is treated as a per-VM value or a per-VCPU
> value.

I think it has to be per-VCPU.  Otherwise the user will have to remember
to do the multiplication themselves.  If they forget to do this, they
will get a nasty performance surprise.
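
To make the mapping concrete, here is a rough sketch of the arithmetic
(illustrative only; the variable names are made up, not taken from the
patch):

    /* Derive CFS values from <cputune>, treating <quota> as per-vCPU. */
    unsigned long long period     = 500000;  /* <period>, in microseconds */
    long long          vcpu_quota = 250000;  /* <quota>, per vCPU         */
    unsigned int       nvcpus     = 2;       /* <vcpus>                   */

    long long vm_quota = vcpu_quota * nvcpus;  /* 500000 for the whole VM */

    /* VM cgroup:         cpu.cfs_period_us=500000 cpu.cfs_quota_us=500000 */
    /* each vCPU cgroup:  cpu.cfs_period_us=500000 cpu.cfs_quota_us=250000 */

That reproduces the numbers in the XML example above, and the user only
ever has to think in per-VCPU terms.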

> cpu_shares is treated as a per-VM value, period doesn't matter
> but cpu_quota would be a per-VCPU value, multiplied to get a
> per-VM value when needed. I still find this mismatch rather
> weird, to be fair.

Yes, this is unfortunate.  But cpu_shares is a comparative value whereas
quota is quantitative.  In the future we could apply 'shares' at the
vcpu level too.  In that case we'd just pick some arbitrary number and
apply it to each vcpu cgroup.
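
For example, a minimal sketch of that future idea (the loop and the
cgroup array are hypothetical; virCgroupSetCpuShares is the existing
cpu.shares setter):

    /* Give every vCPU cgroup the same arbitrary cpu.shares value so the
     * vCPUs stay balanced relative to each other inside the VM. */
    for (i = 0; i < nvcpus; i++) {
        if (virCgroupSetCpuShares(cgroup_vcpu[i], 1024) < 0)
            goto cleanup;
    }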

> So current behaviour
> 
>  if vCPU threads
>     set quota in vCPU group
>  else
>     set nVCPUs * quota in VM group
> 
> Would change to
> 
>  set nVCPUs * quota in VM group
>  if vCPU threads
>     set quota in vCPU group
> 
> ?

Yes.
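
In rough C terms (a sketch of the new ordering only; helper and variable
names are approximate, not the actual patch code):

    /* Always cap the whole VM at nvcpus * quota ... */
    if (virCgroupSetCpuCfsQuota(cgroup_vm, quota * nvcpus) < 0)
        goto cleanup;

    /* ... and, when the hypervisor exposes per-vCPU threads (qemu-kvm),
     * additionally cap each vCPU cgroup so the vCPUs stay fair. */
    if (have_vcpu_threads) {
        for (i = 0; i < nvcpus; i++) {
            if (virCgroupSetCpuCfsQuota(cgroup_vcpu[i], quota) < 0)
                goto cleanup;
        }
    }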

> We need to remember to update the VM cgroup if we change the
> number of vCPUs on a running guest of course

When can this happen?  Does libvirt support cpu hotplug?

-- 
Adam Litke
IBM Linux Technology Center