[libvirt] [PATCH RESEND RFC v4 1/6] Introduce the function virCgroupForVcpu

Fri Jul 22 06:56:58 UTC 2011

At 07/21/2011 11:25 PM, Adam Litke wrote:
> 
> 
> On 07/21/2011 09:29 AM, Daniel P. Berrange wrote:
>> On Thu, Jul 21, 2011 at 08:49:28AM -0500, Adam Litke wrote:
>>> On 07/21/2011 08:34 AM, Daniel P. Berrange wrote:
>>>> On Thu, Jul 21, 2011 at 07:54:05AM -0500, Adam Litke wrote:
>>>>> Added Anthony to give him the opportunity to address the finer points of
>>>>> this one especially with respect to the qemu IO thread(s).
>>>>>
>>>>> This feature is really about capping the compute performance of a VM
>>>>> such that we get consistent top end performance.  Yes, qemu has non-VCPU
>>>>> threads that this patch set doesn't govern, but that's the point.  We
>>>>> are not attempting to throttle IO or device emulation with this feature.
>>>>>  It's true that an IO-intensive guest may consume more host resources
>>>>> than a compute intensive guest, but they should still have equal top-end
>>>>> CPU performance when viewed from the guest's perspective.
>>>>
>>>> I could be misunderstanding what you're trying to achieve
>>>> here, so perhaps we should consider an example.
>>>
>>> From your example, it's clear to me that you understand the use case well.
>>>
>>>>  - A machine has 4 physical CPUs
>>>>  - There are 4 guests on the machine
>>>>  - Each guest has 2 virtual CPUs
>>>>
>>>> So we've overcommitted the host CPU resources x2 here.
>>>>
>>>> Let's say that we want to use this feature to ensure consistent
>>>> top end performance of every guest, splitting the host pCPU
>>>> resources evenly across all guests, so each guest is ensured
>>>> 1 pCPU worth of CPU time overall.
>>>>
>>>> This patch lets you do this by assigning caps per VCPU. So
>>>> in this example, each VCPU cgroup would have to be configured
>>>> to cap the VCPUs at 50% of a single pCPU.
>>>>
>>>> This leaves the other QEMU threads uncapped / unaccounted
>>>> for. If any one guest causes non-trivial compute load in
>>>> a non-VCPU thread, this can/will impact the top-end compute
>>>> performance of all the other guests on the machine.
>>>>
>>>> If we did caps per VM, then you could set the VM cgroup
>>>> such that the VM as a whole had 100% of a single pCPU.
>>>>
>>>> If a guest is 100% compute bound, it can use its full
>>>> 100% of a pCPU allocation in vCPU threads. If any other
>>>> guest is causing CPU time in a non-VCPU thread, it cannot
>>>> impact the top end compute performance of VCPU threads in
>>>> the other guests.
>>>>
>>>> A per-VM cap would, however, mean a guest with 2 vCPUs
>>>> could have unequal scheduling, where one vCPU claimed 75%
>>>> of the pCPU and the other vCPU got left with only 25%.
>>>>
>>>> So AFAICT, per-VM cgroups is better for ensuring top
>>>> end compute performance of a guest as a whole, but
>>>> per-VCPU cgroups can ensure consistent top end performance
>>>> across vCPUs within a guest.
>>>>
>>>> IMHO, per-VM cgroups is the more useful because it is the
>>>> only way to stop guests impacting each other, but there
>>>> could be additional benefits of *also* having per-VCPU cgroups
>>>> if you want to ensure fairness of top-end performance across
>>>> vCPUs inside a single VM.
>>>
>>> What this says to me is that per-VM cgroups _in_addition_to_ per-vcpu
>>> cgroups is the _most_ useful situation.  Since I can't think of any
>>> cases where someone would want per-vm and not per-vcpu, how about we
>>> always do both when supported.  We can still use one pair of tunables
>>> (<period> and <quota>) and try to do the right thing.  For example:
>>>
>>> <vcpus>2</vcpus>
>>> <cputune>
>>>   <period>500000</period>
>>>   <quota>250000</quota>
>>> </cputune>
>>>
>>> Would have the following behavior for qemu-kvm (vcpu threads)
>>>
>>> Global VM cgroup: cfs_period:500000 cfs_quota:500000
>>> Each vcpu cgroup: cfs_period:500000 cfs_quota:250000
>>>
>>> and this behavior for qemu with no vcpu threads
>>
>> So, whatever quota value is in the XML, you would multiply that
>> by the number of vCPUS and use it to set the VM quota value ?
> 
> Yep.
> 
>> I'm trying to think if there is ever a case where you don't want
>> the VM to be a plain multiple of the VCPU value, but I can't
>> think of one.
>>
>> So the only real discussion point here is whether the quota
>> value in the XML is treated as a per-VM value or a per-VCPU
>> value.
> 
> I think it has to be per-VCPU.  Otherwise the user will have to remember
> to do the multiplication themselves.  If they forget to do this they
> will get a nasty performance surprise.
> 
>> cpu_shares is treated as a per-VM value, period doesn't matter
>> but cpu_quota would be a per-VCPU value, multiplied to get a
>> per-VM value when needed. I still find this mismatch rather
>> weird to be fair.
> 
> Yes, this is unfortunate.  But cpu_shares is a comparative value whereas
> quota is quantitative.  In the future we could apply 'shares' at the
> vcpu level too.  In that case we'd just pick some arbitrary number and
> apply it to each vcpu cgroup.
> 
>> So current behaviour
>>
>>  if vCPU threads
>>     set quota in vCPU group
>>  else
>>     set nVCPUs * quota in  VM group
>>
>> Would change to
>>
>>  set nVCPUs * quota in  VM group
>>  if vCPU threads
>>     set quota in vCPU group
>>
>> ?
> 
> Yes.

I take this answer to mean that you agree with Daniel P. Berrange's idea.
If so, I will implement it.
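
To check that we mean the same thing, here is a minimal sketch of the
agreed ordering in Python. It is not libvirt code: the function name, the
group names and the returned dictionary are made up for illustration. It
assumes the XML quota is a per-VCPU value and that a negative quota means
"no limit", as cfs_quota_us does in the kernel.

```python
def plan_cfs_settings(period_us, vcpu_quota_us, nvcpus, has_vcpu_threads):
    """Return {group: (cfs_period_us, cfs_quota_us)} for one guest.

    Hypothetical helper, for illustration only.
    """
    if nvcpus < 1:
        raise ValueError("a guest needs at least one vCPU")

    # Assumption: a negative quota means "unlimited", so it is not multiplied.
    if vcpu_quota_us < 0:
        vm_quota_us = -1
    else:
        vm_quota_us = vcpu_quota_us * nvcpus

    # The VM group is always set, whether or not vCPU threads exist.
    plan = {"vm": (period_us, vm_quota_us)}

    # Per-vCPU groups are added only when the hypervisor has vCPU threads.
    if has_vcpu_threads:
        for i in range(nvcpus):
            plan["vcpu%d" % i] = (period_us, vcpu_quota_us)

    return plan
```

With the XML example quoted above (2 vcpus, period 500000, quota 250000)
this gives 500000/500000 for the VM group and 500000/250000 for each vcpu
group. If the number of vCPUs changes on a running guest, the plan would
have to be recomputed so that the VM group quota follows.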

> 
>> We need to remember to update the VM cgroup if we change the
>> number of vCPUs on a running guest of course
> 
> When can this happen?  Does libvirt support cpu hotplug?
>