[PATCH] qemu: Do not error out when setting affinity failed

Daniel P. Berrangé berrange at redhat.com
Fri Sep 4 12:32:24 UTC 2020


On Fri, Sep 04, 2020 at 02:22:14PM +0200, Martin Kletzander wrote:
> At least in a particular scenario described in the code.  Basically when
> libvirtd is running without CAP_SYS_NICE (e.g. in a container) and it is trying
> to set QEMU affinity to all CPUs (because there is no setting requested in the
> XML) it fails.  But if we ignore the failure in this particular case than you
> can limit the CPUs used by controlling the affinity for libvirtd itself.
> 
> In any other case (anything requested in the XML, pinning a live domain, etc.)
> the call is still considered fatal and the action errors out.
> 
> Resolves: https://bugzilla.redhat.com/1819801

I'd prefer if this commit message outlined the reason why this change is
ok, instead of just pointing to the BZ

eg add the following text:


Consider a host with 8 CPUs. There are the following possible scenarios

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue

Scenario 4 works only if CAP_SYS_NICE is availalbe

Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with caveat in case 4 and 6, that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.

> 
> Suggested-by: Daniel P. Berrangé <berrange at redhat.com>
> Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
> ---
>  src/qemu/qemu_process.c | 41 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index cfe09d632633..270bb37d3682 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -2528,6 +2528,7 @@ qemuProcessGetAllCpuAffinity(virBitmapPtr *cpumapRet)
>  static int
>  qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>  {
> +    bool settingAll = false;
>      g_autoptr(virBitmap) cpumapToSet = NULL;
>      virDomainNumatuneMemMode mem_mode;
>      qemuDomainObjPrivatePtr priv = vm->privateData;
> @@ -2566,13 +2567,30 @@ qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>          if (!(cpumapToSet = virBitmapNewCopy(vm->def->cputune.emulatorpin)))
>              return -1;
>      } else {
> +        settingAll = true;
>          if (qemuProcessGetAllCpuAffinity(&cpumapToSet) < 0)
>              return -1;
>      }
>  
>      if (cpumapToSet &&
>          virProcessSetAffinity(vm->pid, cpumapToSet) < 0) {
> -        return -1;
> +        /*
> +         * We only want to error out if we failed to set the affinity to
> +         * user-requested mapping.  If we are just trying to reset the affinity
> +         * to all CPUs and this fails it can only be an issue if:
> +         *  1) libvirtd does not have CAP_SYS_NICE
> +         *  2) libvirtd does not run on all CPUs
> +         *
> +         * However since this scenario is very improbable, we rather skip
> +         * reporting the error because it helps running libvirtd in a a scenario
> +         * where pinning is handled by someone else.

I wouldn't call this scenario "improbably" - it is entirely expected
by some of our users. Replace these three lines with

  "This scenario can easily occurr when libvirtd is run
   inside a container with restrictive permissions and CPU
   pinning"


With the text changes

Reviewed-by: Daniel P. Berrangé <berrange at redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list