[libvirt] [RFC PATCH] qcow2: Fix race in cache invalidation

Kevin Wolf kwolf at redhat.com
Wed Sep 24 09:48:36 UTC 2014


On 23.09.2014 at 10:47, Alexey Kardashevskiy wrote:
> On 09/19/2014 06:47 PM, Kevin Wolf wrote:
> > On 16.09.2014 at 14:59, Paolo Bonzini wrote:
> >> On 16/09/2014 14:52, Kevin Wolf wrote:
> >>> Yes, that's true. We can't fix this problem in qcow2, though, because
> >>> it's a more general one.  I think we must make sure that
> >>> bdrv_invalidate_cache() doesn't yield.
> >>>
> >>> Either by forbidding bdrv_invalidate_cache() to run in a coroutine and
> >>> moving the problem to the caller (where and why is it even called from a
> >>> coroutine?), or possibly by creating a new coroutine for the driver
> >>> callback and running that in a nested event loop that only handles
> >>> bdrv_invalidate_cache() callbacks, so that the NBD server doesn't get a
> >>> chance to process new requests in this thread.
> >>
> >> Incoming migration runs in a coroutine (the coroutine entry point is
> >> process_incoming_migration_co).  But everything after qemu_fclose() can
> >> probably be moved into a separate bottom half, so that it gets out of
> >> coroutine context.
> > 
> > Alexey, you should probably rather try this (and add a bdrv_drain_all()
> > in bdrv_invalidate_cache) than messing around with qcow2 locks. This
> > isn't a problem that can be completely fixed in qcow2.
> 
> 
> Ok. Tried :) Not very successful though. The patch is below.
> 
> Is that the correct bottom half? When I did it, I started getting crashes
> in various spots on accesses to s->l1_cache, which is NULL after qcow2_close.
> Normally the code would check s->l1_size and then use it, but the two are out of sync.

No, that's not the place we were talking about.

What Paolo meant is that in process_incoming_migration_co(), you can
split out the final part that calls bdrv_invalidate_cache_all() into a
BH (everything from that point until the end of the function then needs
to move into the BH). His suggestion was to move everything below the
qemu_fclose().
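
Roughly something like this (a completely untested sketch against the
current migration.c, trimmed to the relevant calls; the names
process_incoming_migration_bh and migration_incoming_bh are made up
here for illustration):

/* Fragment of migration.c, assuming its usual includes. */

static QEMUBH *migration_incoming_bh;

static void process_incoming_migration_bh(void *opaque)
{
    Error *local_err = NULL;

    /* This runs outside coroutine context, so a yield inside the qcow2
     * reopen can no longer hand control back to the NBD server in the
     * middle of the invalidation. */
    bdrv_invalidate_cache_all(&local_err);
    if (local_err) {
        error_report("%s", error_get_pretty(local_err));
        error_free(local_err);
        exit(EXIT_FAILURE);
    }

    if (autostart) {
        vm_start();
    } else {
        runstate_set(RUN_STATE_PAUSED);
    }

    qemu_bh_delete(migration_incoming_bh);
}

static void process_incoming_migration_co(void *opaque)
{
    QEMUFile *f = opaque;
    int ret;

    ret = qemu_loadvm_state(f);
    qemu_fclose(f);
    if (ret < 0) {
        error_report("load of migration failed: %s", strerror(-ret));
        exit(EXIT_FAILURE);
    }

    /* Everything below the qemu_fclose() moves into the BH so that it
     * runs outside coroutine context. */
    migration_incoming_bh = qemu_bh_new(process_incoming_migration_bh, NULL);
    qemu_bh_schedule(migration_incoming_bh);
}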

> So I clear it in qcow2_close(). This allowed the migrated guest to work and
> not to crash until I shut it down, at which point it aborted at "HERE IT
> FAILS ON SHUTDOWN".
> 
> Here I realized I am missing something in this picture again, what is it?

The problem with your patch seems to be that you close the image and
then let the VM access it before it is reopened in the BH. That can't
work out well. This is why it's important that the vm_start() call is
in the BH, too.
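
Independently of that, the bdrv_drain_all() I suggested above could go
into bdrv_invalidate_cache_all(), roughly like this (again untested;
I'm iterating with bdrv_next() and leaving out any AioContext locking
the real function may need):

void bdrv_invalidate_cache_all(Error **errp)
{
    BlockDriverState *bs = NULL;
    Error *local_err = NULL;

    /* Make sure no request is still in flight while the images are
     * invalidated and reopened. */
    bdrv_drain_all();

    while ((bs = bdrv_next(bs)) != NULL) {
        bdrv_invalidate_cache(bs, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }
    }
}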

>  void bdrv_invalidate_cache_all(Error **errp)
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> index 904f6b1..59ff48c 100644
> --- a/block/qcow2-cache.c
> +++ b/block/qcow2-cache.c
> @@ -259,7 +259,7 @@ static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
>      if (min_index == -1) {
>          /* This can't happen in current synchronous code, but leave the check
>           * here as a reminder for whoever starts using AIO with the cache */
> -        abort();
> +        abort(); // <==== HERE IT FAILS ON SHUTDOWN
>      }
>      return min_index;
>  }

It's a weird failure mode anyway...

Kevin



