[libvirt] [RFC PATCH] qcow2: Fix race in cache invalidation

Kevin Wolf kwolf at redhat.com
Thu Sep 25 12:39:44 UTC 2014


On 25.09.2014 at 14:29, Alexey Kardashevskiy wrote:
> On 09/25/2014 08:20 PM, Kevin Wolf wrote:
> > On 25.09.2014 at 11:55, Alexey Kardashevskiy wrote:
> >> Right. Cool. So is the patch below what was suggested? I am double-checking
> >> because it does not solve the original issue: the bottom half is called
> >> first and then nbd_trip() crashes in qcow2_co_flush_to_os().
> >>
> >> diff --git a/block.c b/block.c
> >> index d06dd51..1e6dfd1 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -5037,20 +5037,22 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
> >>      if (local_err) {
> >>          error_propagate(errp, local_err);
> >>          return;
> >>      }
> >>
> >>      ret = refresh_total_sectors(bs, bs->total_sectors);
> >>      if (ret < 0) {
> >>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> >>          return;
> >>      }
> >> +
> >> +    bdrv_drain_all();
> >>  }
> > 
> > Try moving the bdrv_drain_all() call to the top of the function (at
> > least it must be called before bs->drv->bdrv_invalidate_cache).
> 
> 
> Ok, I did. Did not help.
> 
> 
> > 
> >> +static QEMUBH *migration_complete_bh;
> >> +static void process_incoming_migration_complete(void *opaque);
> >> +
> >>  static void process_incoming_migration_co(void *opaque)
> >>  {
> >>      QEMUFile *f = opaque;
> >> -    Error *local_err = NULL;
> >>      int ret;
> >>
> >>      ret = qemu_loadvm_state(f);
> >>      qemu_fclose(f);
> > 
> > Paolo suggested moving everything starting from here, but as far as I
> > can tell, leaving the next few lines here shouldn't hurt.
> 
> 
> Ouch. I was looking at the wrong qcow2_fclose() all this time :)
> Anyway, what you suggested did not help:
> bdrv_co_flush() calls qemu_coroutine_yield() while this BH is being
> executed and the situation is still the same.

Hm, do you have a backtrace? The idea with the BH was that it would be
executed _outside_ coroutine context and therefore wouldn't be able to
yield. If it's still executed in coroutine context, it would be
interesting to see who that caller is.
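
To make the intent concrete, here is a minimal toy model in plain C of what
deferring the completion to a BH is supposed to achieve. It is only a sketch
under assumptions: every toy_* name is hypothetical and none of this is QEMU
code. The toy_in_coroutine flag merely stands in for a check such as
qemu_in_coroutine(), and the pending array stands in for the main loop's BH
list. It does not reproduce the reported crash; it only shows the expected
ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not QEMU APIs. */

typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    bool scheduled;
} ToyBH;

/* True while toy "coroutine" code runs (stand-in for a context check). */
static bool toy_in_coroutine;

#define TOY_MAX_BH 8
static ToyBH *toy_pending[TOY_MAX_BH];
static size_t toy_npending;

/* Queue the BH; the callback is NOT invoked here. */
static void toy_bh_schedule(ToyBH *bh)
{
    if (!bh->scheduled && toy_npending < TOY_MAX_BH) {
        bh->scheduled = true;
        toy_pending[toy_npending++] = bh;
    }
}

/* One main loop iteration: run the pending BHs and return how many ran. */
static int toy_main_loop_iteration(void)
{
    ToyBH *batch[TOY_MAX_BH];
    size_t n = toy_npending;
    int ran = 0;

    /* The main loop itself never runs inside a coroutine in this model. */
    assert(!toy_in_coroutine);

    /* Snapshot the queue so callbacks may reschedule themselves safely. */
    for (size_t i = 0; i < n; i++) {
        batch[i] = toy_pending[i];
    }
    toy_npending = 0;

    for (size_t i = 0; i < n; i++) {
        batch[i]->scheduled = false;
        batch[i]->cb(batch[i]->opaque);
        ran++;
    }
    return ran;
}

/* Completion work: records the context it ran in (1 = coroutine, 0 = not). */
static void toy_migration_complete(void *opaque)
{
    int *ctx = opaque;
    *ctx = toy_in_coroutine ? 1 : 0;
}

/* Stand-in for the incoming-migration coroutine body. */
static void toy_incoming_migration_co(ToyBH *bh)
{
    toy_in_coroutine = true;
    /* ... load the VM state here ... */
    toy_bh_schedule(bh);      /* defer the completion instead of calling it */
    toy_in_coroutine = false; /* the coroutine terminates */
}
```

If the completion callback in the real code still observes coroutine context,
then something other than the main loop must be invoking it, which is why the
backtrace matters.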

Kevin



