[libvirt] [PATCH] Fix race condition reconnecting to vms & loading configs
Daniel P. Berrange
berrange at redhat.com
Mon Oct 28 14:07:16 UTC 2013
On Mon, Oct 28, 2013 at 09:17:16AM -0400, John Ferlan wrote:
> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
> > From: "Daniel P. Berrange" <berrange at redhat.com>
> >
> > The following sequence
> >
> > 1. Define a persistent QMEU guest
>
> s/QMEU/QEMU
>
> > 2. Start the QEMU guest
> > 3. Stop libvirtd
> > 4. Kill the QEMU process
> > 5. Start libvirtd
> > 6. List persistent guets
>
> s/guets/guests
>
> >
> > At the last step, the previously running persistent guest
> > will be missing. This is because of a race condition in the
> > QEMU driver startup code. It does
> >
> > 1. Load all VM state files
> > 2. Spawn thread to reconnect to each VM
> > 3. Load all VM config files
> >
> > Only at the end of step 3, does the 'virDomainObjPtr' get
> > marked as "persistent". There is therefore a window where
> > the thread reconnecting to the VM will remove the persistent
> > VM from the list.
> >
> > The easy fix is to simply switch the order of steps 2 & 3.
> >
> > Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> > ---
> > src/qemu/qemu_driver.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
>
> The fix seems reasonable, although I immediately wondered why "at some
> time" it was considered OK to reconnect before the persistent flag was
> set for inactive guests. The condition/fault initially described
> includes a host reboot in the processing. In that case (I would assume)
> the restart of the guest would occur if autostart was set. The external
> action of the kill()'ing of a guest outside of libvirtd's control
> results in some unknown/unpredictable state for the guest. Is there
> something in that initial load that could detect this condition better?
> I tried following the steps without the patch, but on my host the guest
> was listed after the restart - so yes a timing condition - but what
> causes that timing condition.
>
> Would setting the dom->persistent before the virObjectUnlock(dom) in
> virDomainObjListLoadAllConfigs() change the results?
No, I tried that and it didn't help. Nonetheless, that should be
fixed separately.
> Beyond that keeping the virConnectOpen() and qemuProcessReconnectAll()
> "together" after the loading of the inactive persistent configs seems to
> keep code flow more normal. Whether that comes before or after the
> Snapshot/ManagedSave load is I suppose just an "implementation detail".
>
> Also, other drivers follow the load running, reconnect, and load
> inactive/persistent configs. Should those have similar patches as well?
Historically it was safe because reconnecting to QEMU was
serialized. At some point we switched to using a separate
thread for reconnecting to QEMU which introduced a race.
The other drivers are still serialized IIUC.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|