[libvirt] Race condition between qemuDomainCreate and qemuDomainDestroy

Laine Stump laine at laine.org
Mon Apr 9 16:12:58 UTC 2018


On 04/06/2018 12:27 PM, John Ferlan wrote:
>
> On 04/03/2018 07:47 AM, Marc Hartmayer wrote:
>> On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer <mhartmay at linux.vnet.ibm.com> wrote:
>>> Hi,
>>>
>>> there is a race condition between 'qemuDomainCreate' and
>>> 'qemuDomainDestroy' causing a NULL pointer segmentation fault when
>>> accessing priv->monConfig. The race condition can be easily reproduced
>>> using gdb.
>>>
>>> (gdb) set non-stop on
>>> # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)'
>>> (gdb) b qemu_process.c:1799
>>> # Actually, this second breakpoint is optional but it’s good to see
>>> where priv->monConfig is set to NULL
>>> # set breakpoint on line priv->monConfig = NULL;
>>> (gdb) b qemu_process.c:6589
>>> (gdb) run
>>> # continue all threads - just for the case we hit a breakpoint already
>>> (gdb) c -a
>>>
>>> Now start a domain (that is using QEMU)
>>>
>>> $ virsh start domain
>>>
>>> The first breakpoint will be hit. Now run in a second shell
>>>
>>> $ virsh destroy domain
>>>
>>> The second breakpoint will be hit. Continue the thread where the second
>>> breakpoint was hit (for this example this is thread 4)
>>>
>>> (gdb) thread apply 4 continue
>>>
>>> Now continue the thread where the first breakpoint was hit.
>>>
>>> => Segmentation fault because of a NULL pointer dereference at
>>>    config->value
>>>
>>> Since I'm not very familiar with that part of the code, I wanted to ask
>>> for your advice.
>>>
>>> Thanks in advance.
>>>
>>> Beste Grüße / Kind regards
>>>    Marc Hartmayer
>>>
>>> IBM Deutschland Research & Development GmbH
>>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>>> Geschäftsführung: Dirk Wittkopp
>>> Sitz der Gesellschaft: Böblingen
>>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>> Any ideas?
>>
> Seeing as no one else has an exact or authoritative answer...
>
> qemuDomainCreate{XML|WithFlags} (and a few others) will call
> qemuProcessBeginJob which calls qemuDomainObjBeginAsyncJob and
>         qemuDomainObjSetAsyncJobMask which IIUC allows QEMU_JOB_DESTROY
> to be run.
>
> The qemuDomainDestroyFlags calls qemuProcessBeginStopJob which calls
> qemuDomainObjBeginJob (e.g. sync job) using QEMU_JOB_DESTROY, which
> again IIUC is allowed to happen alongside the Async job because of the
> mask setting.
>
> In the code where you've broken during create, the @vm object lock is
> dropped allowing destroy to obtain it. So with the perfect timing and
> storm the window of opportunity does exist that the monConfig could be
> free'd and thus priv->monConfig set to NULL before or during the create
> processing uses it in qemuMonitorOpen.  If during the processing, then
> obviously the config-> would "appear" to be valid, but it may not
> actually be what was passed.
>
> The fix I believe involves using objects for virDomainChrSourceDef
> rather than VIR_ALLOC'd and VIR_FREE'd memory directly. I've put
> together a few patches and will post them shortly. Using the patches I
> don't see a core, but rather the (I believe) expected "error: internal
> error: qemu unexpectedly closed the monitor"

Isn't this just a symptom of a wider problem? Your patches fix the
problem for monConfig, but what about all the other members of the
domain object that aren't protected with the trappings of virObject?
Isn't it just as likely that they could fall prey to the same problem?




More information about the libvir-list mailing list