[libvirt] [PATCH v2] Close the source fd if the destination qemu exits during tunnelled migration

Mon Sep 28 15:02:39 UTC 2015

Hi Jon,

Thanks a lot for attaching the patch.  Replies inline.

On Thu, Sep 24, 2015 at 1:52 AM, John Ferlan <jferlan at redhat.com> wrote:
>
>
> On 09/22/2015 07:21 AM, Shivaprasad bhat wrote:
>> On Mon, Sep 21, 2015 at 8:04 PM, John Ferlan <jferlan at redhat.com> wrote:
>>>
>>>
>>> On 09/21/2015 05:09 AM, Shivaprasad bhat wrote:
>>>> Thanks John for the comments.
>>>>
>>>>
>>>> On Fri, Sep 18, 2015 at 10:34 PM, John Ferlan <jferlan at redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote:
>>>>>> Tunnelled migration can hang if the destination qemu exits despite all the
>>>>>> ABI checks. This happens whenever the destination qemu exits before the
>>>>>> complete transfer is noticed by source qemu. The savevm state checks at
>>>>>> runtime can fail at destination and cause qemu to error out.
>>>>>> The source qemu cant notice it as the EPIPE is not propogated to it.
>>>>>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
>>>>>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
>>>>>> never get to 100% transfer completion.
>>>>>> The qemuMigrationWaitForCompletion() never breaks out as well since
>>>>>> the ssh connection to destination is healthy, and the source qemu also thinks
>>>>>> the migration is ongoing as the Fd to which it transfers, is never
>>>>>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
>>>>>> virsh migrate wouldn't be honoured. Close the source side FD when there is
>>>>>> an error in the stream. That way, the source qemu updates itself and
>>>>>> qemuMigrationWaitForCompletion() notices the failure.
>>>>>>
>>>>>> Close the FD for all kinds of errors to be sure. The error message is not
>>>>>> copied for EPIPE so that the destination error is copied instead later.
>>>>>>
>>>>>> Note:
>>>>>> Reproducible with repeated migrations between Power hosts running in different
>>>>>> subcores-per-core modes.
>>>>>>
>>>>>> Changes from v1 -> v2:
>>>>>>    VIR_FORCE_CLOSE() was called twice for this use case which would log
>>>>>>    unneccessary warnings. So, move the fd close to qemuMigrationIOFunc
>>>>>>    so that there are no unnecessary duplicate attempts.(Would this trigger
>>>>>>    a Coverity error? I don't have a setup to check.)
>>>>>>
>>>>>> Signed-off-by: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  src/qemu/qemu_migration.c |    8 ++++++--
>>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>>>>> index ff89ab5..9602fb2 100644
>>>>>> --- a/src/qemu/qemu_migration.c
>>>>>> +++ b/src/qemu/qemu_migration.c
>>>>>> @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg)
>>>>>>      if (virStreamFinish(data->st) < 0)
>>>>>>          goto error;
>>>>>>
>>>>>> +    VIR_FORCE_CLOSE(data->sock);
>>>>>>      VIR_FREE(buffer);
>>>>>>
>>>>>>      return;
>>>>>> @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg)
>>>>>>      }
>>>>>>
>>>>>>   error:
>>>>>> -    virCopyLastError(&data->err);
>>>>>> +    /* Let the source qemu know that the transfer cant continue anymore.
>>>>>> +     * Don't copy the error for EPIPE as destination has the actual error. */
>>>>>> +    VIR_FORCE_CLOSE(data->sock);
>>>>>> +    if (!virLastErrorIsSystemErrno(EPIPE))
>>>>>> +        virCopyLastError(&data->err);
>>>>>>      virResetLastError();
>>>>>>      VIR_FREE(buffer);
>>>>>>  }
>>>>>> @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver,
>>>>>>          if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0)
>>>>>>              ret = -1;
>>>>>>      }
>>>>>> -    VIR_FORCE_CLOSE(fd);
>>>>>
>>>>> ^^^
>>>>>
>>>>> This causes Coverity to claim a RESOURCE_LEAK
>>>>>
>>>>> Feels like this was a mistake edit...
>>>>>
>>>>
>>>> The VIR_FORCE_CLOSE() inside qemuMigrationIOFunc() alone is sufficient.
>>>> Having this again here would lead to Warning in the logs. I too thought coverity
>>>> would complain. Is there a way to force ignore a coverity warning?
>>>>
>>>
>>> Typically a marker of sorts, such as
>>>
>>>     /* coverity[leaked_handle] <some extra comment explaining why> */
>>>
>>> Although I'm not sure that's the best way to handle this either.
>>>
>>> The problem I see though is this is an "error path" issue and when
>>> perhaps it's safe/required to close fd/io->sock/data->sock.
>>>
>>> Your commit comment notes that the issue is seen on a fairly specific
>>> event (virStreamSend failure) for a specific type of migration.
>>
>> I believe the failure can be seen for all types of migration with savestate
>> mismatch.
>>
>
> My thoughts were based mostly on your commit message comments:
>
> "The qemuMigrationIOFunc() notices the stream being broken from
> virStreamSend() and it cleans up the stream alone. The
> qemuMigrationWaitForCompletion() would never get to 100% transfer
> completion."
>
> and the belief that the core issue you described is there's no way for
> qemuMigrationCompleted to know the qemuMigrationIOFunc thread failed and
> thus that's what caused the hang. Since we can pass the iothread "io"
> object to the Completion function, I figured we could also use that to
> check the status of the thread in the list of things already checked. I
> guess it's not clear to me why your test fails below. Maybe it'll help
> to attach/show you my thoughts in a patch format. I just took your
> changes and "adjusted" them, merging the differences. So you could 'git
> am' to top of trunk in order to compare my thoughts.

I tried your patch. Gave migrations in a while loop. Hit the hang in 7th run
itlsef. I feel the small time window is not very negligible.

>
> I'm not opposed to the chosen method, although perhaps Dan or Jiri who
> understand more of the inner workings of migration than I do can provide
> some comments. Changing where the fd is closed could have some corner
> race condition I'm not aware of...
>
> One issue that needs to be resolved with your patch is what happens to
> the 'fd' when qemuMigrationIOFunc isn't involved?  IOW: if "fwdType ==
> MIGRATION_FWD_DIRECT", then qemuMigrationStartTunnel isn't called and
> thus qemuMigrationIOFunc won't be closing the 'fd' in the form of
> data->sock. That's perhaps a path Coverity was considering (and I hadn't
> considered initially).
>
> I think to resolve this the following should be added to your patch (I
> tried it, Coverity is happy too)...
>
> if (spec->fwdType != MIGRATION_FWD_DIRECT) {
>     if (!(iothread = qemuMigrationStartTunnel(spec->fwd.stream, fd)))
>         goto cancel;
>     /* If we've created a tunnel, then the 'fd' will be closed in the
>      * qemuMigrationIOFunc as data->sock
>      */
>     fd = -1;
> }
>
> and restore the VIR_FORCE_CLOSE(fd);
>

I agree. Let me know if we should go with this approach. I'll spin
the next version with the above corrections. I am copying Jiri on this
patch for his opinion as well.

>
>>> As I
>>> read the code, that failure jumps to error (as does virStreamFinish). So
>>> now you have a fairly specific set of instances which perhaps would
>>> cause qemuMigrationWaitForCompletion to need to fail. The fix you have
>>> proposed is to close the data->sock (io->sock, fd). However, your
>>> proposal is a larger hammer. I assume closure of data->sock causes
>>> WaitForCompletion to fail (perhaps differently) and that's why you chose it.
>>>
>>> Going back to the beginning of qemuMigrationRun, setting the 'fd' and
>>> using StartTunnel seems to rely on MIGRATION_FWD_DIRECT, but if a tunnel
>>> is created an (ugh) 'iothread' variable is NON-NULL. What if 'iothread'
>>> were passed to qemuMigrationWaitForCompletion? Then again passed to
>>> qemuMigrationCompleted which would check if iothread non-NULL and for
>>> some new flag that could be created in _qemuMigrationIOThread, being
>>> true. If it were true, then that would cause failure so that
>>> WaitForCompletion would return error status. The only way the flag is
>>> set to true is in qemuMigrationIOFunc when the code jumps to error.
>>> (e.g. add bool stream_abort and data->stream_abort = true in "error:",
>>> then check iothread && iothread->stream_abort, force error).
>>>
>>
>> I tried this. Until the fd is closed, the 'info migrate' wouldn't
>> return from the
>> qemu. So, the migration hangs. Checking the stream status before/after
>> fetching the job status would still leave a race. So, having it in a different
>> thread (qemuMigrationIOFunc) here seems inevitable to me.
>
> Although perhaps susceptible to a smaller "timing" window, if we move
> the iothread check to before the CheckJobStatus call in Completed, then
> that'll work better?
>
> It seems the "query-migrate" command has no timeout... I assume qemu
> would indicate that if you call it when the migration pipe had an error,
> then you unpredictable results (only a guess - I didn't look at that
> code as well). Perhaps that means libvirt needs some code to handle the
> condition where the command doesn't finish in a timely manner.
>

No, If the pipe was broken, the query-migrate returns "failed". This is
anyway handled in a generic way in libvirt.

 90.408 > 0x3fff80007410 {"execute":"query-migrate","id":"libvirt-725"}
 90.408 < 0x3fff80007410 {"return": {"status": "failed"}, "id": "libvirt-725"}

Thanks and Regards,
Shiva

> Perhaps worth a shot...  I did change my attached patch to reflect this
> thought..
>
> John
>>
>> #1  0x00003fff83ff593c in virCondWait (c=<optimized out>, m=<optimized out>)
>>     at util/virthread.c:154
>> #2  0x00003fff76235544 in qemuMonitorSend (mon=0x3fff54003970,
>>     msg=<optimized out>) at qemu/qemu_monitor.c:1035
>> #3  0x00003fff7624fb30 in qemuMonitorJSONCommandWithFd (mon=0x3fff54003970,
>>     cmd=0x3fff5c001420, scm_fd=-1, reply=0x3fff79fbd388)
>>     at qemu/qemu_monitor_json.c:293
>> #4  0x00003fff76254b90 in qemuMonitorJSONCommand (reply=0x3fff79fbd388,
>>     cmd=0x3fff5c001420, mon=0x3fff54003970) at qemu/qemu_monitor_json.c:323
>> #5  qemuMonitorJSONGetMigrationStatus (mon=0x3fff54003970,
>>     status=0x3fff79fbd538) at qemu/qemu_monitor_json.c:2620
>> #6  0x00003fff7623a664 in qemuMonitorGetMigrationStatus (mon=<optimized out>,
>>     status=<optimized out>) at qemu/qemu_monitor.c:2134
>> #7  0x00003fff76228e6c in qemuMigrationFetchJobStatus (driver=0x3fff6c118d80,
>>     vm=0x3fff6c27faf0, asyncJob=<optimized out>, jobInfo=0x3fff79fbd4f0)
>>     at qemu/qemu_migration.c:2528
>> #8  0x00003fff76228fb4 in qemuMigrationUpdateJobStatus (driver=0x3fff6c118d80,
>>     vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
>>     at qemu/qemu_migration.c:2565
>> #9  0x00003fff76229200 in qemuMigrationCheckJobStatus (driver=0x3fff6c118d80,
>>     vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
>> ---Type <return> to continue, or q <return> to quit---cont
>>     at qemu/qemu_migration.c:2585
>> #10 0x00003fff76229408 in qemuMigrationCompleted (iothread=<optimized out>,
>>
>>> The only question then in my mind then is would this also be something
>>> that should be done for the virStreamFinish failure path which also
>>> jumps to error?
>>>
>>
>> Yes, I too am not sure if its appropriate for virStreamFinish failure
>> case. But, given
>> this an error path, I feel we can safely go ahead and close the
>> data->sock.
>>
>>> Doing this means you shouldn't need the VIR_FILE_CLOSE(data->sock) for
>>> either the normal or error path.  Does this seem to be a reasonable
>>> approach and solve the issue you're facing?
>>>
>>> John
>>>> Thanks and Regards,
>>>> Shivaprasad
>>>>
>>>>> The rest of the patch looks reasonable; however, I'm far from the expert
>>>>> in these matters.
>>>>>
>>>>> John
>>>>>>
>>>>>>      if (priv->job.completed) {
>>>>>>          qemuDomainJobInfoUpdateTime(priv->job.completed);
>>>>>>
>>>>>> --
>>>>>> libvir-list mailing list
>>>>>> libvir-list at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/libvir-list
>>>>>>