[libvirt] Fwd: [PATCH v2] Close the source fd if the destination qemu exits during tunnelled migration

Wed Sep 30 05:58:27 UTC 2015

On Tue, Sep 29, 2015 at 2:33 AM, John Ferlan <jferlan at redhat.com> wrote:
>
>
> On 09/28/2015 11:02 AM, Shivaprasad bhat wrote:
>> Hi Jon,
>>
>> Thanks a lot for attaching the patch.  Replies inline.
>>
>> On Thu, Sep 24, 2015 at 1:52 AM, John Ferlan <jferlan at redhat.com> wrote:
>>>
>>>
>>> On 09/22/2015 07:21 AM, Shivaprasad bhat wrote:
>>>> On Mon, Sep 21, 2015 at 8:04 PM, John Ferlan <jferlan at redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 09/21/2015 05:09 AM, Shivaprasad bhat wrote:
>>>>>> Thanks John for the comments.
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 18, 2015 at 10:34 PM, John Ferlan <jferlan at redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote:
>>>>>>>> Tunnelled migration can hang if the destination qemu exits despite all the
>>>>>>>> ABI checks. This happens whenever the destination qemu exits before the
>>>>>>>> complete transfer is noticed by source qemu. The savevm state checks at
>>>>>>>> runtime can fail at destination and cause qemu to error out.
>>>>>>>> The source qemu cant notice it as the EPIPE is not propogated to it.
>>>>>>>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
>>>>>>>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
>>>>>>>> never get to 100% transfer completion.
>>>>>>>> The qemuMigrationWaitForCompletion() never breaks out as well since
>>>>>>>> the ssh connection to destination is healthy, and the source qemu also thinks
>>>>>>>> the migration is ongoing as the Fd to which it transfers, is never
>>>>>>>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
>>>>>>>> virsh migrate wouldn't be honoured. Close the source side FD when there is
>>>>>>>> an error in the stream. That way, the source qemu updates itself and
>>>>>>>> qemuMigrationWaitForCompletion() notices the failure.
>>>>>>>>
>>>>>>>> Close the FD for all kinds of errors to be sure. The error message is not
>>>>>>>> copied for EPIPE so that the destination error is copied instead later.
>>>>>>>>
>>>>>>>> Note:
>>>>>>>> Reproducible with repeated migrations between Power hosts running in different
>>>>>>>> subcores-per-core modes.
>>>>>>>>
>>>>>>>> Changes from v1 -> v2:
>>>>>>>>    VIR_FORCE_CLOSE() was called twice for this use case which would log
>>>>>>>>    unneccessary warnings. So, move the fd close to qemuMigrationIOFunc
>>>>>>>>    so that there are no unnecessary duplicate attempts.(Would this trigger
>>>>>>>>    a Coverity error? I don't have a setup to check.)
>>>>>>>>
>>>>>>>> Signed-off-by: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
>>>>>>>> ---
>>>>>>>>  src/qemu/qemu_migration.c |    8 ++++++--
>>>>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>>>>>>> index ff89ab5..9602fb2 100644
>>>>>>>> --- a/src/qemu/qemu_migration.c
>>>>>>>> +++ b/src/qemu/qemu_migration.c
>>>>>>>> @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg)
>>>>>>>>      if (virStreamFinish(data->st) < 0)
>>>>>>>>          goto error;
>>>>>>>>
>>>>>>>> +    VIR_FORCE_CLOSE(data->sock);
>>>>>>>>      VIR_FREE(buffer);
>>>>>>>>
>>>>>>>>      return;
>>>>>>>> @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg)
>>>>>>>>      }
>>>>>>>>
>>>>>>>>   error:
>>>>>>>> -    virCopyLastError(&data->err);
>>>>>>>> +    /* Let the source qemu know that the transfer cant continue anymore.
>>>>>>>> +     * Don't copy the error for EPIPE as destination has the actual error. */
>>>>>>>> +    VIR_FORCE_CLOSE(data->sock);
>>>>>>>> +    if (!virLastErrorIsSystemErrno(EPIPE))
>>>>>>>> +        virCopyLastError(&data->err);
>>>>>>>>      virResetLastError();
>>>>>>>>      VIR_FREE(buffer);
>>>>>>>>  }
>>>>>>>> @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver,
>>>>>>>>          if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0)
>>>>>>>>              ret = -1;
>>>>>>>>      }
>>>>>>>> -    VIR_FORCE_CLOSE(fd);
>>>>>>>
>>>>>>> ^^^
>>>>>>>
>>>>>>> This causes Coverity to claim a RESOURCE_LEAK
>>>>>>>
>>>>>>> Feels like this was a mistake edit...
>>>>>>>
>>>>>>
>>>>>> The VIR_FORCE_CLOSE() inside qemuMigrationIOFunc() alone is sufficient.
>>>>>> Having this again here would lead to Warning in the logs. I too thought coverity
>>>>>> would complain. Is there a way to force ignore a coverity warning?
>>>>>>
>>>>>
>>>>> Typically a marker of sorts, such as
>>>>>
>>>>>     /* coverity[leaked_handle] <some extra comment explaining why> */
>>>>>
>>>>> Although I'm not sure that's the best way to handle this either.
>>>>>
>>>>> The problem I see though is this is an "error path" issue and when
>>>>> perhaps it's safe/required to close fd/io->sock/data->sock.
>>>>>
>>>>> Your commit comment notes that the issue is seen on a fairly specific
>>>>> event (virStreamSend failure) for a specific type of migration.
>>>>
>>>> I believe the failure can be seen for all types of migration with savestate
>>>> mismatch.
>>>>
>>>
>>> My thoughts were based mostly on your commit message comments:
>>>
>>> "The qemuMigrationIOFunc() notices the stream being broken from
>>> virStreamSend() and it cleans up the stream alone. The
>>> qemuMigrationWaitForCompletion() would never get to 100% transfer
>>> completion."
>>>
>>> and the belief that the core issue you described is there's no way for
>>> qemuMigrationCompleted to know the qemuMigrationIOFunc thread failed and
>>> thus that's what caused the hang. Since we can pass the iothread "io"
>>> object to the Completion function, I figured we could also use that to
>>> check the status of the thread in the list of things already checked. I
>>> guess it's not clear to me why your test fails below. Maybe it'll help
>>> to attach/show you my thoughts in a patch format. I just took your
>>> changes and "adjusted" them, merging the differences. So you could 'git
>>> am' to top of trunk in order to compare my thoughts.
>>
>> I tried your patch. Gave migrations in a while loop. Hit the hang in 7th run
>> itlsef. I feel the small time window is not very negligible.
>>
>
> Perhaps I should clarify my query-migrate has no timeout comment... It
> seems based on what I've read so far, the 'query-migrate' command
> started successfully, because if it hadn't we would have received a
> failure (as shown below). Thus libvirt has sent the command via the
> monitor and is waiting for a response (e.g. the virCondWait after the
> qemuMonitorSend in the trace below). The response isn't coming because
> either "A" qemu didn't send it back or "B" libvirt missed it - that
> should be determinable.
>
> There's a way to turn on debugging so the monitor dialog can be seen -
> via changes to /etc/libvirt/libvirtd.conf. I use :
>
> log_level = 1
> log_filters="3:remote 4:event 3:json 3:rpc"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> But you may need to remove the "3:json" in order to see the dialog since
> that where it "feels like" the issue might be. Then start libvirtd in
> the debugger again. Once it's hung - you should be able to scan (eg,
> edit) the libvirtd.log file and search for the "query-migrate" command
> being sent and then follow the copious output looking for the presence
> of a returned command.  If there is none, then something in qemu isn't
> returning the failure correctly and it would need to be fixed there I
> would think as opposed to throwing down the big hammer of closing the fd.

Had a chance to run with your log settings. The query-migrate doesn't seem to
have a corresponding "return" in the logs. So as you say, there may be
a qemu bug
that is not returning a response when the fd is still open(as libvirt
didnt close it) but
no read actually happening there. I felt qemu can't sense the failure as the fd
is open, so posted this patch. Though, the qemu should return with the
current state
of migration as it sees instead of not returning at all. Hope we are
on the same page.

Mailer refused my reply as the attached logs with the mail crossed 300KB.
Not attaching the logs for now. Please let me know if you want to me check
anything specific.

Thanks,
Shivaprasad

>
> I think you've hit some edge/error "timing" condition - one of the more
> difficult to debug. Finding the root cause of the problem could be
> beneficial especially if it could be considered some more "common" type
> problem for other similar migrate commands.
>
> John
>>>
>>> I'm not opposed to the chosen method, although perhaps Dan or Jiri who
>>> understand more of the inner workings of migration than I do can provide
>>> some comments. Changing where the fd is closed could have some corner
>>> race condition I'm not aware of...
>>>
>>> One issue that needs to be resolved with your patch is what happens to
>>> the 'fd' when qemuMigrationIOFunc isn't involved?  IOW: if "fwdType ==
>>> MIGRATION_FWD_DIRECT", then qemuMigrationStartTunnel isn't called and
>>> thus qemuMigrationIOFunc won't be closing the 'fd' in the form of
>>> data->sock. That's perhaps a path Coverity was considering (and I hadn't
>>> considered initially).
>>>
>>> I think to resolve this the following should be added to your patch (I
>>> tried it, Coverity is happy too)...
>>>
>>> if (spec->fwdType != MIGRATION_FWD_DIRECT) {
>>>     if (!(iothread = qemuMigrationStartTunnel(spec->fwd.stream, fd)))
>>>         goto cancel;
>>>     /* If we've created a tunnel, then the 'fd' will be closed in the
>>>      * qemuMigrationIOFunc as data->sock
>>>      */
>>>     fd = -1;
>>> }
>>>
>>> and restore the VIR_FORCE_CLOSE(fd);
>>>
>>
>> I agree. Let me know if we should go with this approach. I'll spin
>> the next version with the above corrections. I am copying Jiri on this
>> patch for his opinion as well.
>>
>>>
>>>>> As I
>>>>> read the code, that failure jumps to error (as does virStreamFinish). So
>>>>> now you have a fairly specific set of instances which perhaps would
>>>>> cause qemuMigrationWaitForCompletion to need to fail. The fix you have
>>>>> proposed is to close the data->sock (io->sock, fd). However, your
>>>>> proposal is a larger hammer. I assume closure of data->sock causes
>>>>> WaitForCompletion to fail (perhaps differently) and that's why you chose it.
>>>>>
>>>>> Going back to the beginning of qemuMigrationRun, setting the 'fd' and
>>>>> using StartTunnel seems to rely on MIGRATION_FWD_DIRECT, but if a tunnel
>>>>> is created an (ugh) 'iothread' variable is NON-NULL. What if 'iothread'
>>>>> were passed to qemuMigrationWaitForCompletion? Then again passed to
>>>>> qemuMigrationCompleted which would check if iothread non-NULL and for
>>>>> some new flag that could be created in _qemuMigrationIOThread, being
>>>>> true. If it were true, then that would cause failure so that
>>>>> WaitForCompletion would return error status. The only way the flag is
>>>>> set to true is in qemuMigrationIOFunc when the code jumps to error.
>>>>> (e.g. add bool stream_abort and data->stream_abort = true in "error:",
>>>>> then check iothread && iothread->stream_abort, force error).
>>>>>
>>>>
>>>> I tried this. Until the fd is closed, the 'info migrate' wouldn't
>>>> return from the
>>>> qemu. So, the migration hangs. Checking the stream status before/after
>>>> fetching the job status would still leave a race. So, having it in a different
>>>> thread (qemuMigrationIOFunc) here seems inevitable to me.
>>>
>>> Although perhaps susceptible to a smaller "timing" window, if we move
>>> the iothread check to before the CheckJobStatus call in Completed, then
>>> that'll work better?
>>>
>>> It seems the "query-migrate" command has no timeout... I assume qemu
>>> would indicate that if you call it when the migration pipe had an error,
>>> then you unpredictable results (only a guess - I didn't look at that
>>> code as well). Perhaps that means libvirt needs some code to handle the
>>> condition where the command doesn't finish in a timely manner.
>>>
>>
>> No, If the pipe was broken, the query-migrate returns "failed". This is
>> anyway handled in a generic way in libvirt.
>>
>>  90.408 > 0x3fff80007410 {"execute":"query-migrate","id":"libvirt-725"}
>>  90.408 < 0x3fff80007410 {"return": {"status": "failed"}, "id": "libvirt-725"}
>>
>> Thanks and Regards,
>> Shiva
>>
>>> Perhaps worth a shot...  I did change my attached patch to reflect this
>>> thought..
>>>
>>> John
>>>>
>>>> #1  0x00003fff83ff593c in virCondWait (c=<optimized out>, m=<optimized out>)
>>>>     at util/virthread.c:154
>>>> #2  0x00003fff76235544 in qemuMonitorSend (mon=0x3fff54003970,
>>>>     msg=<optimized out>) at qemu/qemu_monitor.c:1035
>>>> #3  0x00003fff7624fb30 in qemuMonitorJSONCommandWithFd (mon=0x3fff54003970,
>>>>     cmd=0x3fff5c001420, scm_fd=-1, reply=0x3fff79fbd388)
>>>>     at qemu/qemu_monitor_json.c:293
>>>> #4  0x00003fff76254b90 in qemuMonitorJSONCommand (reply=0x3fff79fbd388,
>>>>     cmd=0x3fff5c001420, mon=0x3fff54003970) at qemu/qemu_monitor_json.c:323
>>>> #5  qemuMonitorJSONGetMigrationStatus (mon=0x3fff54003970,
>>>>     status=0x3fff79fbd538) at qemu/qemu_monitor_json.c:2620
>>>> #6  0x00003fff7623a664 in qemuMonitorGetMigrationStatus (mon=<optimized out>,
>>>>     status=<optimized out>) at qemu/qemu_monitor.c:2134
>>>> #7  0x00003fff76228e6c in qemuMigrationFetchJobStatus (driver=0x3fff6c118d80,
>>>>     vm=0x3fff6c27faf0, asyncJob=<optimized out>, jobInfo=0x3fff79fbd4f0)
>>>>     at qemu/qemu_migration.c:2528
>>>> #8  0x00003fff76228fb4 in qemuMigrationUpdateJobStatus (driver=0x3fff6c118d80,
>>>>     vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
>>>>     at qemu/qemu_migration.c:2565
>>>> #9  0x00003fff76229200 in qemuMigrationCheckJobStatus (driver=0x3fff6c118d80,
>>>>     vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
>>>> ---Type <return> to continue, or q <return> to quit---cont
>>>>     at qemu/qemu_migration.c:2585
>>>> #10 0x00003fff76229408 in qemuMigrationCompleted (iothread=<optimized out>,
>>>>
>>>>> The only question then in my mind then is would this also be something
>>>>> that should be done for the virStreamFinish failure path which also
>>>>> jumps to error?
>>>>>
>>>>
>>>> Yes, I too am not sure if its appropriate for virStreamFinish failure
>>>> case. But, given
>>>> this an error path, I feel we can safely go ahead and close the
>>>> data->sock.
>>>>
>>>>> Doing this means you shouldn't need the VIR_FILE_CLOSE(data->sock) for
>>>>> either the normal or error path.  Does this seem to be a reasonable
>>>>> approach and solve the issue you're facing?
>>>>>
>>>>> John
>>>>>> Thanks and Regards,
>>>>>> Shivaprasad
>>>>>>
>>>>>>> The rest of the patch looks reasonable; however, I'm far from the expert
>>>>>>> in these matters.
>>>>>>>
>>>>>>> John
>>>>>>>>
>>>>>>>>      if (priv->job.completed) {
>>>>>>>>          qemuDomainJobInfoUpdateTime(priv->job.completed);
>>>>>>>>
>>>>>>>> --
>>>>>>>> libvir-list mailing list
>>>>>>>> libvir-list at redhat.com
>>>>>>>> https://www.redhat.com/mailman/listinfo/libvir-list
>>>>>>>>