[libvirt] [PATCH v3] Close the source fd if the destination qemu exits during tunnelled migration

Shivaprasad bhat shivaprasadbhat at gmail.com
Thu Oct 15 08:32:24 UTC 2015


On Tue, Oct 13, 2015 at 4:51 PM, Shivaprasad bhat
<shivaprasadbhat at gmail.com> wrote:
> On Mon, Oct 12, 2015 at 8:03 PM, Jiri Denemark <jdenemar at redhat.com> wrote:
>> On Thu, Oct 08, 2015 at 17:59:07 +0530, Shivaprasad G Bhat wrote:
>>> Tunnelled migration can hang if the destination qemu exits despite all the
>>> ABI checks. This happens whenever the destination qemu exits before the
>>> complete transfer is noticed by source qemu. The savevm state checks at
>>> runtime can fail at destination and cause qemu to error out.
>>> The source qemu cant notice it as the EPIPE is not propogated to it.
>>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
>>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
>>> never get to 100% transfer completion.
>>> The qemuMigrationWaitForCompletion() never breaks out as well since
>>> the ssh connection to destination is healthy, and the source qemu also thinks
>>> the migration is ongoing as the Fd to which it transfers, is never
>>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
>>> virsh migrate wouldn't be honoured. Close the source side FD when there is
>>> an error in the stream. That way, the source qemu updates itself and
>>> qemuMigrationWaitForCompletion() notices the failure.
>>>
>>> Close the FD for all kinds of errors to be sure. The error message is not
>>> copied for EPIPE so that the destination error is copied instead later.
>>>
>>> Note:
>>> Reproducible with repeated migrations between Power hosts running in different
>>> subcores-per-core modes.
>>>
>>> Changes from v2 -> v3:
>>>    VIR_FORCE_CLOSE() needn't be called on the qemu source fd as that is closed
>>>    by qemuMigrationIOFunc() for tunnelled migration. Set the fd to -1.
>>>
>>> Signed-off-by: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
>>> ---
>>>  src/qemu/qemu_migration.c |   18 ++++++++++++++----
>>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>> index 7440108..c982838 100644
>>> --- a/src/qemu/qemu_migration.c
>>> +++ b/src/qemu/qemu_migration.c
>>> @@ -4019,6 +4019,7 @@ static void qemuMigrationIOFunc(void *arg)
>>>      if (virStreamFinish(data->st) < 0)
>>>          goto error;
>>>
>>> +    VIR_FORCE_CLOSE(data->sock);
>>>      VIR_FREE(buffer);
>>>
>>>      return;
>>> @@ -4036,7 +4037,11 @@ static void qemuMigrationIOFunc(void *arg)
>>>      }
>>>
>>>   error:
>>> -    virCopyLastError(&data->err);
>>> +    /* Let the source qemu know that the transfer cant continue anymore.
>>> +     * Don't copy the error for EPIPE as destination has the actual error. */
>>> +    VIR_FORCE_CLOSE(data->sock);
>>> +    if (!virLastErrorIsSystemErrno(EPIPE))
>>> +        virCopyLastError(&data->err);
>>>      virResetLastError();
>>>      VIR_FREE(buffer);
>>>  }
>>
>> Hmm, I don't think we need to filter out any error here. Were you
>> actually able to get the EPIPE error except for seeing it in the logs?
>> What I'm trying to say is: if we get an EPIPE in qemuMigrationIOFunc, we
>> close data->sock which will propagate to the source QEMU and
>> qemuMigrationWaitForCompletion should return a "migration unexpectedly
>> failed" error, which will get stored in orig_err and even though
>> qemuMigrationStopTunnel will set the thread local error to the one from
>> iothread->data, we will reset it back to the orig_err one. Only in case
>> qemuMigrationWaitForCompletion returned success but we still got error
>> on the tunnel, we'd actually see the error from iothread->err.
>>
>
> Yes. The qemuMigrationWaitForCompletion() returns success sometimes and
> StopTunnel propogating the failure in such a case. This happens inconsistently,
> but I do see it happening. Please let me know what I should be doing
> if this is not the
> way.
>

Hi Jirka,

As you mentioned over IRC, I checked what would be the result of
migration if the
destination has no error message. I modified the qemu to not to emit
any error messages.

The error in such case is "internal error: End of file from monitor"
which of course is coming
from libvirtd is as good as "cannot write to stream: Broken pipe".
Both anyway convey very less
as to what to do.

So, I think ignoring EPIPE is the way to go. Please let me know your opinion.

Thanks,
Shiva

> Thanks,
> Shivaprasad
>
>> Otherwise the patch looks fine.
>>
>> Jirka




More information about the libvir-list mailing list