[libvirt] [PATCH v3] Close the source fd if the destination qemu exits during tunnelled migration
shivaprasadbhat at gmail.com
Tue Oct 13 11:21:33 UTC 2015
On Mon, Oct 12, 2015 at 8:03 PM, Jiri Denemark <jdenemar at redhat.com> wrote:
> On Thu, Oct 08, 2015 at 17:59:07 +0530, Shivaprasad G Bhat wrote:
>> Tunnelled migration can hang if the destination qemu exits despite all the
>> ABI checks. This happens whenever the destination qemu exits before the
>> complete transfer is noticed by source qemu. The savevm state checks at
>> runtime can fail at destination and cause qemu to error out.
>> The source qemu cant notice it as the EPIPE is not propogated to it.
>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
>> never get to 100% transfer completion.
>> The qemuMigrationWaitForCompletion() never breaks out as well since
>> the ssh connection to destination is healthy, and the source qemu also thinks
>> the migration is ongoing as the Fd to which it transfers, is never
>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
>> virsh migrate wouldn't be honoured. Close the source side FD when there is
>> an error in the stream. That way, the source qemu updates itself and
>> qemuMigrationWaitForCompletion() notices the failure.
>> Close the FD for all kinds of errors to be sure. The error message is not
>> copied for EPIPE so that the destination error is copied instead later.
>> Reproducible with repeated migrations between Power hosts running in different
>> subcores-per-core modes.
>> Changes from v2 -> v3:
>> VIR_FORCE_CLOSE() needn't be called on the qemu source fd as that is closed
>> by qemuMigrationIOFunc() for tunnelled migration. Set the fd to -1.
>> Signed-off-by: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
>> src/qemu/qemu_migration.c | 18 ++++++++++++++----
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index 7440108..c982838 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -4019,6 +4019,7 @@ static void qemuMigrationIOFunc(void *arg)
>> if (virStreamFinish(data->st) < 0)
>> goto error;
>> + VIR_FORCE_CLOSE(data->sock);
>> @@ -4036,7 +4037,11 @@ static void qemuMigrationIOFunc(void *arg)
>> - virCopyLastError(&data->err);
>> + /* Let the source qemu know that the transfer cant continue anymore.
>> + * Don't copy the error for EPIPE as destination has the actual error. */
>> + VIR_FORCE_CLOSE(data->sock);
>> + if (!virLastErrorIsSystemErrno(EPIPE))
>> + virCopyLastError(&data->err);
> Hmm, I don't think we need to filter out any error here. Were you
> actually able to get the EPIPE error except for seeing it in the logs?
> What I'm trying to say is: if we get an EPIPE in qemuMigrationIOFunc, we
> close data->sock which will propagate to the source QEMU and
> qemuMigrationWaitForCompletion should return a "migration unexpectedly
> failed" error, which will get stored in orig_err and even though
> qemuMigrationStopTunnel will set the thread local error to the one from
> iothread->data, we will reset it back to the orig_err one. Only in case
> qemuMigrationWaitForCompletion returned success but we still got error
> on the tunnel, we'd actually see the error from iothread->err.
Yes. The qemuMigrationWaitForCompletion() returns success sometimes and
StopTunnel propogating the failure in such a case. This happens inconsistently,
but I do see it happening. Please let me know what I should be doing
if this is not the
> Otherwise the patch looks fine.
More information about the libvir-list