[libvirt] [PATCH] qemu_agent: Issue guest-sync prior to every command

Michal Privoznik mprivozn at redhat.com
Tue Feb 7 15:49:55 UTC 2012


On 07.02.2012 16:18, Daniel P. Berrange wrote:
> On Wed, Feb 01, 2012 at 06:04:45PM +0100, Michal Privoznik wrote:
>> If we issue guest command and GA is not running, the issuing thread
>> will block endlessly. We can check for GA presence by issuing
>> guest-sync with unique ID (timestamp). We don't want to issue real
>> command as even if GA is not running, once it is started, it process
>> all commands written to GA socket.
>>
>> If we receive reply, it might be from a previous sync command, so
>> we need to compare received ID for equality with the given one.
>> ---
>>
>> Some background on this:
>> -GA socked is always writeable, even with no GA running (!)
>>
>> -writing something to dead socket (=without any GA listening) will thus
>>  succeed. However, when one will (after a period of time) start the GA,
>>  it will read and process all written data. This makes timeout
>>  implementation harder.
>>
>> -Therefore issuing a GA command without any GA running will block indefinitely.
>>
>> My solution to this is: Issue harmless guest-sync command which returns
>> given identifier. To avoid blocking, do a timed wait (5 seconds).
>> This is the only case where we do a timed wait, regular GA commands
>> will of course wait without any timeout.
>> If GA didn't reply within those 5 seconds, we store an ID so we can compare
>> it later and report 'GA unavailable'; Note that guest won't get into
>> a different state as libvirt thinks it is in.
>> Then, if GA is ever started, it start to process our sync commands.
>> And all we have to do is check if we have ever issued such ID.
>> If not an error is reported.
>>
>> However, there is still race:
>> 1) GA will reply to our guest-sync
>> 2) GA gets terminated
>> 3) we issue regular command (e.g. fs-freeze)
>>
>> I am not sure this is avoidable. But this patch makes current situation bearable.
> 
> We could still timeout the 'fs-freeze' command after 30 seconds
> or so. Given that we issue the guest-resync command, we'll be
> able to automatically re-sync the JSON protocol by dropping the
> later arriving fs-freeze reply (if any).

I don't think this is a good idea. I've chosen 'fs-freeze' intentionally
:) It's something that actually might take ages - to sync disks (which
is what current implementation does). Therefore if we set any timeout
for regular commands we may get into inconsistent state:

1) issue fs-freeze
2) timeout and return error (everybody thinks fs is not frozen)
3) receive "okay, frozen" from GA

> 
>>
>> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
>> index 9df5546..5ecf893 100644
>> --- a/src/qemu/qemu_agent.c
>> +++ b/src/qemu/qemu_agent.c
>> @@ -39,6 +39,7 @@
>>  #include "virterror_internal.h"
>>  #include "json.h"
>>  #include "virfile.h"
>> +#include "virtime.h"
>>  
>>  #define VIR_FROM_THIS VIR_FROM_QEMU
>>  
>> @@ -106,6 +107,10 @@ struct _qemuAgent {
>>      /* If anything went wrong, this will be fed back
>>       * the next monitor msg */
>>      virError lastError;
>> +
>> +    /* Array of IDs issued to GA via guest-sync */
>> +    unsigned long long *ids;
>> +    size_t ids_size;
>>  };
> 
> I'm not sure that we want / need to store all historic IDs.
> At any time, aren't we only interested in the most recent ID
> that we have sent.
> 
> We always serialize commands we sent to the agent, so consider
> we sent
> 
>    guest-sync:  id=1
>       ...times out...
>    guest-sync:  id=2
>     fs-freeze:
> 
> After that first time out we see, we need to discard all
> further data received from the agent until we see a guest-sync
> command reply with id=2.  If we see a guest-snc reply with
> id=1, we don't care - we just drop it & carry on waiting
> until we get one with id=2.

Makes sense.

> 
>> +/**
>> + * qemuAgentGuestSync:
>> + * @mon: Monitor
>> + *
>> + * Send guest-sync with unique ID and wait for
>> + * reply. If we get one, check if received ID
>> + * is equal to given.
>> + *
>> + * Returns: 0 on success,
>> + *          -1 otherwise
>> + */
>> +static int
>> +qemuAgentGuestSync(qemuAgentPtr mon)
>> +{
>> +    int ret = -1;
>> +    int send_ret;
>> +    unsigned long long id, id_ret;
>> +    qemuAgentMessage sync_msg;
>> +
>> +    memset(&sync_msg, 0, sizeof sync_msg);
>> +
>> +    if (virTimeMillisNow(&id) < 0)
>> +        return -1;
>> +
>> +    if (virAsprintf(&sync_msg.txBuffer,
>> +                    "{\"execute\":\"guest-sync\", "
>> +                    "\"arguments\":{\"id\":%llu}}", id) < 0) {
>> +        virReportOOMError();
>> +        return -1;
>> +    }
>> +    sync_msg.txLength = strlen(sync_msg.txBuffer);
>> +
>> +    VIR_DEBUG("Sending guest-sync command with ID: %llu", id);
> 
> According to the 'guest-sync' QMP spec, we need to send the magic byte
> '0xFF' immediately before the guest-sync command data is sent.

Yeah, and probably switch to new guest-sync-delimited command as soon as
it's upstream.
> 
> Regards,
> Daniel




More information about the libvir-list mailing list