[libvirt] [PATCH] qemu_agent: Issue guest-sync prior to every command
Michal Privoznik
mprivozn at redhat.com
Tue Feb 7 15:49:55 UTC 2012
On 07.02.2012 16:18, Daniel P. Berrange wrote:
> On Wed, Feb 01, 2012 at 06:04:45PM +0100, Michal Privoznik wrote:
>> If we issue guest command and GA is not running, the issuing thread
>> will block endlessly. We can check for GA presence by issuing
>> guest-sync with unique ID (timestamp). We don't want to issue real
>> command as even if GA is not running, once it is started, it process
>> all commands written to GA socket.
>>
>> If we receive reply, it might be from a previous sync command, so
>> we need to compare received ID for equality with the given one.
>> ---
>>
>> Some background on this:
>> -GA socked is always writeable, even with no GA running (!)
>>
>> -writing something to dead socket (=without any GA listening) will thus
>> succeed. However, when one will (after a period of time) start the GA,
>> it will read and process all written data. This makes timeout
>> implementation harder.
>>
>> -Therefore issuing a GA command without any GA running will block indefinitely.
>>
>> My solution to this is: Issue harmless guest-sync command which returns
>> given identifier. To avoid blocking, do a timed wait (5 seconds).
>> This is the only case where we do a timed wait, regular GA commands
>> will of course wait without any timeout.
>> If GA didn't reply within those 5 seconds, we store an ID so we can compare
>> it later and report 'GA unavailable'; Note that guest won't get into
>> a different state as libvirt thinks it is in.
>> Then, if GA is ever started, it start to process our sync commands.
>> And all we have to do is check if we have ever issued such ID.
>> If not an error is reported.
>>
>> However, there is still race:
>> 1) GA will reply to our guest-sync
>> 2) GA gets terminated
>> 3) we issue regular command (e.g. fs-freeze)
>>
>> I am not sure this is avoidable. But this patch makes current situation bearable.
>
> We could still timeout the 'fs-freeze' command after 30 seconds
> or so. Given that we issue the guest-resync command, we'll be
> able to automatically re-sync the JSON protocol by dropping the
> later arriving fs-freeze reply (if any).
I don't think this is a good idea. I've chosen 'fs-freeze' intentionally
:) It's something that actually might take ages - to sync disks (which
is what current implementation does). Therefore if we set any timeout
for regular commands we may get into inconsistent state:
1) issue fs-freeze
2) timeout and return error (everybody thinks fs is not frozen)
3) receive "okay, frozen" from GA
>
>>
>> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
>> index 9df5546..5ecf893 100644
>> --- a/src/qemu/qemu_agent.c
>> +++ b/src/qemu/qemu_agent.c
>> @@ -39,6 +39,7 @@
>> #include "virterror_internal.h"
>> #include "json.h"
>> #include "virfile.h"
>> +#include "virtime.h"
>>
>> #define VIR_FROM_THIS VIR_FROM_QEMU
>>
>> @@ -106,6 +107,10 @@ struct _qemuAgent {
>> /* If anything went wrong, this will be fed back
>> * the next monitor msg */
>> virError lastError;
>> +
>> + /* Array of IDs issued to GA via guest-sync */
>> + unsigned long long *ids;
>> + size_t ids_size;
>> };
>
> I'm not sure that we want / need to store all historic IDs.
> At any time, aren't we only interested in the most recent ID
> that we have sent.
>
> We always serialize commands we sent to the agent, so consider
> we sent
>
> guest-sync: id=1
> ...times out...
> guest-sync: id=2
> fs-freeze:
>
> After that first time out we see, we need to discard all
> further data received from the agent until we see a guest-sync
> command reply with id=2. If we see a guest-snc reply with
> id=1, we don't care - we just drop it & carry on waiting
> until we get one with id=2.
Makes sense.
>
>> +/**
>> + * qemuAgentGuestSync:
>> + * @mon: Monitor
>> + *
>> + * Send guest-sync with unique ID and wait for
>> + * reply. If we get one, check if received ID
>> + * is equal to given.
>> + *
>> + * Returns: 0 on success,
>> + * -1 otherwise
>> + */
>> +static int
>> +qemuAgentGuestSync(qemuAgentPtr mon)
>> +{
>> + int ret = -1;
>> + int send_ret;
>> + unsigned long long id, id_ret;
>> + qemuAgentMessage sync_msg;
>> +
>> + memset(&sync_msg, 0, sizeof sync_msg);
>> +
>> + if (virTimeMillisNow(&id) < 0)
>> + return -1;
>> +
>> + if (virAsprintf(&sync_msg.txBuffer,
>> + "{\"execute\":\"guest-sync\", "
>> + "\"arguments\":{\"id\":%llu}}", id) < 0) {
>> + virReportOOMError();
>> + return -1;
>> + }
>> + sync_msg.txLength = strlen(sync_msg.txBuffer);
>> +
>> + VIR_DEBUG("Sending guest-sync command with ID: %llu", id);
>
> According to the 'guest-sync' QMP spec, we need to send the magic byte
> '0xFF' immediately before the guest-sync command data is sent.
Yeah, and probably switch to new guest-sync-delimited command as soon as
it's upstream.
>
> Regards,
> Daniel
More information about the libvir-list
mailing list