[libvirt] [PATCH 2/4] qemu: Introduce APIs for manipulating qemuDomainAgentJob

Thu Jun 14 16:20:31 UTC 2018

On 06/14/2018 10:32 AM, Michal Privoznik wrote:
> On 06/14/2018 12:10 AM, John Ferlan wrote:
>>
>>
>> On 06/08/2018 09:45 AM, Michal Privoznik wrote:
>>> The point is to break QEMU_JOB_* into smaller pieces which
>>> enables us to achieve higher throughput. For instance, if there
>>> are two threads, one is trying to query something on qemu
>>> monitor while the other is trying to query something on agent
>>> monitor these two threads would serialize. There is not much
>>> reason for that.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>> ---
>>>  src/qemu/THREADS.txt   |  62 +++++++++++++++++--
>>>  src/qemu/qemu_domain.c | 163 +++++++++++++++++++++++++++++++++++++++----------
>>>  src/qemu/qemu_domain.h |  12 ++++
>>>  3 files changed, 200 insertions(+), 37 deletions(-)
>>>
>>
>> Looked at THREADS.txt before reading any code in order to get a sense
>> for what's being done... I won't go back after reading the code so that
>> you get a sense of how someone not knowing what's going on views things.
>> So when reading comments here - just take them with that in mind. I have
>> a feeling my mind/thoughts will change later on ;-).
>>
>>> diff --git a/src/qemu/THREADS.txt b/src/qemu/THREADS.txt
>>> index 7243161fe3..6219f46a81 100644
>>> --- a/src/qemu/THREADS.txt
>>> +++ b/src/qemu/THREADS.txt
>>> @@ -51,8 +51,8 @@ There are a number of locks on various objects
>>>      Since virDomainObjPtr lock must not be held during sleeps, the job
>>>      conditions provide additional protection for code making updates.
>>>  
>>> -    QEMU driver uses two kinds of job conditions: asynchronous and
>>> -    normal.
>>> +    QEMU driver uses three kinds of job conditions: asynchronous, agent
>>> +    and normal.
>>>  
>>>      Asynchronous job condition is used for long running jobs (such as
>>>      migration) that consist of several monitor commands and it is
>>> @@ -69,9 +69,15 @@ There are a number of locks on various objects
>>>      it needs to wait until the asynchronous job ends and try to acquire
>>>      the job again.
>>>  
>>> +    Agent job condition is then used when thread wishes to talk to qemu
>>
>> s/then//
>> s/when thread/when the thread/
>> s/to qemu/to the qemu/
>>
>>> +    agent monitor. It is possible to acquire just agent job
>>> +    (qemuDomainObjBeginAgentJob), or only normal job
>>> +    (qemuDomainObjBeginJob) or both at the same time
>>> +    (qemuDomainObjBeginJobWithAgent).
>>> +
>>
>> Oh this seems like it's going to be tricky...  How does one choose which
>> to use. 
> 
> By reading further. There are examples which make it clear.

Not sure I 100% agree with that statement, but will point out that the
descriptions for async and normal don't list API names.

I get the sense for why to choose async job and that a normal job
encompasses the rest. What an agent job is and why it's selected - it's
less clear, but perhaps the following helps alters that...

An agent job condition is used for any command that uses the domain
guest agent to query or modify for supported guest agent options. Using
this job condition allows for one thread to query/modify the agent while
another normal thread can do the same for the non guest agent related
data. For certain guest agent commands (shutdown, reboot, and set time),
a normal job cannot be running thus those APIs will wait for any normal
thread to be done before running.

 >
> At least w/ async you know it's a long running job and normal is
>> everything else.  
> 
> Not really. How long must a job be to be considered long running? You
> (as reader) don't know at this point so you carry on reading and
> examples make it all clear.
> 
> But okay, I can try to expand the description.
> 
> Now you have category 3 agent specific - at this point
>> I'm not sure you really want to mix normal and agent. If the purpose of
>> the series is to extricate agent job from normal job, then mixing when
>> it's convenient leads to speculation of what misunderstandings will
>> occur for future consumers.
> 
> Sometimes an API accesses both monitor AND agent. In this case it needs
> to grab both types of jobs.
> 

right remember the context - I read THREADS.txt before reading the code
and I didn't go back to modify any comments in this section. Kind of the
don't pollute my point of reference with how I read the code...

>>
>> Because of the text for "Normal job condition is used by all other
>> jobs", I wonder if the agent job description should go above Normal job;
>> otherwise, perhaps it'd be best to distinguish "qemu monitor" jobs in
>> the normal description.
>>
>>>      Immediately after acquiring the virDomainObjPtr lock, any method
>>> -    which intends to update state must acquire either asynchronous or
>>> -    normal job condition.  The virDomainObjPtr lock is released while
>>> +    which intends to update state must acquire asynchronous, normal or
>>> +    agent job condition. The virDomainObjPtr lock is released while
>>
>> There is no agent job condition, yet. And how on earth do you do both
>> agent and normal?
> 
> There's a difference between "job condition" and pthread_cond_t. What
> this paragraph understands as "job condition" is qemuDomainJob really.
> And in the second hunk of my patch I'm enumerating all three ways how to
> acquire jobs.
> I'll drop "condition" word which is apparently confusing.
> 

mmm... true... use of "XXXX job condition" is confusing, but it's what
it's named and apparently didn't cause confusion previously.

>>
>>>      blocking on these condition variables.  Once the job condition is
>>>      acquired, a method can safely release the virDomainObjPtr lock
>>>      whenever it hits a piece of code which may sleep/wait, and
>>> @@ -132,6 +138,30 @@ To acquire the normal job condition
>>>  
>>>  
>>>  
>>> +To acquire the agent job condition
>>> +
>>> +  qemuDomainObjBeginAgentJob()
>>> +    - Waits until there is no other agent job set
>>
>> So no agent job before starting, but...
>>
>>> +    - Sets job.agentActive tp the job type
>>> +
>>> +  qemuDomainObjEndAgentJob()
>>> +    - Sets job.agentActive to 0
>>> +    - Signals on job.cond condition
>>
>> ... we signal on job.cond even though we didn't necessarily obtain?
> 
> Yes. I don't see the problem here.
> 

Of course not, you wrote the patches!

Again, my point of reference.  There's a job.asyncCond and a job.cond
with job.cond being used for both - it just read strangely.  Once you're
intimate with the code, different story.

[...]

>>> -    while (priv->job.active) {
>>> +    while (!qemuDomainObjCanSetJob(priv, job, agentJob)) {
>>>          VIR_DEBUG("Waiting for job (vm=%p name=%s)", obj, obj->def->name);
>>>          if (virCondWaitUntil(&priv->job.cond, &obj->parent.lock, then) < 0)
>>>              goto error;
>>
>> Part of me says this check is ripe for some future coding error, but
>> logically AFAICT it works.
> 
> Well, the whole reason why any virCondWait() MUST be guarded with
> while() loop is that pthread_cond_wait() can wake up anytime even
> without anybody signalling it. This is taken from pthread_cond_wait(3p)
> man page:
> 
>   When using condition variables there is always a Boolean predicate
>   involving shared variables associated with each condition wait that is
>   true if the thread should proceed. Spurious wakeups from the
>   pthread_cond_timedwait() or pthread_cond_wait()  functions may occur.
>   Since the return from pthread_cond_timedwait() or pthread_cond_wait()
>   does not imply anything about the value of this predicate, the
>   predicate should be re-evaluated upon such return.
> 
> Therefore, if there's a thread waiting for say agent job, and it gets
> signalized by another thread ending monitor job, it is woken up but
> realizes this is a "spurious" wake up and returns into virCondWait()
> immediately.
> 
> IOW, yes we will get more wake ups here but it also enables us to grab
> both jobs at the same time (without us having to worry about how to wait
> on two pthread_cond at the same time).
> 

OK - I came to realize that for the most part, without of course the
extra reference to the man page. It's nuances like this that I generally
have to reacquaint myself with when debugging problems in code (not that
I'd necessarily stop to read extra normally helpful comments ;-)).

> 
>>
>>> @@ -6427,32 +6447,48 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
>>>      if (!nested && !qemuDomainNestedJobAllowed(priv, job))
>>>          goto retry;
>>>  
>>> -    qemuDomainObjResetJob(priv);
>>> -
>>>      ignore_value(virTimeMillisNow(&now));
>>>  
>>> -    if (job != QEMU_JOB_ASYNC) {
>>> -        VIR_DEBUG("Started job: %s (async=%s vm=%p name=%s)",
>>> -                   qemuDomainJobTypeToString(job),
>>> -                  qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
>>> -                  obj, obj->def->name);
>>> -        priv->job.active = job;
>>> -        priv->job.owner = virThreadSelfID();
>>> -        priv->job.ownerAPI = virThreadJobGet();
>>> +    if (job) {
>>> +        qemuDomainObjResetJob(priv);
>>> +
>>> +        if (job != QEMU_JOB_ASYNC) {
>>> +            VIR_DEBUG("Started job: %s (async=%s vm=%p name=%s)",
>>> +                      qemuDomainJobTypeToString(job),
>>> +                      qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
>>> +                      obj, obj->def->name);
>>> +            priv->job.active = job;
>>> +            priv->job.owner = virThreadSelfID();
>>> +            priv->job.ownerAPI = virThreadJobGet();
>>> +            priv->job.started = now;
>>> +        } else {
>>> +            VIR_DEBUG("Started async job: %s (vm=%p name=%s)",
>>> +                      qemuDomainAsyncJobTypeToString(asyncJob),
>>> +                      obj, obj->def->name);
>>> +            qemuDomainObjResetAsyncJob(priv);
>>> +            if (VIR_ALLOC(priv->job.current) < 0)
>>> +                goto cleanup;
>>> +            priv->job.current->status = QEMU_DOMAIN_JOB_STATUS_ACTIVE;
>>> +            priv->job.asyncJob = asyncJob;
>>> +            priv->job.asyncOwner = virThreadSelfID();
>>> +            priv->job.asyncOwnerAPI = virThreadJobGet();
>>> +            priv->job.asyncStarted = now;
>>> +            priv->job.current->started = now;
>>> +        }
>>> +    }
>>> +
>>> +    if (agentJob) {
>>> +        qemuDomainObjResetAgentJob(priv);
>>> +
>>> +        VIR_DEBUG("Started agent job: %s (vm=%p name=%s job=%s async=%s)",
>>> +                  qemuDomainAgentJobTypeToString(agentJob),
>>> +                  obj, obj->def->name,
>>> +                  qemuDomainJobTypeToString(priv->job.active),
>>> +                  qemuDomainAsyncJobTypeToString(priv->job.asyncJob));
>>> +        priv->job.agentActive = agentJob;
>>> +        priv->job.agentOwner = virThreadSelfID();
>>> +        priv->job.agentOwnerAPI = virThreadJobGet();
>>>          priv->job.started = now;
>>> -    } else {
>>> -        VIR_DEBUG("Started async job: %s (vm=%p name=%s)",
>>> -                  qemuDomainAsyncJobTypeToString(asyncJob),
>>> -                  obj, obj->def->name);
>>> -        qemuDomainObjResetAsyncJob(priv);
>>> -        if (VIR_ALLOC(priv->job.current) < 0)
>>> -            goto cleanup;
>>> -        priv->job.current->status = QEMU_DOMAIN_JOB_STATUS_ACTIVE;
>>> -        priv->job.asyncJob = asyncJob;
>>> -        priv->job.asyncOwner = virThreadSelfID();
>>> -        priv->job.asyncOwnerAPI = virThreadJobGet();
>>> -        priv->job.asyncStarted = now;
>>> -        priv->job.current->started = now;
>>
>> So after reading the code and thinking about the documentation, IMO,
>> having agentJob use job.cond is confusing.  Having JobWithAgent is even
>> more confusing. But it seems the code would work
> 
> Again, naming is hard. So any suggestions for better names of functions
> are welcome. Basically what we need are names to cover the following
> cases:
> 
>                    | needs monitor | doesn't need monitor |
>                    +---------------+----------------------+
> needs agent        |      X        |          X           |
>                    +---------------+----------------------+
> doesn't need agent |      X        |          0           |
>                    +---------------+----------------------+
> 

The "confusion" is squarely with the separate cond for async, but no
separate cond for normal and agent. Whether generating historical code
comments will help or not - who's to say. Noting that cond is used for
both normal and agent in some comment could help, but it requires
someone to read the comment too!

As for naming, yep too many opinions - and you know what opinions are
like, right? ;-)  Still you have "BeginAgentJob" and "BeginJobWithAgent"
where the latter is in some way a combined job. Whether the agent is
used is only determined after the Job is obtained (for shutdown and
reboot) or when the job requires both agent and monitor calls (for set
time).  That's why I went with the BLOCK name because I couldn't think
of anything better either!

In all 3 cases though we don't want other agent or monitor jobs to run
concurrently, so I understand the need to get both. It's this
intersection that did cause the most concern while reviewing.

>>
>> I think what concerns me is there are 3 paths (I read patch 4) that
>> would seemingly indicate that by starting an agentJob we block a normal
>> domain job. In order to start one, IIUC both agentJob and normal job
>> must be 0. 
> 
> No. I don't see that in code. What I see is the following predicate:
> 
>   return (job == QEMU_JOB_NONE || !priv->job.active) &&
>          (agentJob == QEMU_AGENT_JOB_NONE || !priv->job.agentActive);
> 
> So if a thread wants just agentJob (in which case job == QEMU_JOB_NONE)
> this predicate is true (= thread can set agent job) as long as
> agentActive is unset. Similarly, if a thread wants just monitor job
> (agentJob == QEMU_AGENT_JOB_NONE) this predicate allows that as long as
> priv->job.active is 0. And finally, if thread wants both jobs, both
> priv->job.active AND priv->job.agentActive have to be equal to zero.
> 
> And here you can see how having just one condition pays off: ALL threads
> waiting are woken up, but only that one which can set the job will
> successfully set it. The rest goes back to sleep.
> 

Maybe I wasn't clear enough... There are 3 paths (shutdown, reboot, and
set time) which require both job.active and job.agentActive to be 0
before proceeding. In the totality of things, yes, things seem to work.
I still had lingering thoughts and wanted in a way to be sure the way I
was understanding things matched up. As much as there is separation
there is an amount of subtle co-dependency which wasn't perhaps obvious
the first time or two that I read the code. After studying it for a
while, writing thoughts, etc. I've come to understand it better.
Sometimes it helps to write down things and that jiggles a different
thought for the author as well - for others it upsets them. Sorry for
the churn.

>> That seems reasonable until one starts thinking about races
>> when there's a thread waiting for a JobWithAgent and a thread waiting
>> for one or the other. Combine that with the check "if (!nested &&
>> !qemuDomainNestedJobAllowed(priv, job))" which may now pass NONE for
>> @job if one of those threads was an agentJob.
> 
> I'm failing to see any race here, sorry. Can you give me an example
> please?
> 

A race doesn't exist until someone finds one. I didn't find one or I
would have noted it - I wouldn't leave you hanging like that trying to
guess! It's the subsequent NestedJobAllowed call check that caused some
level of doubt/concern, but I couldn't put my finger on anything specific.

John

[...]