[libvirt] [PATCH v5] qemu: Introduce state_lock_timeout toqemu.conf

wang.yi59 at zte.com.cn wang.yi59 at zte.com.cn
Thu Sep 13 02:03:29 UTC 2018


Thank you both for your patience and discussion, John and Michal.
I will send a new version to fix the issues referred in the discussion.

> On 09/10/2018 10:22 PM, John Ferlan wrote:
> >
> >
> > On 09/05/2018 11:09 PM, Yi Wang wrote:
> >> When doing some job holding state lock for a long time,
> >> we may come across error:
> >
> > blank line
> >
> >> "Timed out during operation: cannot acquire state change lock"
> >
> > blank line
> >
> >> Well, sometimes it's not a problem and users want to continue
> >> to wait, and this patch allow users decide how long time they
> >> can wait the state lock.
> >
> > As I noted in a previous review... With this patch it's not the
> > individual user or command that can decide how long to wait, rather it's
> > set by qemu.conf policy. The duration of the wait just becomes
> > customize-able. Not that it's any different the current implementation,
> > since the value is set in qemuDomainObjBeginJobInternal that means
> > there's no distinguishing between job, asyncjob, and agentjob. There's
> > also no attempt to allow changing the value via virt-admin.
>
> I'm not sure how we would implement per API timeout. We could perhaps do
> per connection but that's also not desired.
>
> And regarding "how does one know how long to wait" - well, they don't.
> That's the thing with timeouts. It's not an argument against them
> though. Even primitive functions (e.g. those from pthread) have
> XXX_timed() variants.
>
> >
> > A "strong recommendation" of longer than 30 seconds doesn't provide much
> > guidance. The setting of value generally depends on the configuration
> > and what causes an API to hold the lock for longer than 30 seconds.
> > That's probably limited to primarily virDomainListGetStats and/or
> > virConnectGetAllDomainStats. There's other factors in play including CPU
> > speed, CPU quota allowed, network throughput/latency, disk speed, etc.
> > Tough to create a value the works for every command/purpose.
>
> Agreed, "strong recommendation" is use case dependant (although it's
> perhaps what initiated this patch). If anything, we should discourage
> people from touching this knob.
>
> >
> > Ironically (perhaps) allowing a value of 0 would essentially make it so
> > no one waited - such as how qemuDomainObjBeginJobNowait operates
> > essentially. OTOH, in order to have everything wait "as long as
> > possible" passing -1 would be a lot easier than providing 2147483647 (or
> > if properly typed to uint, then 4294967295U).
>
> Passing 0 is explicitly forbidden (although in a bit clumsy way - we
> need to check the value right after it's parsed because multiplying it
> by 1000 can lead to overflow and a negative value becomes positive or
> vice versa).

[......]

> > If this were an unsigned int, then the < 0 check wouldn't necessary.
> >
> > So then the question becomes, should 0 (zero) be special cased as wait
> > forever.
>
> This check needs to be done before multiplication otherwise an overflow
> might occur. Also, the multiplication can be done in BeginJobInternal so
> that cfg->stateLockTimeout holds value entered by user.
>
> >
> >> +        virReportError(VIR_ERR_CONF_SYNTAX, "%s",
> >> +                       _("state_lock_timeout should larger than zero"));
> >
> > If 0 were left as invalid, then s/should/must be/
> >
> >> +        return -1;
> >> +    }
> >> +
> >>      return 0;
> >>  }
> >>
> Michal


---
Best wishes
Yi Wang


More information about the libvir-list mailing list