Add option to skip cpu feature and model verification in libvirt and rely on qemu 'enforce' option

Jiri Denemark jdenemar at redhat.com
Wed Nov 23 13:18:20 UTC 2022


On Wed, Nov 23, 2022 at 12:14:45 +0000, Daniel P. Berrangé wrote:
> On Wed, Nov 23, 2022 at 12:51:39PM +0100, Jiri Denemark wrote:
> > On Wed, Nov 23, 2022 at 11:41:03 +0000, Daniel P. Berrangé wrote:
> > > On Wed, Nov 23, 2022 at 12:29:24PM +0100, Jiri Denemark wrote:
> > > > On Wed, Nov 23, 2022 at 11:18:01 +0000, Daniel P. Berrangé wrote:
> > > > > On Tue, Nov 22, 2022 at 01:48:39PM +0100, Jiri Denemark wrote:
> > > > > > On Mon, Nov 21, 2022 at 12:47:58 +0530, manish.mishra wrote:
> > > > > > > 
> > > > > > > On 18/11/22 5:00 pm, Jiri Denemark wrote:
> > > > > > > > On Fri, Nov 18, 2022 at 10:18:59 +0000, John Levon wrote:
> > > > > > > >> On Fri, Nov 18, 2022 at 10:52:32AM +0100, Jiri Denemark wrote:
> > > > > > > >>
> > > > > > > >>>>    * Qemu already provide an option 'enforce' to validate if features
> > > > > > > >>>>    with which vm is started is exactly same as one provided and nothing
> > > > > > > >>>>    is silently dropped.
> > > > > > > >>> Right, but it's not enough. In addition to removed features libvirt also
> > > > > > > >>> checks for unexpectedly added features. And you really need to do both.
> > > > > > > >>> Because if you ask for -cpu Model,feat1=on,feat2=on,enforce and QEMU
> > > > > > > >>> says everything is fine, the guest might see more than what you asked.
> > > > > > > >>> For example, if a feature is enabled only if a host supports it you may
> > > > > > > >>> or may not get it without any complains from QEMU. But if you get it you
> > > > > > > >>> really need to explicitly ask for it during migration, otherwise the
> > > > > > > >>> feature can just silently disappear. Of course, this would be a really
> > > > > > > >>> bad behavior from QEMU, but that does not mean it can't happen (I think
> > > > > > > >>> SVM is a bit problematic in this way) and the whole point of libvirt's
> > > > > > > >>> checks is to prevent this kind of issues.
> > > > > > > >> Hi Jiri, I'm not following this very well. I think you're saying that qemu has
> > > > > > > >> had bugs previously where features get silently enabled,
> > > > > > > > Personally, I haven't actually witnessed any bug in this area (as far as
> > > > > > > > I can remember, which is not that far :-)), but got some reports of at
> > > > > > > > least one, even though without any proof.
> > > > > > > >
> > > > > > > > Specifically, SVM is quite strange as it is included in all AMD CPU
> > > > > > > > models in QEMU and yet if you try to start it on a host without nesting
> > > > > > > > enabled "enforce" does not complain. I saw the feature is enabled with
> > > > > > > > older machine types, but I was told the magic behind this feature looks
> > > > > > > > like not only machine type but even host configuration itself is
> > > > > > > > involved. Anyway, although I saw reports of that the feature could be
> > > > > > > > enabled without an explicit request even with new machine types I still
> > > > > > > > haven't seen any proof of this happening. So I hope it just does not
> > > > > > > > happen and users are only afraid of this possibility.
> > > > > > > >
> > > > > > > >> and it's libvirt's job/role to paper over those issues? Do you have
> > > > > > > >> some specific cases of this?
> > > > > > > > This is not about papering. It's actually the opposite, that is about
> > > > > > > > detecting if something like this happens and reporting it as a failure
> > > > > > > > rather than papering it and hoping everything goes well. So I think
> > > > > > > > doing this is a good idea even though I don't think we actually saw any
> > > > > > > > issue of this kind.
> > > > > > > 
> > > > > > > 
> > > > > > > Hi Jiri, I see now with libvirt master, with check=='full' we verify
> > > > > > > both silently dropped as well as added features. But as you already
> > > > > > > stated Qemu silently adding feature is a Qemu bug and libvirt just
> > > > > > > reports that bug, so it should be very unlikely, i agree that is not a
> > > > > > > good reasoning :). Our requirement is that we want to use CPU Models
> > > > > > > and features which are defined in Qemu but not in libvirt for e.g if
> > > > > > > we want to use Icelake-Server-V4 directly or newly added vmx feature,
> > > > > > > libvirt does not allow. Currently we take help of qemu to do
> > > > > > > validations but for cpu feature verfication and model definations we
> > > > > > > still use libvirt defined definations which prevent us to use anything
> > > > > > > which is not defined in libvirt. I see there are already efforts going
> > > > > > > on to get all model and feature defination from qemu itself, but not
> > > > > > > sure how much time it will take. Till that happens we thought safest
> > > > > > > option is to have an option to remove all validations from libvirt and
> > > > > > > rely on qemu 'enforce' for migration safetly. I understand
> > > > > > > qemu-enforce does not check for silently added features, but that case
> > > > > > > we can assume is very unlikely and Qemu should fix otherwise VMs will
> > > > > > > not poweron anyway with check=='full'. Basically we want it as an
> > > > > > > modification of check='none' but also skipping things like
> > > > > > > virCPUValidateFeatures and passing option 'enforce' to Qemu. Or if
> > > > > > > silently adding features is that big concern we can have a check in
> > > > > > > Qemu itself? I understand current qemu-enforce is not as migration
> > > > > > > safe as check=='full' but probably suitable for our use case for time
> > > > > > > being?
> > > > > > 
> > > > > > I understand why you want this, but on the other hand adding just a
> > > > > > plain pass-through for CPU model and features is quite dangerous as it
> > > > > > can be used to set any -cpu option even if it's not a feature. And
> > > > > > especially the parts that are configured in other areas of domain XML
> > > > > > (such as hypervisor features). So I think instead of adding a new mode
> > > > > > for <cpu> we should go another way. For things that are not yet
> > > > > > supported by libvirt we support various elements in qemu namespace,
> > > > > > e.g., setting QEMU command line options, environment variables, or even
> > > > > > overriding options libvirt would normally set on the command line. So I
> > > > > > guess we could have a special <qemu:cpu> element which would be used
> > > > > > when a user wants full control over the -cpu option without any libvirt
> > > > > > intervention.
> > > > > 
> > > > > I really don't think we should allow this, especially not for something
> > > > > which is clearly intended to be used for production deployment. Our
> > > > > hypervisor CLI passthrough is there to allow for short term workarounds
> > > > > for bugs, or to experiment with a feature before it is mapped into
> > > > > libvirt in a supported manner.
> > > > > 
> > > > > If there are aspects related to QEMU configuration thiat are not in
> > > > > libvirt yet, we need to address those gaps, not provide yet another
> > > > > way to bypass libvirt mapping.
> > > > 
> > > > Indeed, this was definitely meant as a short term workaround for stuff
> > > > we don't have support for yet, for testing or experimental purposes. The
> > > > supported approach is for implement the missing parts in libvirt (and
> > > > QEMU) as soon as possible. I don't see why would a properly documented
> > > > support for experiments with -cpu would be an issue.
> > > 
> > > Why can't it just use the exsting QEMU passthrough syntax we have.
> > > I don't think we should be adding specifial support just for CPUs
> > 
> > That would be nice, but the QEMU passthrough syntax cannot be used for
> > changing options that libvirt already passes to QEMU. So using it would
> > likely result in two separate -cpu options on QEMU command line. And it
> > would not rule out the CPU verification code in libvirt. Remember, we
> > add a default -cpu model in case there's none configured in the XML.
> 
> Two -cpu options isn't a problem, the latter will override the former,
> which is fine from the level of support intended for QEMU passthrough
> usage.

Oh, OK. In that case this would be usable. But see below.

> For the libvirt checking, isn't the 'check=none' attr sufficient
> to skip checks libvirt does.

It does, although once a domain is started, we update the CPU definition
in the XML to use check='full'. So check='none' is only usable for
starting a domain. Migration will always use check='full'. We would need
to come up with a new check value to completely turn off this
functionality.

Jirka


More information about the libvir-list mailing list