Add option to skip cpu feature and model verification in libvirt and rely on qemu 'enforce' option

Wed Nov 23 12:28:41 UTC 2022

On 23/11/22 5:44 pm, Daniel P. Berrangé wrote:
> On Wed, Nov 23, 2022 at 12:51:39PM +0100, Jiri Denemark wrote:
>> On Wed, Nov 23, 2022 at 11:41:03 +0000, Daniel P. Berrangé wrote:
>>> On Wed, Nov 23, 2022 at 12:29:24PM +0100, Jiri Denemark wrote:
>>>> On Wed, Nov 23, 2022 at 11:18:01 +0000, Daniel P. Berrangé wrote:
>>>>> On Tue, Nov 22, 2022 at 01:48:39PM +0100, Jiri Denemark wrote:
>>>>>> On Mon, Nov 21, 2022 at 12:47:58 +0530, manish.mishra wrote:
>>>>>>> On 18/11/22 5:00 pm, Jiri Denemark wrote:
>>>>>>>> On Fri, Nov 18, 2022 at 10:18:59 +0000, John Levon wrote:
>>>>>>>>> On Fri, Nov 18, 2022 at 10:52:32AM +0100, Jiri Denemark wrote:
>>>>>>>>>
>>>>>>>>>>>     * Qemu already provide an option 'enforce' to validate if features
>>>>>>>>>>>     with which vm is started is exactly same as one provided and nothing
>>>>>>>>>>>     is silently dropped.
>>>>>>>>>> Right, but it's not enough. In addition to removed features libvirt also
>>>>>>>>>> checks for unexpectedly added features. And you really need to do both.
>>>>>>>>>> Because if you ask for -cpu Model,feat1=on,feat2=on,enforce and QEMU
>>>>>>>>>> says everything is fine, the guest might see more than what you asked.
>>>>>>>>>> For example, if a feature is enabled only if a host supports it you may
>>>>>>>>>> or may not get it without any complains from QEMU. But if you get it you
>>>>>>>>>> really need to explicitly ask for it during migration, otherwise the
>>>>>>>>>> feature can just silently disappear. Of course, this would be a really
>>>>>>>>>> bad behavior from QEMU, but that does not mean it can't happen (I think
>>>>>>>>>> SVM is a bit problematic in this way) and the whole point of libvirt's
>>>>>>>>>> checks is to prevent this kind of issues.
>>>>>>>>> Hi Jiri, I'm not following this very well. I think you're saying that qemu has
>>>>>>>>> had bugs previously where features get silently enabled,
>>>>>>>> Personally, I haven't actually witnessed any bug in this area (as far as
>>>>>>>> I can remember, which is not that far :-)), but got some reports of at
>>>>>>>> least one, even though without any proof.
>>>>>>>>
>>>>>>>> Specifically, SVM is quite strange as it is included in all AMD CPU
>>>>>>>> models in QEMU and yet if you try to start it on a host without nesting
>>>>>>>> enabled "enforce" does not complain. I saw the feature is enabled with
>>>>>>>> older machine types, but I was told the magic behind this feature looks
>>>>>>>> like not only machine type but even host configuration itself is
>>>>>>>> involved. Anyway, although I saw reports of that the feature could be
>>>>>>>> enabled without an explicit request even with new machine types I still
>>>>>>>> haven't seen any proof of this happening. So I hope it just does not
>>>>>>>> happen and users are only afraid of this possibility.
>>>>>>>>
>>>>>>>>> and it's libvirt's job/role to paper over those issues? Do you have
>>>>>>>>> some specific cases of this?
>>>>>>>> This is not about papering. It's actually the opposite, that is about
>>>>>>>> detecting if something like this happens and reporting it as a failure
>>>>>>>> rather than papering it and hoping everything goes well. So I think
>>>>>>>> doing this is a good idea even though I don't think we actually saw any
>>>>>>>> issue of this kind.
>>>>>>>
>>>>>>> Hi Jiri, I see now with libvirt master, with check=='full' we verify
>>>>>>> both silently dropped as well as added features. But as you already
>>>>>>> stated Qemu silently adding feature is a Qemu bug and libvirt just
>>>>>>> reports that bug, so it should be very unlikely, i agree that is not a
>>>>>>> good reasoning :). Our requirement is that we want to use CPU Models
>>>>>>> and features which are defined in Qemu but not in libvirt for e.g if
>>>>>>> we want to use Icelake-Server-V4 directly or newly added vmx feature,
>>>>>>> libvirt does not allow. Currently we take help of qemu to do
>>>>>>> validations but for cpu feature verfication and model definations we
>>>>>>> still use libvirt defined definations which prevent us to use anything
>>>>>>> which is not defined in libvirt. I see there are already efforts going
>>>>>>> on to get all model and feature defination from qemu itself, but not
>>>>>>> sure how much time it will take. Till that happens we thought safest
>>>>>>> option is to have an option to remove all validations from libvirt and
>>>>>>> rely on qemu 'enforce' for migration safetly. I understand
>>>>>>> qemu-enforce does not check for silently added features, but that case
>>>>>>> we can assume is very unlikely and Qemu should fix otherwise VMs will
>>>>>>> not poweron anyway with check=='full'. Basically we want it as an
>>>>>>> modification of check='none' but also skipping things like
>>>>>>> virCPUValidateFeatures and passing option 'enforce' to Qemu. Or if
>>>>>>> silently adding features is that big concern we can have a check in
>>>>>>> Qemu itself? I understand current qemu-enforce is not as migration
>>>>>>> safe as check=='full' but probably suitable for our use case for time
>>>>>>> being?
>>>>>> I understand why you want this, but on the other hand adding just a
>>>>>> plain pass-through for CPU model and features is quite dangerous as it
>>>>>> can be used to set any -cpu option even if it's not a feature. And
>>>>>> especially the parts that are configured in other areas of domain XML
>>>>>> (such as hypervisor features). So I think instead of adding a new mode
>>>>>> for <cpu> we should go another way. For things that are not yet
>>>>>> supported by libvirt we support various elements in qemu namespace,
>>>>>> e.g., setting QEMU command line options, environment variables, or even
>>>>>> overriding options libvirt would normally set on the command line. So I
>>>>>> guess we could have a special <qemu:cpu> element which would be used
>>>>>> when a user wants full control over the -cpu option without any libvirt
>>>>>> intervention.
>>>>> I really don't think we should allow this, especially not for something
>>>>> which is clearly intended to be used for production deployment. Our
>>>>> hypervisor CLI passthrough is there to allow for short term workarounds
>>>>> for bugs, or to experiment with a feature before it is mapped into
>>>>> libvirt in a supported manner.
>>>>>
>>>>> If there are aspects related to QEMU configuration thiat are not in
>>>>> libvirt yet, we need to address those gaps, not provide yet another
>>>>> way to bypass libvirt mapping.
>>>> Indeed, this was definitely meant as a short term workaround for stuff
>>>> we don't have support for yet, for testing or experimental purposes. The
>>>> supported approach is for implement the missing parts in libvirt (and
>>>> QEMU) as soon as possible. I don't see why would a properly documented
>>>> support for experiments with -cpu would be an issue.
>>> Why can't it just use the exsting QEMU passthrough syntax we have.
>>> I don't think we should be adding specifial support just for CPUs
>> That would be nice, but the QEMU passthrough syntax cannot be used for
>> changing options that libvirt already passes to QEMU. So using it would
>> likely result in two separate -cpu options on QEMU command line. And it
>> would not rule out the CPU verification code in libvirt. Remember, we
>> add a default -cpu model in case there's none configured in the XML.
> Two -cpu options isn't a problem, the latter will override the former,
> which is fine from the level of support intended for QEMU passthrough
> usage.
>
> For the libvirt checking, isn't the 'check=none' attr sufficient
> to skip checks libvirt does.
>
>
> With regards,
> Daniel

Actually even within check='none' libvirt does few checks e.g. virCPUValidateFeatures if feature string passed is proper and defined in libvirt. We could extend check='none' to remove even these kind of things. So Jiri suggested that making such kind of changes in general path is not good idea so discussion went to using separate cpu class <qemu:cpu>.

Thanks

Manish Mishra

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20221123/6a7c6e04/attachment-0001.htm>