[libvirt] [PATCH v4 3/3] qemu: Automatically create NVRAM store

Laszlo Ersek lersek at redhat.com
Fri Aug 22 09:38:06 UTC 2014


On 08/22/14 10:54, Daniel P. Berrange wrote:
> On Fri, Aug 22, 2014 at 10:27:29AM +0200, Laszlo Ersek wrote:
>> On 08/21/14 11:05, Daniel P. Berrange wrote:
>>> On Thu, Aug 21, 2014 at 10:50:24AM +0200, Michal Privoznik wrote:
>>>> diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
>>>> index 7bbbe09..79bba36 100644
>>>> --- a/src/qemu/qemu.conf
>>>> +++ b/src/qemu/qemu.conf
>>>> @@ -487,3 +487,17 @@
>>>>  # Defaults to 1.
>>>>  #
>>>>  #log_timestamp = 0
>>>> +
>>>> +
>>>> +# Location of master nvram file
>>>> +#
>>>> +# When a domain is configured to use UEFI instead of standard
>>>> +# BIOS it may use a separate storage for UEFI variables. If
>>>> +# that's the case libvirt creates the variable store per domain
>>>> +# using this master file as an image. Each UEFI firmware can,
>>>> +# however, have a different variable store. Therefore nvram is
>>>> +# a list of strings, where a single item has the form:
>>>> +#   ${PATH_TO_UEFI_FW}:${PATH_TO_UEFI_VARS}.
>>>> +# Later, when libvirt creates per domain variable store, this
>>>> +# list is searched for the master image.
>>>> +#nvram = [ "/usr/share/OVMF/OVMF_CODE.fd:/usr/share/OVMF/OVMF_VARS.fd" ]
>>>
>>> So the user has the ability to specify an arbitrary BIOS in the XML,
>>> but unless it matches one of the ones listed in the libvirt config
>>> they aren't going to be able to start the guest. What can we do
>>> about this, as it doesn't really seem like a great position to be
>>> in.
>>
>> I disagree. Users who use virt-manager (for which patches still have to
>> be written, to expose this feature) won't put arbitrary strings in the
>> <loader> element; virt-manager should offer a minimal choice between
>> "BIOS" vs. "UEFI".
>>
>> Users who are hard-core enough to hack the domain XML by hand are
>> expected to provide good values.
> 
> The problem I'm raising is that it is *not* sufficient to merely
> provide good values in the XML here. You can't simply deploy a
> custom OVMF file and update your XML, because this code is relying
> on values in the libvirtd.conf configuration file.

If the domain XML spells out both <loader> and <nvram>, then both should
be updated manually by the user (if the VM's old nvram is not compatible
with the new loader). This would include the user either instantiating
the new varstore for the VM, or removing the <nvram> element (so that
the new default template can take effect).
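
As a sketch of that first case (both paths here are illustrative only,
not mandated by the patch), the domain XML would spell out both elements
explicitly, e.g.:

  <os>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/guest_VARS.fd</nvram>
  </os>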

If the domain XML doesn't spell out <nvram> (either genuinely, or
because the user removed that element, see above), then yes, you need to
edit /etc/libvirt/qemu.conf.

I don't see a problem with that. You won't keep installing OVMF_CODE.fd
files in random locations in the host filesystem. You might be
developing OVMF and installing various ad-hoc builds, but those would go
to the same location (same pathname), so the path would have to be added
to the qemu.conf file only once.
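
For illustration, such a one-time addition might look roughly like this
in qemu.conf (the /opt/edk2 paths stand in for a hypothetical custom
build):

  nvram = [
      "/usr/share/OVMF/OVMF_CODE.fd:/usr/share/OVMF/OVMF_VARS.fd",
      "/opt/edk2/OVMF_CODE.fd:/opt/edk2/OVMF_VARS.fd"
  ]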

>>> I wonder if we should explicitly record the "template" in the
>>> XML too, e.g.
>>>
>>>   <loader>...</loader>
>>>   <nvram template="...">...</nvram>
>>>
>>> If no template attribute is given, then don't try to automatically
>>> create the nvram file at all.
>>
>> I don't see how this would help with anything. If the template is not
>> provided, then you immediately branch to the failure case that you don't
>> seem to like above (ie. no varstore file exists for the guest).
> 
> It helps because it allows you to setup custom OVMF builds without
> having to edit libvirtd.conf - the information that would go into
> the libvirtd.conf file is instead provided directly in the XML.

That information would only be useful if the <nvram> element had the
@template attribute but no text content, meaning for the VM: "create my
personal varstore instance from this template if I lose my personal
varstore instance".
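
In other words, something along the lines of (attribute name as proposed
above, path illustrative):

  <nvram template='/usr/share/OVMF/OVMF_VARS.fd'/>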

>> And, if the user *wants* to provide a varstore template, then the source
>> of *that* bit of information is precisely the same as that of the
>> <loader> and <nvram> elements: the user himself. If the user can be
>> trusted (expected) to provide a valid varstore template in
>> nvram/@template, then he can be trusted (expected) just the same to pick
>> a firmware binary that is listed in the config file.
> 
> Nope, that's giving two different experiences. If you use the default
> OVMF file, then you don't need to care about manually setting up the
> NVRAM store for each guest because libvirt can copy the template. If
> you use a custom OVMF file, apps have to manually set up the NVRAM
> store per guest.

Not if you edit the nvram=[] map in /etc/libvirt/qemu.conf.

> This sucks. The solution of including the nvram template
> in the XML allows us to have a consistent experience where libvirt
> is always able to automatically create the per-guest NVRAM file.

If you include the varstore template *only* in the VM-specific domain
XML file, then one very important use case breaks: creating brand-new
VMs with UEFI firmware. Since the domain XML doesn't exist yet for the
virtual machine, there is simply nowhere to look up the location of the
varstore template, even if you are trying to use the system-wide OVMF
installation. Users will have to provide the pathname of the default
varstore themselves.

>> The varstore template is not a VM-level property. The varstore template
>> is a property of the firmware binary.
> 
> Yes, and we're providing the firmware in the XML so it is reasonable
> to provide the varstore template location in the XML too, instead of
> having to re-configure libvirtd.conf and restart libvirtd to use it.

It's qemu.conf, not libvirtd.conf, but you're right about the need to
restart libvirtd.

I don't see any problem with that, given that you will not install
custom OVMFs in random locations on the host; you'll do that only
occasionally.

If you *do* install custom OVMFs in random locations, then you can
instantiate the varstores as well.
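
With the naming scheme this patch uses, instantiating a varstore by hand
could be as simple as copying the template yourself (paths again
hypothetical):

  cp /opt/edk2/OVMF_VARS.fd /var/lib/libvirt/qemu/nvram/myguest_VARS.fd

and then pointing the domain's <nvram> element at that copy.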

> 
>>
>>> Just require the user to pre-create
>>> it. That lets us avoid the global config parameters and any reliance
>>> on OVMF file naming conventions shown below
>>>
>>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>>> index baa866a..6f79a17 100644
>>>> --- a/src/qemu/qemu_process.c
>>>> +++ b/src/qemu/qemu_process.c
>>>> @@ -67,6 +67,7 @@
>>>>  #include "virstring.h"
>>>>  #include "virhostdev.h"
>>>>  #include "storage/storage_driver.h"
>>>> +#include "configmake.h"
>>>>  
>>>>  #define VIR_FROM_THIS VIR_FROM_QEMU
>>>>  
>>>> @@ -3734,6 +3735,130 @@ qemuProcessVerifyGuestCPU(virQEMUDriverPtr driver,
>>>>  }
>>>>  
>>>>  
>>>> +static int
>>>> +qemuPrepareNVRAM(virQEMUDriverConfigPtr cfg,
>>>> +                 virDomainDefPtr def,
>>>> +                 bool migrated)
>>>> +{
>>>> +    int ret = -1;
>>>> +    int srcFD = -1;
>>>> +    int dstFD = -1;
>>>> +    virDomainLoaderDefPtr loader = def->os.loader;
>>>> +    bool generated = false;
>>>> +    bool created = false;
>>>> +
>>>> +    /* Unless domain has RO loader of pflash type, we have
>>>> +     * nothing to do here.  If the loader is RW then it's not
>>>> +     * using split code and vars feature, so no nvram file needs
>>>> +     * to be created. */
>>>> +    if (!loader || loader->type != VIR_DOMAIN_LOADER_TYPE_PFLASH ||
>>>> +        loader->readonly != VIR_TRISTATE_SWITCH_ON)
>>>> +        return 0;
>>>> +
>>>> +    /* If the nvram path is configured already, there's nothing
>>>> +     * we need to do. Unless we are starting the destination side
>>>> +     * of migration in which case nvram is configured in the
>>>> +     * domain XML but the file doesn't exist yet. Moreover, after
>>>> +     * the migration is completed, qemu will invoke a
>>>> +     * synchronization write into the nvram file so we don't have
>>>> +     * to take care about transmitting the real data on the other
>>>> +     * side. */
>>>> +    if (loader->nvram && !migrated)
>>>> +        return 0;
>>>> +
>>>> +    /* Autogenerate nvram path if needed.*/
>>>> +    if (!loader->nvram) {
>>>> +        if (virAsprintf(&loader->nvram,
>>>> +                        "%s/lib/libvirt/qemu/nvram/%s_VARS.fd",
>>>> +                        LOCALSTATEDIR, def->name) < 0)
>>>> +            goto cleanup;
>>>
>>> This seems rather x86 OVMF specific in naming. It isn't going to work
>>> for other architectures in QEMU which would use a different NVRAM
>>> format or file naming convention.
>>
>> The code handles solid and split firmwares built from edk2. (Solid:
>> includes firmware code and variable store, hence RW; split: firmware
>> code and varstore are built as separate .fd files, hence firmware code
>> is RO.)
>>
>> The code in general is prepared to handle differences between
>> qemu-system-aarch64 and qemu-system-x86_64 in flash device mapping. I
>> don't think it's useful to try to future-proof this code for
>> architectures for which UEFI support has never even been considered in
>> qemu, let alone in edk2 or the UEFI spec.
>>
>> I think it suffices to adapt this code (if necessary at all) to aarch64
>> once qemu-system-aarch64 + KVM can boot UEFI at all.
> 
> I'm not talking about other architectures using UEFI. I'm thinking about
> the more general case of other architectures using an entirely different
> BIOS which nonetheless has an NVRAM store. This is why we used a generic
> XML element of <nvram> instead of some UEFI-specific naming. I don't
> remember what came of it now, but we have been asked for this for either
> PPC or s390 in the past (can't remember which).

I can't provide any input on such architectures or firmwares.

We can only extract general traits if we know all the specifics. If we
want to go general in advance, how do we know, for example, whether one
<nvram> element will suffice at all? Maybe some architecture (or
firmware) that has (or handles) non-volatile RAM expects several
writable flash chips, mapped at various guest-physical addresses. The
current format doesn't accommodate that either (and it shouldn't, until
we actually see those architectures / firmwares).

Anyway I don't insist.

Thanks,
Laszlo
