[libvirt] [PATCH 05/14] qemu: Implement NVDIMM

Daniel P. Berrange berrange at redhat.com
Thu Feb 23 13:30:41 UTC 2017


On Thu, Feb 23, 2017 at 02:25:13PM +0100, Michal Privoznik wrote:
> On 02/23/2017 02:16 PM, Daniel P. Berrange wrote:
> > On Thu, Feb 23, 2017 at 02:07:27PM +0100, Michal Privoznik wrote:
> >> On 02/23/2017 11:48 AM, Daniel P. Berrange wrote:
> >>> On Thu, Feb 23, 2017 at 11:16:17AM +0100, Michal Privoznik wrote:
> >>>> On 02/23/2017 11:02 AM, Daniel P. Berrange wrote:
> >>>>> On Thu, Feb 23, 2017 at 10:02:48AM +0100, Michal Privoznik wrote:
> >>>>>> So, majority of the code is just ready as-is. Well, with one
> >>>>>> slight change: differentiate between dimm and nvdimm in places
> >>>>>> like device alias generation, generating the command line and so
> >>>>>> on.
> >>>>>>
> >>>>>> Speaking of the command line, we also need to append 'nvdimm=on'
> >>>>>> to the '-machine' argument so that the nvdimm feature is
> >>>>>> advertised in the ACPI tables properly.
> >>>>>>
> >>>>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> >>>>>
> >>>>>
> >>>>>> +        if (prealloc &&
> >>>>>> +            virJSONValueObjectAdd(props,
> >>>>>> +                                  "b:prealloc", true,
> >>>>>> +                                  NULL) < 0)
> >>>>>> +            goto cleanup;
> >>>>>
> >>>>> As discussed on IRC, using prealloc with QEMU causes it to memset()
> >>>>> the first byte of every page of memory to 0. With NVDIMM this is
> >>>>> obviously corrupting application data stored in the NVDIMM.
> >>>>>
> >>>>> Obviously this needs fixing in QEMU, but I would think that libvirt
> >>>>> needs to block use of prealloc==true when running against existing
> >>>>> broken QEMU versions, otherwise users are going to be rather upset
> >>>>> to see their data corrupted on every boot
> >>>>
> >>>> They are, but should we work around broken QEMUs? And if so - how do we
> >>>> detect whether the qemu we are dealing with is broken or already fixed
> >>>> in that respect?
> >>>
> >>> Most likely we'll just have todo a version number check I think, since
> >>> fixing this isn't going to involve any externally visible change to
> >>> QEMU.
> >>
> >> What about backports then? If some distro decides to backport your fix
> >> posted on the qemu list, this check here would prevent them from running
> >> fixed qemu. Also, I don't think we want to work around qemu bugs.
> > 
> > If the distro backports the QEMU fix, then they would have to adjust
> > their libvirt to relax the version check too. Obviously we try to
> > avoid this kind of scenario, but sometimes there's no other option,
> > and this looks like one of those times.
> 
> Okay, fair enough.
> 
> > 
> >>>> On the other hand, we can just not set prealloc true for nvdimm. That
> >>>> will have one downside though - after qemu mmaps() the file, kernel is
> >>>> not forced to create a private copy of the pages.
> >>>
> >>> If the guest has requested prealloc, then I don't think we can ignore
> >>> it for NVDIMM, because its a clear violation of the memory guarantee
> >>> we're claiming to provide apps. Thus I think we've no option but to
> >>> report an error if we see prealloc + broken QEMU
> >>
> >> Do you mean user instead of guest? Because it's user who requests
> >> prealloc (well, in theory it is user; libvirt does not allow users to
> >> request prealloc but rather has some rules worked in that request it
> >> instead).
> > 
> > Sorry yes, i meant user. eg if they set
> > 
> >   <memoryBacking>
> >     <allocation mode="immediate"/>
> >   </memoryBacking>
> > 
> > I guess this is fairly new syntax so probably few people/apps are using
> > it. IIRC, if this is not present in XML, then we just automagically
> > assume  allocation=immediate if using huge pages.
> > 
> > So I guess we could say, if allocation=immediate is *not* in the guest
> > XML, then NVDIMMs are assumed to have allocation=ondemand by default.
> > 
> > It'd be inconsistent behaviour, but not wrong.
> > 
> > That way we'd only need to raise an error if the user had explicitly
> > set allocation=immediate
> 
> Frankly, I'd rather be consistent AND bug-free at the same time. Can't
> we make NVDIMM work with just fixed QEMUs? At the level code, the nvdimm
> capability would be set if nvdimm is found in a list of devices provided
> by qemu AND if the qemu version is at least 2.8 (assuming your fix makes
> it into the release).

Yep that's a valid approach to take assuming QEMU get a fix done in a
timely manner, so that people aren't pressuring us to support the buggy
QEMU. Lets aim todo that until someone complains :-)

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|




More information about the libvir-list mailing list