[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Two questions about NVDIMM devices



On Thu, 2020-09-10 at 20:53 +0200, Michal Privoznik wrote:
> On 9/10/20 4:56 PM, Daniel P. Berrangé wrote:
> > On Thu, Sep 10, 2020 at 04:54:08PM +0200, Milan Zamazal wrote:
> > > Daniel P. Berrangé <berrange redhat com> writes:
> > > > > > If we're enforcing this 1 MB rounding though, we really should be
> > > > > > documenting it clearly, so that apps can pick the right backing file
> > > > > > size. I think we dropped the ball on docs.
> > > > > 
> > > > > I still can't see it in the documentation, would it be possible to be
> > > > > clear about it in the docs, please?  For first, it's not very intuitive
> > > > > to figure out that (if I've figured out it correctly) on POWER one
> > > > > *must* specify the NVDIMM size S as
> > > > > 
> > > > >    S == aligned_size + label_size
> > > > > 
> > > > > and that size is used for the QEMU device; while on x86_64 one can
> > > > > specify any size S and
> > > > > 
> > > > >    align_up(S)
> > > > > 
> > > > > will be used for the QEMU device (and label size doesn't influence the
> > > > > value).  And additional alignment may be required for having any memory
> > > > > hot plug working.

The ppc64-specific requirements were documented with

  commit 8f474ceea05aec349be19726e394a62e300efe77
  Author: Daniel Henrique Barboza <danielhb413 gmail com>
  Date:   Mon Jul 20 13:51:46 2020 -0300

    formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic
    
    The reason why we align down the guest area (total-size - label-size) is
    explained in the body of qemuDomainNVDimmAlignSizePseries(). This
    behavior must also be documented in the user docs.
    
    Signed-off-by: Daniel Henrique Barboza <danielhb413 gmail com>
    Reviewed-by: Andrea Bolognani <abologna redhat com>

but later reverted. See below.

> > > > > For second, and more importantly, I'm afraid that without documenting
> > > > > it, future changes may break the current behavior without warning.  For
> > > > > example, the recent changes regarding POWER alignment in 6.7.0 are for
> > > > > good IMHO and one can use the same size with both 6.7 and 6.6 versions,
> > > > > but they could still cause pre-6.7 sizes stop working.
> > > > 
> > > > I don't know what changes you are referring to here, but if they were
> > > > in libvirt I'd consider that a bug - we shouldn't break a previously
> > > > working configuration by increasing required alignment.
> > > 
> > > I mean disabling the auto alignment in
> > > https://gitlab.com/libvirt/libvirt/-/commit/07de813924caf37e535855541c0c1183d9d382e2
> > > and replacing it with validation in
> > > https://gitlab.com/libvirt/libvirt/-/commit/0ccceaa57c50e5ee528f7073fa8723afd62b88b7
> > > 
> > > That change can cause a VM fail to start but after (manually) adjusting
> > > the device size, all should work all right.  Changes that would actually
> > > change sizes would be more dangerous.
> > 
> > Sigh, that second commit even calls out the fact that it breaks
> > existing guests. This needs to be reverted, as that is not acceptable.
> 
> Thing is, on PPC it was never working IIRC. I remember discussing this 
> with Andrea. So from my POV, there wasn't really anything to break.

This is my fault for not keeping a close enough eye on the patch
series when it was being posted and reviewed upstream. Sorry :(

While it is true that QEMU will reject sizes that are not compliant
with the ppc64-specific alignment requirements, libvirt was already
aware of this and behaved accordingly: so a configuration like

  <memory model='nvdimm'>
    <source>
      <path>/var/lib/libvirt/images/nvdimm.raw</path>
    </source>
    <target>
      <size unit='KiB'>524288</size>
      <label>
        <size unit='KiB'>128</size>
      </label>
    </target>
  </memory>

can't work because the usable area (512 MiB - 128 KiB) is not aligned
to 256 MiB, however libvirt will align it down and generate a QEMU
command line that looks like

  -object memory-backend-file,[...],size=268566528
  -device nvdimm,[...],label-size=131072

where 131072 is 128 KiB and 268566528 is 265 MiB + 128 KiB. In this
scenario, assuming the user has allocated a backing file that's 512
MiB in size, ~256 MiB will never be used and will go to waste, but
other than that the sizing requirements will be satisfied from QEMU's
point of view and the guest will happily start.

So ultimately this change caused existing guests to stop working and
I agree with Dan that it should be reverted.

-- 
Andrea Bolognani / Red Hat / Virtualization


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]