[Virtio-fs] [PATCH 0/4] vhost-user-fs: Internal migration

Eugenio Perez Martin eperezma at redhat.com
Mon May 8 19:31:16 UTC 2023


On Mon, May 8, 2023 at 7:51 PM Eugenio Perez Martin <eperezma at redhat.com> wrote:
>
> On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek <hreitz at redhat.com> wrote:
> >
> > On 05.05.23 16:37, Hanna Czenczek wrote:
> > > On 05.05.23 16:26, Eugenio Perez Martin wrote:
> > >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek <hreitz at redhat.com>
> > >> wrote:
> > >>> (By the way, thanks for the explanations :))
> > >>>
> > >>> On 05.05.23 11:03, Hanna Czenczek wrote:
> > >>>> On 04.05.23 23:14, Stefan Hajnoczi wrote:
> > >>> [...]
> > >>>
> > >>>>> I think it's better to change QEMU's vhost code
> > >>>>> to leave stateful devices suspended (but not reset) across
> > >>>>> vhost_dev_stop() -> vhost_dev_start(), maybe by introducing
> > >>>>> vhost_dev_suspend() and vhost_dev_resume(). Have you thought about
> > >>>>> this aspect?
> > >>>> Yes and no; I mean, I haven’t in detail, but I thought this is what’s
> > >>>> meant by suspending instead of resetting when the VM is stopped.
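> > >>>>
> > >>>> (For illustration, a minimal sketch of how I understand that
> > >>>> proposal; neither function exists yet, the names are just
> > >>>> placeholders:
> > >>>>
> > >>>>     /* Quiesce the back-end, keeping its internal state intact */
> > >>>>     int vhost_dev_suspend(struct vhost_dev *hdev, VirtIODevice *vdev);
> > >>>>
> > >>>>     /* Undo vhost_dev_suspend(), continuing from the same state */
> > >>>>     int vhost_dev_resume(struct vhost_dev *hdev, VirtIODevice *vdev);
> > >>>>
> > >>>> i.e. unlike vhost_dev_stop(), neither of them would reset the
> > >>>> device.)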
> > >>> So, now looking at vhost_dev_stop(), one problem I can see is that
> > >>> depending on the back-end, different operations it does will do
> > >>> different things.
> > >>>
> > >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(),
> > >>> which for vDPA will suspend the device, but for vhost-user will
> > >>> reset it
> > >>> (if F_STATUS is there).
> > >>>
> > >>> It disables all vrings, which doesn’t mean stopping them, but may be
> > >>> necessary, too.  (I haven’t yet really understood the use of disabled
> > >>> vrings; I’ve heard that virtio-net has a need for it.)
> > >>>
> > >>> It then also stops all vrings, though, so that’s OK.  And because this
> > >>> will always do GET_VRING_BASE, this is actually always the same
> > >>> regardless of transport.
> > >>>
> > >>> Finally (for this purpose), it resets the device status via
> > >>> vhost_ops->vhost_reset_status().  This is only implemented on vDPA, and
> > >>> this is what resets the device there.
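> > >>>
> > >>> (Schematically, as far as I understand the code:
> > >>>
> > >>>     vhost_dev_stop()
> > >>>       -> vhost_ops->vhost_dev_start(hdev, false)
> > >>>            /* vDPA: suspends; vhost-user: resets (given F_STATUS) */
> > >>>       -> disable all vrings
> > >>>       -> stop all vrings via GET_VRING_BASE
> > >>>            /* same on all transports */
> > >>>       -> vhost_ops->vhost_reset_status()
> > >>>            /* vDPA only; this is what resets the device there */
> > >>> )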
> > >>>
> > >>>
> > >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does
> > >>> so in .vhost_reset_status.  It would seem better to me if vhost-user
> > >>> would also reset the device only in .vhost_reset_status, not in
> > >>> .vhost_dev_start.  .vhost_dev_start seems precisely like the place to
> > >>> run SUSPEND/RESUME.
> > >>>
> > >> I think the same. I just saw it's been proposed at [1].
> > >>
> > >>> Another question I have (but this is basically what I wrote in my last
> > >>> email) is why we even call .vhost_reset_status here.  If the device
> > >>> and/or all of the vrings are already stopped, why do we need to reset
> > >>> it?  Naïvely, I had assumed we only really need to reset the device if
> > >>> the guest changes, so that a new guest driver sees a freshly
> > >>> initialized
> > >>> device.
> > >>>
> > >> I don't know why we didn't need to call it :). I'm assuming the
> > >> previous vhost-user net did fine resetting vq indexes, using
> > >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex
> > >> devices.
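> > >>
> > >> (Rough sketch of that from the front-end side; vhost_set_vring_base
> > >> is the vhost_ops callback behind VHOST_USER_SET_VRING_BASE:
> > >>
> > >>     struct vhost_vring_state state = { .index = 0, .num = 0 };
> > >>     /* rewind queue 0's next avail index back to 0 */
> > >>     hdev->vhost_ops->vhost_set_vring_base(hdev, &state);
> > >> )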
> > >>
> > >> The guest can reset the device, or write 0 to the PCI config status,
> > >> at any time. How does virtiofs handle it, being stateful?
> > >
> > > Honestly a good question because virtiofsd implements neither
> > > SET_STATUS nor RESET_DEVICE.  I’ll have to investigate that.
> > >
> > > I think when the guest resets the device, SET_VRING_BASE always comes
> > > along some way or another, so that’s how the vrings are reset.  Maybe
> > > the internal state is reset only following more high-level FUSE
> > > commands like INIT.
> >
> > So a meeting and one session of looking-into-the-code later:
> >
> > We reset every virtqueue on GET_VRING_BASE, which is wrong, but happens
> > to serve the purpose.  (German is currently on that.)
> >
> > In our meeting, German said the reset would occur when the memory
> > regions are changed, but I can’t see that in the code.
>
> That would imply that the status is reset when the guest's memory is
> added or removed?
>
> > I think it only
> > happens implicitly through the SET_VRING_BASE call, which resets the
> > internal avail/used pointers.
> >
> > [This doesn’t seem different from libvhost-user, though, which
> > implements neither SET_STATUS nor RESET_DEVICE, and which pretends to
> > reset the device on RESET_OWNER, but really doesn’t (its
> > vu_reset_device_exec() function just disables all vrings, doesn’t reset
> > or even stop them).]
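> >
> > (In effect, vu_reset_device_exec() amounts to roughly this pseudocode:
> >
> >     for (i = 0; i < dev->max_queues; i++) {
> >         dev->vq[i].enable = false;   /* disable, but don't stop/reset */
> >     }
> >
> > i.e. the rings keep their state and keep running.)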
> >
> > Consequently, the internal state is never reset.  It would be cleared on
> > a FUSE Destroy message, but if you just force-reset the system, the
> > state persists into the next reboot.  Not even FUSE Init clears it, which
> > seems weird.  It happens to work because it’s still the same filesystem,
> > so the existing state fits, but it kind of seems dangerous to keep e.g.
> > files open.  I don’t think it’s really exploitable because everything
> > still goes through the guest kernel, but, well.  We should clear the
> > state on Init, and probably also implement SET_STATUS and clear the
> > state there.
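> >
> > (A sketch of what I mean, with made-up names on the virtiofsd side:
> >
> >     /* hypothetical SET_STATUS handler */
> >     void handle_set_status(uint8_t status)
> >     {
> >         if (status == 0) {
> >             /* guest reset the device: drop open handles, etc. */
> >             clear_internal_state();   /* hypothetical helper */
> >         }
> >     }
> >
> > with the same clearing also run on FUSE Init.)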
> >
>
> I see. That's in line with assuming GET_VRING_BASE is the last
> message received from qemu.
>

Actually, does it prevent device recovery after a failed migration?
Is the same state set for the device?

Thanks!


