[Virtio-fs] [PATCH 0/4] vhost-user-fs: Internal migration

Mon May 8 17:51:45 UTC 2023

On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek <hreitz at redhat.com> wrote:
>
> On 05.05.23 16:37, Hanna Czenczek wrote:
> > On 05.05.23 16:26, Eugenio Perez Martin wrote:
> >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek <hreitz at redhat.com>
> >> wrote:
> >>> (By the way, thanks for the explanations :))
> >>>
> >>> On 05.05.23 11:03, Hanna Czenczek wrote:
> >>>> On 04.05.23 23:14, Stefan Hajnoczi wrote:
> >>> [...]
> >>>
> >>>>> I think it's better to change QEMU's vhost code
> >>>>> to leave stateful devices suspended (but not reset) across
> >>>>> vhost_dev_stop() -> vhost_dev_start(), maybe by introducing
> >>>>> vhost_dev_suspend() and vhost_dev_resume(). Have you thought about
> >>>>> this aspect?
> >>>> Yes and no; I mean, I haven’t in detail, but I thought this is what’s
> >>>> meant by suspending instead of resetting when the VM is stopped.
> >>> So, now looking at vhost_dev_stop(), one problem I can see is that
> >>> depending on the back-end, different operations it does will do
> >>> different things.
> >>>
> >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(),
> >>> which for vDPA will suspend the device, but for vhost-user will
> >>> reset it
> >>> (if F_STATUS is there).
> >>>
> >>> It disables all vrings, which doesn’t mean stopping, but may be
> >>> necessary, too.  (I haven’t yet really understood the use of disabled
> >>> vrings, I heard that virtio-net would have a need for it.)
> >>>
> >>> It then also stops all vrings, though, so that’s OK.  And because this
> >>> will always do GET_VRING_BASE, this is actually always the same
> >>> regardless of transport.
> >>>
> >>> Finally (for this purpose), it resets the device status via
> >>> vhost_ops->vhost_reset_status().  This is only implemented on vDPA, and
> >>> this is what resets the device there.
> >>>
> >>>
> >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does
> >>> so in .vhost_reset_status.  It would seem better to me if vhost-user
> >>> would also reset the device only in .vhost_reset_status, not in
> >>> .vhost_dev_start.  .vhost_dev_start seems precisely like the place to
> >>> run SUSPEND/RESUME.
> >>>
> >> I think the same. I just saw It's been proposed at [1].
> >>
> >>> Another question I have (but this is basically what I wrote in my last
> >>> email) is why we even call .vhost_reset_status here.  If the device
> >>> and/or all of the vrings are already stopped, why do we need to reset
> >>> it?  Naïvely, I had assumed we only really need to reset the device if
> >>> the guest changes, so that a new guest driver sees a freshly
> >>> initialized
> >>> device.
> >>>
> >> I don't know why we didn't need to call it :). I'm assuming the
> >> previous vhost-user net did fine resetting vq indexes, using
> >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex
> >> devices.
> >>
> >> The guest can reset the device, or write 0 to the PCI config status,
> >> at any time. How does virtiofs handle it, being stateful?
> >
> > Honestly a good question because virtiofsd implements neither
> > SET_STATUS nor RESET_DEVICE.  I’ll have to investigate that.
> >
> > I think when the guest resets the device, SET_VRING_BASE always comes
> > along some way or another, so that’s how the vrings are reset.  Maybe
> > the internal state is reset only following more high-level FUSE
> > commands like INIT.
>
> So a meeting and one session of looking-into-the-code later:
>
> We reset every virt queue on GET_VRING_BASE, which is wrong, but happens
> to serve the purpose.  (German is currently on that.)
>
> In our meeting, German said the reset would occur when the memory
> regions are changed, but I can’t see that in the code.

That would imply that the status is reset when the guest's memory is
added or removed?

> I think it only
> happens implicitly through the SET_VRING_BASE call, which resets the
> internal avail/used pointers.
>
> [This doesn’t seem different from libvhost-user, though, which
> implements neither SET_STATUS nor RESET_DEVICE, and which pretends to
> reset the device on RESET_OWNER, but really doesn’t (its
> vu_reset_device_exec() function just disables all vrings, doesn’t reset
> or even stop them).]
>
> Consequently, the internal state is never reset.  It would be cleared on
> a FUSE Destroy message, but if you just force-reset the system, the
> state remains into the next reboot.  Not even FUSE Init clears it, which
> seems weird.  It happens to work because it’s still the same filesystem,
> so the existing state fits, but it kind of seems dangerous to keep e.g.
> files open.  I don’t think it’s really exploitable because everything
> still goes through the guest kernel, but, well.  We should clear the
> state on Init, and probably also implement SET_STATUS and clear the
> state there.
>

I see. That's in the line of assuming GET_VRING_BASE is the last
message received from qemu.

Thanks!