[Virtio-fs] [PATCH] vhost-user-fs: add capability to allow migration

Thu Jan 19 14:30:53 UTC 2023

On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin at yandex-team.ru> wrote:
>
> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> > On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin at yandex-team.ru> wrote:
> >> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> >> qemu update without stopping the VM. In most cases that makes sense
> >> because qemu has no way to transfer FUSE session state.
> >>
> >> But we can give an option to orchestrator to override this if it can
> >> guarantee that state will be preserved (e.g. it uses migration to
> >> update qemu and dst will run on the same host as src and use the same
> >> socket endpoints).
> >>
> >> This patch keeps default behavior that prevents migration with such devices
> >> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> >>
> >> Signed-off-by: Anton Kuchin <antonkuchin at yandex-team.ru>
> >> ---
> >>   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> >>   qapi/migration.json       |  7 ++++++-
> >>   2 files changed, 30 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> >> index f5049735ac..13d920423e 100644
> >> --- a/hw/virtio/vhost-user-fs.c
> >> +++ b/hw/virtio/vhost-user-fs.c
> >> @@ -24,6 +24,7 @@
> >>   #include "hw/virtio/vhost-user-fs.h"
> >>   #include "monitor/monitor.h"
> >>   #include "sysemu/sysemu.h"
> >> +#include "migration/migration.h"
> >>
> >>   static const int user_feature_bits[] = {
> >>       VIRTIO_F_VERSION_1,
> >> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> >>       return &fs->vhost_dev;
> >>   }
> >>
> >> +static int vhost_user_fs_pre_save(void *opaque)
> >> +{
> >> +    MigrationState *s = migrate_get_current();
> >> +
> >> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> >> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> >> +                     "state of backend to be preserved. If orchestrator can "
> >> +                     "guarantee this (e.g. dst connects to the same backend "
> >> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> >> +                     "migration capability to true to enable migration.");
> >> +        return -1;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >> +
> >>   static const VMStateDescription vuf_vmstate = {
> >>       .name = "vhost-user-fs",
> >> -    .unmigratable = 1,
> >> +    .minimum_version_id = 0,
> >> +    .version_id = 0,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_VIRTIO_DEVICE,
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +   .pre_save = vhost_user_fs_pre_save,
> >>   };
> > Will it be possible to extend this vmstate when virtiofsd adds support
> > for stateful migration without breaking migration compatibility?
> >
> > If not, then I think a marker field should be added to the vmstate:
> > 0 - stateless/reconnect migration (the approach you're adding in this patch)
> > 1 - stateful migration (future virtiofsd feature)
> >
> > When the field is 0 there are no further vmstate fields and we trust
> > that the destination vhost-user-fs server already has the necessary
> > state.
> >
> > When the field is 1 there are additional vmstate fields that contain
> > the virtiofsd state.
> >
> > The goal is for QEMU to support 3 migration modes, depending on the
> > vhost-user-fs server:
> > 1. No migration support.
> > 2. Stateless migration.
> > 3. Stateful migration.
>
> Sure. These vmstate fields are very generic and mandatory for any
> virtio device. If in future more state can be transfer in migration
> stream the vmstate can be extended with additional fields. This can
> be done with new subsections and/or bumping version_id.

My concern is that the vmstate introduced in this patch may be
unusable when stateful migration is added. So additional compatibility
code will need to be introduced to make your stateless migration
continue working with extended vmstate.

By adding a marker field in this patch it should be possible to
continue using the same vmstate for stateless migration without adding
extra compatibility code in the future.

> The main purpose of this patch is to allow update VM to newer version
> of qemu via local migration without disruption to guest. And future
> versions hopefully could pack more state from external environment
> to migration stream.
>
>
> >
> >>   static Property vuf_properties[] = {
> >> diff --git a/qapi/migration.json b/qapi/migration.json
> >> index 88ecf86ac8..9a229ea884 100644
> >> --- a/qapi/migration.json
> >> +++ b/qapi/migration.json
> >> @@ -477,6 +477,11 @@
> >>   #                    will be handled faster.  This is a performance feature and
> >>   #                    should not affect the correctness of postcopy migration.
> >>   #                    (since 7.1)
> >> +# @vhost-user-fs: If enabled, the migration process will allow migration of
> >> +#                 vhost-user-fs devices, this should be enabled only when
> >> +#                 backend can preserve local FUSE state e.g. for qemu update
> >> +#                 when dst reconects to the same endpoints after migration.
> >> +#                 (since 8.0)
> > This is global but a guest can have multiple vhost-user-fs devices
> > connected to different servers.
> AFAIK vhost-user requires unix socket and memory shared from guest so
> devices can't be connected to different servers, just to different
> endpoints on current host.

Different vhost-user-fs server software. vhost-user-fs is a protocol,
there can be multiple server implementations. These implementations
may have different capabilities (some may not support stateless
migration). So I think stateless migration should be a per-instance
setting, not global.

>
> >
> > I would add a qdev property to the device instead of introducing a
> > migration capability. The property would enable "stateless migration".
> > When the property is not set, migration would be prohibited.
> I did thought about that, but this is really not a property of device,
> this is the capability of management software and applies to exactly one
> particular migration process that it initiates. It should not persist
> across migration or be otherwise stored in device.

I disagree. The vhost-user-fs server software must implement stateless
migration in order for this to work. For example,
https://gitlab.com/virtio-fs/virtiofsd doesn't support stateless
migration as far as I know.

>
> The idea here is that orchestrator can ensure destination qemu will
> run on the same host, will reconnect to the same unix sockets and only
> then sets the flag (because inside qemu we can't know anything about
> the destination).
> This is somewhat similar to ignore-shared migration capability when
> qemu avoids saving and loading guest memory that is stores in shmem
> because it will be picked up by destination process right where source
> left it.
>
> >>   #
> >>   # Features:
> >>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> >> @@ -492,7 +497,7 @@
> >>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> >>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> >>              'validate-uuid', 'background-snapshot',
> >> -           'zero-copy-send', 'postcopy-preempt'] }
> >> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> >>
> >>   ##
> >>   # @MigrationCapabilityStatus:
> >> --
> >> 2.34.1
> >>
> >>