[Virtio-fs] [PATCH] vhost-user-fs: add capability to allow migration

Mon Jan 23 19:53:35 UTC 2023

On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst at redhat.com) wrote:
> > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > 
> > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > This flag should be set when qemu doesn't need to worry about any
> > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > state will be present at the destination to restore full context and
> > > > > > > continue running.
> > > > > > > Sorry, I still do not get it. So fundamentally, why do we need this property?
> > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > the special "migrateable" case.
> > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > tracks whether device is unplugged for us.
> > > > > > 
> > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > where not to migrate.
> > > > > > 
> > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > make an intelligent decision whether to allow migration,
> > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > already says orchestrator decided to migrate.
> > > > > > The problem I'm trying to solve is that most vhost-user devices
> > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > updating qemu executable via local migration. So this flag is
> > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > 
> > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > to request migration only when the migration can be performed in a
> > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > orchestrator requests it" seems a little dangerous and can be
> > > > > misinterpreted as full support for migration in all cases.
> > > > > It's not really different from block, is it? The orchestrator has to arrange
> > > > for backend migration. I think we just assumed there's no use-case where
> > > > this is practical for vhost-user-fs so we blocked it.
> > > > But in any case it's orchestrator's responsibility.
> > > 
> > > Yes, you are right. So do you think we should just drop the blocker
> > > without adding a new flag?
> > 
> > I'd be inclined to. I am curious what dgilbert and stefanha think, though.
> 
> Yes I think that's probably OK, as long as we use the flag for knowing
> how to handle the discard bitmap as a proxy for the daemon knowing how
> to handle *some* migrations; knowing which migrations is then the job
> for the orchestrator to be careful of.

I think the feature bit is not a good way to detect live migration
support. vhost-user backends typically use libvhost-user, rust-vmm's
vhost-user-backend crate, etc., where this feature can be implemented for
free. If the feature bit is advertised, we don't know whether the device
implementation (net, blk, fs, etc.) is aware of migration at all.
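
To illustrate the concern, here is a rough toy model, not real QEMU or
libvhost-user code. Every name in it (FEATURE_LOG, GenericBackendLibrary,
Device, frontend_sees_feature) is hypothetical, and the bit value is an
arbitrary stand-in. It only shows that when a shared library layer
advertises the bit, the frontend sees the same answer for a device that
handles migration and for one that does not:

```python
# Toy model only: all names and values here are hypothetical stand-ins,
# not identifiers from QEMU, libvhost-user, or rust-vmm.
FEATURE_LOG = 1 << 0  # arbitrary stand-in for the migration-related feature bit


class GenericBackendLibrary:
    """Stands in for a shared backend library that implements the feature itself."""

    def features(self):
        # The library advertises the bit for every device built on top of it.
        return FEATURE_LOG


class Device:
    """A device implementation layered on the generic library."""

    def __init__(self, name, migration_aware):
        self.name = name
        # Whether the device code itself handles migration. This is internal
        # knowledge; nothing in the advertised features exposes it.
        self.migration_aware = migration_aware
        self.lib = GenericBackendLibrary()

    def advertised_features(self):
        # The device simply passes through what the library offers.
        return self.lib.features()


def frontend_sees_feature(device):
    """All the frontend can check: is the bit set in the advertised features?"""
    return bool(device.advertised_features() & FEATURE_LOG)


devices = [Device("aware_dev", True), Device("unaware_dev", False)]

# Map each device to (what the frontend sees, what is actually true).
results = {d.name: (frontend_sees_feature(d), d.migration_aware) for d in devices}

for name, (seen, actual) in results.items():
    print(f"{name}: feature bit advertised={seen}, migration aware={actual}")
```

In this sketch both devices advertise the bit, so checking it cannot
tell the two apart.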

Stefan