[libvirt PATCH v3 5/5] qemu: enable asynchronous teardown on s390x hosts by default

Claudio Imbrenda imbrenda at linux.ibm.com
Tue Jul 11 15:26:17 UTC 2023


On Tue, 11 Jul 2023 15:33:02 +0100
Daniel P. Berrangé <berrange at redhat.com> wrote:

> On Tue, Jul 11, 2023 at 04:22:12PM +0200, Claudio Imbrenda wrote:
> > On Tue, 11 Jul 2023 14:57:45 +0100
> > Daniel P. Berrangé <berrange at redhat.com> wrote:
> >   
> > > On Tue, Jul 11, 2023 at 03:48:25PM +0200, Claudio Imbrenda wrote:  
> > > > On Tue, 11 Jul 2023 09:17:00 +0100
> > > > Daniel P. Berrangé <berrange at redhat.com> wrote:
> > > > 
> > > > [...]
> > > >     
> > > > > > We could add additional time depending on the guest memory size, BUT with
> > > > > > Secure Execution the timeout would need to be increased by a two-digit
> > > > > > factor. Also, it is not possible for libvirt to detect whether the guest
> > > > > > is in Secure Execution mode.
> > > > > 
> > > > > What component is causing this 2 orders of magnitude delay in shutting    
> > > > 
> > > > Secure Execution (protected VMs)    
> > > 
> > > So it's the hardware that imposes the penalty, rather than something
> > > the kernel is doing ?
> > > 
> > > Can anything else mitigate this ? e.g. does using huge pages make it
> > > faster than normal pages ?
> > 
> > Unfortunately huge pages cannot be used for Secure Execution; it's a
> > hardware limitation.
> >   
> > >   
> > > > > down a guest ? If the host can't tell if Secure Execution mode is
> > > > > enabled or not, why would any code path be different & slower ?    
> > > > 
> > > > The host kernel (and QEMU) know if a specific VM is running in
> > > > secure mode, but there is no meaningful way for this information to be
> > > > communicated outwards (e.g. to libvirt)    
> > > 
> > > Can we expose this in one of the QMP commands, or a new one ? It feels
> > > like a mgmt app is going to want to know if a guest is running in secure
> > > mode or not, so it can know if this shutdown penalty is going to be
> > > present.  
> > 
> > I guess it would be possible (no idea how easy/clean it would be). The
> > issue would be that when the guest is running, it's too late to enable
> > asynchronous teardown.
> 
> I think we just need to document that async teardown is highly recommended
> regardless.  The ability to query secure virt is more about helping
> the application know whether async teardown will be fast or very, very
> slow.
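
(For context: x86 already exposes the equivalent information through QMP's
query-sev command; the suggestion here is an s390x analogue, which does not
exist at the time of this thread. The x86 exchange looks roughly like this,
following the example in the QEMU QAPI documentation:)

    -> { "execute": "query-sev" }
    <- { "return": { "enabled": true, "api-major": 0, "api-minor": 0,
                     "build-id": 0, "policy": 0, "state": "running",
                     "handle": 1 } }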
> 
> > Also note that the same guest can jump in and out of secure mode
> > without needing to shut down (a reboot is enough).
> 
> Yep, though I imagine that's going to be fairly unlikely in practice.

True.
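
(For reference, a minimal sketch of how this is switched on, using the
domain XML feature this series builds on; treat the exact element name as an
assumption until the patches are merged. As noted above, it must be set
before the guest starts:)

    <features>
      <!-- reap the guest's memory in a separate helper process, so the
           main QEMU process can exit without waiting for the cleanup -->
      <async-teardown enabled='yes'/>
    </features>

(On the QEMU side this corresponds to the asynchronous teardown command
line option, -run-with async-teardown=on on recent QEMU.)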

> 
> > > > During teardown, the host kernel will need to do some time-consuming
> > > > extra cleanup for each page that belonged to a secure guest.
> > > >     
> > > > >     
> > > > > > I also assume that timeouts of 1h+ are not acceptable. Wouldn't a long
> > > > > > timeout cause other trouble, like stalling a "virsh list" run in parallel?
> > > > > 
> > > > > Well a 1 hour timeout is pretty insane, even with the async teardown    
> > > > 
> > > > I think we all agree, and that's why asynchronous teardown was
> > > > implemented.
> > > >     
> > > > > that's terrible, as RAM can't be used for any new guest for
> > > > > an incredibly long time.
> > > > 
> > > > I'm not sure what you mean here. RAM is not kept aside until the
> > > > teardown is complete; pages are returned to the free pool as soon
> > > > as they are cleared, i.e. when the cleanup is halfway through,
> > > > half of the memory has already been freed.
> > > 
> > > Yes, it is incrementally released, but in practice most hypervisors are
> > > memory constrained. So if you stop a 2 TB guest and want to then boot it
> > > again, unless you have a couple of free TB of RAM hanging around, you're
> > > going to need to wait for almost all of the original RAM to be reclaimed.
> > 
> > If it's a secure guest, it will take time to actually use the memory
> > anyway; it's a similar issue to the teardown, but in reverse.
> 
> Unless the guest is started with memory preallocation on the QEMU side
> which would make QEMU touch every page to fault it into RAM.

That would be unfortunate, indeed.
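
(For illustration, a sketch of what such preallocation looks like in the
domain XML; <memoryBacking> with immediate allocation is existing libvirt
syntax, unrelated to this series:)

    <memoryBacking>
      <!-- have QEMU touch every guest page at startup, so it is all
           faulted into RAM before the guest begins running -->
      <allocation mode='immediate'/>
    </memoryBacking>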

> 
> With regards,
> Daniel


