[vfio-users] QEMU stuck when hot-adding memory to a virtual machine with a passthrough device

Thu Apr 19 13:00:38 UTC 2018

On Thu, 19 Apr 2018 01:37:41 +0000
"Wuzongyong (Euler Dept)" <cordius.wu at huawei.com> wrote:

> > > Hi,
> > >
> > > The QEMU process gets stuck when we hot-add a large amount of memory
> > > to a virtual machine with a passthrough device.
> > > We found that pinning and mapping pages in vfio_dma_do_map is too slow.
> > > Is there any way to speed this up?
> > 
> > At what size do you start to see problems?  The time to map a section of
> > memory should be directly proportional to the size.  As the size is
> > increased, it will take longer, but I don't know why you'd reach a point
> > of not making forward progress.  Is it actually stuck or is it just taking
> > longer than you want?  Using hugepages can certainly help: we still need
> > to pin each PAGE_SIZE page within the hugepage, but we'll have larger
> > contiguous regions and therefore call iommu_map() less frequently.  Please
> > share more data.  Thanks,
> > 
> > Alex  
> It just takes longer; it is not actually stuck.
> We first noticed the problem when hot-adding 16G of memory, and hot-adding
> 1T of memory takes tens of minutes.
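To make the hugepage point concrete, here is a rough back-of-the-envelope model. It is a simplified sketch, not the actual vfio_dma_do_map() logic: it assumes every PAGE_SIZE (4 KiB) page is pinned individually, that each physically contiguous run costs one iommu_map() call, and, as a worst case, that 4 KiB-backed memory has no physical contiguity at all. The constants and helper names are illustrative assumptions.

```python
# Simplified cost model, NOT the real vfio_dma_do_map() implementation.
# Assumptions: every PAGE_SIZE page is pinned individually, and each
# physically contiguous chunk is mapped with exactly one iommu_map() call.

PAGE_SIZE = 4 * 1024          # 4 KiB base page (typical x86-64)
HUGE_2M = 2 * 1024 * 1024     # 2 MiB hugepage
GiB = 1024 ** 3


def pins_needed(size_bytes: int) -> int:
    """Pinning is per PAGE_SIZE page, whatever the backing page size."""
    return size_bytes // PAGE_SIZE


def iommu_map_calls(size_bytes: int, contiguous_chunk: int) -> int:
    """Worst case: one iommu_map() call per physically contiguous chunk.

    With 4 KiB backing we assume no two pages are adjacent; with
    hugepages each hugepage is at least one contiguous chunk.
    """
    return -(-size_bytes // contiguous_chunk)  # ceiling division


if __name__ == "__main__":
    size = 16 * GiB
    print("pins:", pins_needed(size))
    print("iommu_map calls, 4 KiB backing:", iommu_map_calls(size, PAGE_SIZE))
    print("iommu_map calls, 2 MiB hugepages:", iommu_map_calls(size, HUGE_2M))
```

For 16 GiB the model gives 4,194,304 pins either way, but the worst-case number of iommu_map() calls drops from 4,194,304 to 8,192 with 2 MiB hugepages. Real kernels coalesce adjacent 4 KiB pages when they happen to be contiguous, so the 4 KiB figure is an upper bound.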

Is the stall when adding 1TB roughly 64 times the stall when adding 16GB, or
do we have some inflection in the size vs. time curve?  There is a cost to
pinning and mapping through the IOMMU; perhaps we can improve that, but
I don't see how we can eliminate it or how it wouldn't be at least
linear compared to the size of memory added without moving to a page
request model, which hardly any hardware currently supports.  A
workaround might be to incrementally add memory in smaller chunks which
generate a less noticeable stall.  Thanks,
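One way to answer the 64x question is to compare the measured stalls against a purely linear model. The sketch below assumes that model (stall time proportional to the amount of memory added) and also illustrates the suggested workaround of splitting a 1 TiB hot-add into smaller pieces; the 16 GiB chunk size and the helper names are hypothetical choices for illustration.

```python
from typing import List

GiB = 1024 ** 3
TiB = 1024 * GiB


def expected_ratio(small: int, large: int) -> float:
    """Under a purely linear cost model, stall time scales with size."""
    return large / small


def split_hotplug(total: int, chunk: int) -> List[int]:
    """Split one large hot-add into chunk-sized pieces (last may be smaller)."""
    sizes = []
    remaining = total
    while remaining > 0:
        step = min(chunk, remaining)
        sizes.append(step)
        remaining -= step
    return sizes


if __name__ == "__main__":
    # If the measured 1 TiB stall is much more than this multiple of the
    # 16 GiB stall, the size vs. time curve is not linear.
    print(expected_ratio(16 * GiB, 1 * TiB))
    chunks = split_hotplug(1 * TiB, 16 * GiB)
    print(len(chunks), "hot-adds of", chunks[0] // GiB, "GiB each")
```

Chunking does not reduce the total pinning and mapping work; it only spreads it across several shorter stalls.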

Alex