[vfio-users] qemu stuck when hot-add memory to a virtual machine with a device passthrough

Fri Apr 20 03:11:12 UTC 2018

> > > > Hi,
> > > >
> > > > The qemu process gets stuck when hot-adding a large amount of memory
> > > > to a virtual machine with a passthrough device.
> > > > We found it is too slow to pin and map pages in vfio_dma_do_map.
> > > > Is there any method to improve this process?
> > >
> > > At what size do you start to see problems?  The time to map a
> > > section of memory should be directly proportional to the size.  As
> > > the size is increased, it will take longer, but I don't know why
> > > you'd reach a point of not making forward progress.  Is it actually
> > > stuck or is it just taking longer than you want?  Using hugepages
> > > > can certainly help; we still need to pin each PAGE_SIZE page within
> > > the hugepage, but we'll have larger contiguous regions and therefore
> > > call iommu_map() less frequently.  Please share more data.  Thanks,
> > >
> > > Alex
> > It just takes longer; it is not actually stuck.
> > We found that the problem exists when we hot-add 16G of memory. And it
> > takes tens of minutes when we hot-add 1T of memory.
> 
> Is the stall adding 1TB roughly 64 times the stall adding 16GB or do we
> have some inflection in the size vs time curve?  There is a cost to
> pinning and mapping through the IOMMU, perhaps we can improve that, but I
> don't see how we can eliminate it or how it wouldn't be at least linear
> compared to the size of memory added without moving to a page request
> model, which hardly any hardware currently supports.  A workaround might
> be to incrementally add memory in smaller chunks which generate a less
> noticeable stall.  Thanks,
> 
> Alex
It took about 1 minute to add 16GB and about 40 minutes to add 1TB.
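
As a rough sanity check on those two data points, here is a
back-of-the-envelope sketch. It assumes 4 KiB base pages and treats the
reported times as exact, which they are not, so the per-page figures are
only indicative.

```python
GIB = 1024 ** 3
PAGE_SIZE = 4096  # assumption: 4 KiB base pages (typical on x86-64)

# Approximate figures reported in this thread
small_bytes, small_secs = 16 * GIB, 1 * 60      # 16GB in about 1 minute
large_bytes, large_secs = 1024 * GIB, 40 * 60   # 1TB in about 40 minutes

size_ratio = large_bytes / small_bytes  # how much more memory was added
time_ratio = large_secs / small_secs    # how much longer it took


def per_page_us(nbytes, secs):
    """Average wall-clock time per PAGE_SIZE page, in microseconds."""
    return secs * 1e6 / (nbytes // PAGE_SIZE)


print(f"size ratio: {size_ratio:.0f}x, time ratio: {time_ratio:.0f}x")
print(f"16GB: {per_page_us(small_bytes, small_secs):.1f} us/page")
print(f"1TB:  {per_page_us(large_bytes, large_secs):.1f} us/page")
```

With these numbers, 64 times the memory took about 40 times as long, so
the growth looks no worse than linear and there is no obvious inflection
in the size vs time curve.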