[vfio-users] qemu stuck when hot-adding memory to a virtual machine with a passthrough device
Wuzongyong (Euler Dept)
cordius.wu at huawei.com
Sat Apr 21 09:02:14 UTC 2018
> > > > Hi,
> > > >
> > > > The qemu process gets stuck when hot-adding a large amount of memory
> > > > to a virtual machine with a passthrough device.
> > > > We found that pinning and mapping the pages in vfio_dma_do_map is too slow.
> > > > Is there any way to improve this?
> > >
> > > At what size do you start to see problems? The time to map a
> > > section of memory should be directly proportional to the size. As
> > > the size is increased, it will take longer, but I don't know why
> > > you'd reach a point of not making forward progress. Is it actually
> > > stuck or is it just taking longer than you want? Using hugepages
> > > can certainly help; we still need to pin each PAGE_SIZE page within
> > > the hugepage, but we'll have larger contiguous regions and therefore
> > > call iommu_map() less frequently. Please share more data. Thanks,
> > >
> > > Alex
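For the hugepage suggestion, a minimal monitor sketch would look like the
following (assuming hugetlbfs is mounted at /dev/hugepages and the guest was
started with spare slots/maxmem for hotplug; the ids mem1/dimm1 are just
placeholders):

  (qemu) object_add memory-backend-file,id=mem1,size=16G,mem-path=/dev/hugepages
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

Each PAGE_SIZE page inside the hugepages would still be pinned, but the larger
contiguous runs should mean fewer iommu_map() calls, as you describe.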
> > It just takes a longer time; it is not actually stuck.
> > We found that the problem shows up when we hot-add 16GB of memory, and it
> > takes tens of minutes when we hot-add 1TB of memory.
>
> Is the stall adding 1TB roughly 64 times the stall adding 16GB, or is
> there some inflection in the size vs. time curve? There is a cost to
> pinning and mapping through the IOMMU; perhaps we can improve that, but I
> don't see how we can eliminate it, or how it wouldn't be at least linear
> in the size of memory added, without moving to a page request
> model, which hardly any hardware currently supports. A workaround might
> be to add memory incrementally in smaller chunks, which generates a less
> noticeable stall. Thanks,
>
> Alex
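To make that workaround concrete, the same amount of memory could be plugged
as several smaller backend/DIMM pairs, one at a time (again only a sketch; the
sizes and ids are placeholders, and each device_add still stalls for its own
chunk):

  (qemu) object_add memory-backend-ram,id=mem1,size=4G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
  (qemu) object_add memory-backend-ram,id=mem2,size=4G
  (qemu) device_add pc-dimm,id=dimm2,memdev=mem2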
I collected part of a perf report, shown below, recorded while I hot-added 24GB of memory:
+ 63.41% 0.00% qemu-kvm qemu-kvm-2.8.1-25.127 [.] 0xffffffffffc7534a
+ 63.41% 0.00% qemu-kvm [kernel.vmlinux] [k] do_vfs_ioctl
+ 63.41% 0.00% qemu-kvm [kernel.vmlinux] [k] sys_ioctl
+ 63.41% 0.00% qemu-kvm libc-2.17.so [.] __GI___ioctl
+ 63.41% 0.00% qemu-kvm qemu-kvm-2.8.1-25.127 [.] 0xffffffffffc71c59
+ 63.10% 0.00% qemu-kvm [vfio] [k] vfio_fops_unl_ioctl
+ 63.10% 0.00% qemu-kvm qemu-kvm-2.8.1-25.127 [.] 0xffffffffffcbbb6a
+ 63.10% 0.02% qemu-kvm [vfio_iommu_type1] [k] vfio_iommu_type1_ioctl
+ 60.67% 0.31% qemu-kvm [vfio_iommu_type1] [k] vfio_pin_pages_remote
+ 60.06% 0.46% qemu-kvm [vfio_iommu_type1] [k] vaddr_get_pfn
+ 59.61% 0.95% qemu-kvm [kernel.vmlinux] [k] get_user_pages_fast
+ 54.28% 0.02% qemu-kvm [kernel.vmlinux] [k] get_user_pages_unlocked
+ 54.24% 0.04% qemu-kvm [kernel.vmlinux] [k] __get_user_pages
+ 54.13% 0.01% qemu-kvm [kernel.vmlinux] [k] handle_mm_fault
+ 54.08% 0.03% qemu-kvm [kernel.vmlinux] [k] do_huge_pmd_anonymous_page
+ 52.09% 52.09% qemu-kvm [kernel.vmlinux] [k] clear_page
+ 9.42% 0.12% swapper [kernel.vmlinux] [k] cpu_startup_entry
+ 9.20% 0.00% swapper [kernel.vmlinux] [k] start_secondary
+ 8.85% 0.02% swapper [kernel.vmlinux] [k] arch_cpu_idle
+ 8.79% 0.07% swapper [kernel.vmlinux] [k] cpuidle_idle_call
+ 6.16% 0.29% swapper [kernel.vmlinux] [k] apic_timer_interrupt
+ 5.73% 0.07% swapper [kernel.vmlinux] [k] smp_apic_timer_interrupt
+ 4.34% 0.99% qemu-kvm [kernel.vmlinux] [k] gup_pud_range
+ 3.56% 0.16% swapper [kernel.vmlinux] [k] local_apic_timer_interrupt
+ 3.32% 0.41% swapper [kernel.vmlinux] [k] hrtimer_interrupt
+ 3.25% 3.21% qemu-kvm [kernel.vmlinux] [k] gup_huge_pmd
+ 2.31% 0.01% qemu-kvm [kernel.vmlinux] [k] iommu_map
+ 2.30% 0.00% qemu-kvm [kernel.vmlinux] [k] intel_iommu_map
It seems that the bottleneck is pinning the pages through get_user_pages rather than the IOMMU mapping itself.
Thanks,
Wu Zongyong