[libvirt PATCH] qemu: adjust memlock for multiple vfio/vdpa devices
Laine Stump
laine at redhat.com
Mon Aug 22 16:07:55 UTC 2022
On 8/22/22 10:25 AM, Jonathon Jongsma wrote:
> When multiple VFIO or VDPA devices are assigned to a guest, the guest
> can fail to start because the guest fails to map enough memory. For
> example, the case mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1994893 results in this
> failure:
>
> 2021-08-05T09:51:47.692578Z qemu-kvm: failed to write, fd=24, errno=14 (Bad address)
> 2021-08-05T09:51:47.692590Z qemu-kvm: vhost vdpa map fail!
> 2021-08-05T09:51:47.692594Z qemu-kvm: vhost-vdpa: DMA mapping failed, unable to continue
>
> The current memlock limit calculation does not work for scenarios where
> there are multiple such devices assigned to a guest. The root causes are
> a little bit different between VFIO and VDPA devices.
>
> For VFIO devices, the issue only occurs when a vIOMMU is present. In
> this scenario, each vfio device is assigned a separate AddressSpace
> fully mapping guest RAM. When there is no vIOMMU, the devices are all
> within the same AddressSpace so no additional memory limit is needed.
>
> For VDPA devices, each device requires the full memory to be mapped
> regardless of whether there is a vIOMMU or not.
>
> In order to enable these scenarios, we need to multiply memlock limit
> by the number of VDPA devices plus the number of VFIO devices for guests
> with a vIOMMU. This has the potential to push the memlock limit
> above the host physical memory and negating any protection that these
> locked memory limits are providing, but there is no other short-term
> solution.
>
> In the future, there should be a revised userspace iommu interface
> (iommufd) that the VFIO and VDPA backends can make use of. This will be
> able to share locked memory limits between both vfio and vdpa use cases
> and address spaces and then we can disable these short term hacks. But
> this is still in development upstream.
>
> Signed-off-by: Jonathon Jongsma <jjongsma at redhat.com>
Please include this tag:
Resolves: https://bugzilla.redhat.com/2111317
(or the longer version of the link if you prefer)
> ---
> src/qemu/qemu_domain.c | 56 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 45f00e162d..a1e91ef48f 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -9233,6 +9233,40 @@ getPPC64MemLockLimitBytes(virDomainDef *def,
> }
>
>
> +static int
> +qemuDomainGetNumVFIODevices(const virDomainDef *def)
> +{
> + int i;
The syntax check doesn't like this; it says you should use "size_t i;"
instead.
> + int n = 0;
> +
> + for (i = 0; i < def->nhostdevs; i++) {
> + if (virHostdevIsVFIODevice(def->hostdevs[i]) ||
> + virHostdevIsMdevDevice(def->hostdevs[i]))
> + n++;
> + }
> + for (i = 0; i < def->ndisks; i++) {
> + if (virStorageSourceChainHasNVMe(def->disks[i]->src))
> + n++;
> + }
> + return n;
> +}
> +
> +
> +static int
> +qemuDomainGetNumVDPANetDevices(const virDomainDef *def)
> +{
> + int i;
Same here.
Once those are fixed,
Reviewed-by: Laine Stump <laine at redhat.com>
Oh, and this definitely warrants an entry in the release notes.
> + int n = 0;
> +
> + for (i = 0; i < def->nnets; i++) {
> + if (virDomainNetGetActualType(def->nets[i]) == VIR_DOMAIN_NET_TYPE_VDPA)
> + n++;
> + }
> +
> + return n;
> +}
> +
> +
> /**
> * qemuDomainGetMemLockLimitBytes:
> * @def: domain definition
> @@ -9252,6 +9286,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
> bool forceVFIO)
> {
> unsigned long long memKB = 0;
> + int nvfio;
> + int nvdpa;
>
> /* prefer the hard limit */
> if (virMemoryLimitIsSet(def->mem.hard_limit)) {
> @@ -9270,6 +9306,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
> if (ARCH_IS_PPC64(def->os.arch) && def->virtType == VIR_DOMAIN_VIRT_KVM)
> return getPPC64MemLockLimitBytes(def, forceVFIO);
>
> + nvfio = qemuDomainGetNumVFIODevices(def);
> + nvdpa = qemuDomainGetNumVDPANetDevices(def);
> /* For device passthrough using VFIO the guest memory and MMIO memory
> * regions need to be locked persistent in order to allow DMA.
> *
> @@ -9288,8 +9326,22 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
> *
> * Note that this may not be valid for all platforms.
> */
> - if (forceVFIO || qemuDomainNeedsVFIO(def) || virDomainDefHasVDPANet(def))
> - memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024;
> + if (forceVFIO || nvfio || nvdpa) {
> + /* At present, the full memory needs to be locked for each VFIO / VDPA
> + * device. For VFIO devices, this only applies when there is a vIOMMU
> + * present. Yes, this may result in a memory limit that is greater than
> + * the host physical memory, which is not ideal. The long-term solution
> + * is a new userspace iommu interface (iommufd) which should eliminate
> + * this duplicate memory accounting. But for now this is the only way
> + * to enable configurations with e.g. multiple vdpa devices.
> + */
> + int factor = nvdpa;
> +
> + if (def->iommu)
> + factor += nvfio;
> +
> + memKB = MAX(factor, 1) * virDomainDefGetMemoryTotal(def) + 1024 * 1024;
> + }
>
> return memKB << 10;
> }