[dm-devel] [PATCH] iommu/amd: Fix NULL pointer deref on device detach READ FPDMA QUEUED errors since Linux 4.0

Andreas Hartmann andihartmann at freenet.de
Fri Oct 9 17:42:09 UTC 2015


Hello Jörg,

On 10/09/2015 at 04:45 PM, Joerg Roedel wrote:
> Hi Andreas,
> 
> On Thu, Oct 08, 2015 at 09:47:28PM +0200, Andreas Hartmann wrote:
>> This time, the oops was caused by the second PCI card I'm passing
>> through to another VM (the ath9k card worked fine this time - chance?).
>> I added the lspci output to the attached file, too.
> 
> I digged a little bit around here and found a 32bit PCI card and plugged
> it into the AMD IOMMU box. I could reproduce the problem and here is
> patch which fixes it for me. Can you test it too please?

Works fine here. Thanks.


But now, I can see the next big problem of v4.3-rc4: My VMs are
connected on the host via tun/tap devices. They themselves are connected
to a bridge device. If there is data sent between the VMs, the (system)
load is more as double (!) of the load seen with 4.1.x e.g. . Having an
allover throughput of about low 35 MBit/s seen by the host over all
tun/tap devices / bridge creates a load of 3 (!!) with 3 VMs being
involved. That's about half of the load produced by kernel compiling
(make -j8).



> I'd like to
> send a pull-req with this fix included to Linus for rc5.
> 
> Thanks,
> 
> 	Joerg
> 
> From d07307c04edffaaa045fb83713f8808e55ffa895 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <jroedel at suse.de>
> Date: Fri, 9 Oct 2015 16:23:33 +0200
> Subject: [PATCH] iommu/amd: Fix NULL pointer deref on device detach
> 
> When a device group is detached from its domain, the iommu
> core code calls into the iommu driver to detach each device
> individually.
> 
> Before this functionality went into the iommu core code, it
> was implemented in the drivers, also in the AMD IOMMU
> driver as the device alias handling code.
> 
> This code is still present, as there might be aliases that
> don't exist as real PCI devices (and are therefore invisible
> to the iommu core code).
> 
> Unfortunatly it might happen now, that a device is unbound
> multiple times from its domain, first by the alias handling
> code and then by the iommu core code (or vice verca).
> 
> This ends up in the do_detach function which dereferences
> the dev_data->domain pointer. When the device is already
> detached, this pointer is NULL and we get a kernel oops.
> 
> Removing the alias code completly is not an option, as that
> would also remove the code which handles invisible aliases.
> The code could be simplified, but this is too big of a
> change outside the merge window.
> 
> For now, just check the dev_data->domain pointer in
> do_detach and bail out if it is NULL.
> 
> Andreas Hartmann <andihartmann at freenet.de>
> Signed-off-by: Joerg Roedel <jroedel at suse.de>
> ---
>  drivers/iommu/amd_iommu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index f82060e7..08d2775 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2006,6 +2006,15 @@ static void do_detach(struct iommu_dev_data *dev_data)
>  {
>  	struct amd_iommu *iommu;
>  
> +	/*
> +	 * First check if the device is still attached. It might already
> +	 * be detached from its domain because the generic
> +	 * iommu_detach_group code detached it and we try again here in
> +	 * our alias handling.
> +	 */
> +	if (!dev_data->domain)
> +		return;
> +
>  	iommu = amd_iommu_rlookup_table[dev_data->devid];
>  
>  	/* decrease reference counters */
> 




More information about the dm-devel mailing list