[vfio-users] VM doesn't boot, hangs with R9 Fury passthrough

Sat Sep 5 19:20:31 UTC 2015

On 2015-09-02 20:57, Alex Williamson wrote:
> On Wed, Sep 2, 2015 at 11:22 AM, Matti Niemenmaa <
> matti.niemenmaa+vfio at iki.fi> wrote:
>> DMAR: ERROR: DMA PTE for vPFN 0xfea00 already set (to c7d9c3003 not
>> 383feea00083)
>>
>
> This means that we're trying to map an IOVA, expecting the existing page
> table entry to be zero, but it's not.  There is tracing you can enable in
> QEMU, see docs/tracing.txt.  I generally use the stderr backend and for
> this, tracing "trace_vfio_listener*" ought to show the mappings.  There's
> also tracing on the kernel side that could make sure QEMU and kernel have
> the same mappings.

It's actually specified without the "trace_": "vfio_listener*" works.
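To illustrate why the prefixed pattern matches nothing, here is a small sketch. It assumes the event names are matched as plain globs and uses Python's fnmatch purely as a stand-in for QEMU's own pattern handling; the event name is the one that shows up in the trace output quoted further down.

```python
from fnmatch import fnmatchcase

# Event name as it appears in the QEMU trace output quoted in this mail.
EVENT = "vfio_listener_region_add_ram"


def matches(pattern, name=EVENT):
    # fnmatchcase is only a stand-in for QEMU's glob matching (assumption).
    return fnmatchcase(name, pattern)


# The pattern without the C function prefix matches the event name ...
print(matches("vfio_listener*"))
# ... while the "trace_"-prefixed pattern does not.
print(matches("trace_vfio_listener*"))
```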

I had missed the kernel tracing (the DMA dump following such DMAR errors 
— I assume this is what you meant?) earlier, because coupled with the 
hundreds of warning messages the log buffer ended up maxing out. I 
bumped CONFIG_LOG_BUF_SHIFT way up to 24 now to make sure I don't miss 
anything. I also enabled various other debugging options in the kernel 
in the hopes of catching something, but to little avail.

I believe that CONFIG_DMA_API_DEBUG caught the following:

DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000000479000

The call trace points to libata as the culprit. Even though I don't 
think it's related to the passthrough issues, I bumped 
RADIX_TREE_MAX_TAGS from 3 to 11 (8 wasn't enough, i.e. even 255 
overlapping mappings were exceeded) in the kernel source to get rid of 
the issue. I hope this doesn't mean I'm papering over something more 
fundamental, though.
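For reference, the arithmetic behind those numbers, as a sketch. It assumes the overlap limit is derived from the tag count as (1 << RADIX_TREE_MAX_TAGS) - 1, which is what the 3 -> 7 and 8 -> 255 values above suggest; the function name is made up for illustration.

```python
def max_overlap(tag_bits):
    # Assumed relation between the number of radix tree tags and the
    # largest overlap count dma-debug can track: (1 << tags) - 1.
    return (1 << tag_bits) - 1


for bits in (3, 8, 11):
    # 3 is the stock value, 8 was not enough, 11 is what I ended up with.
    print(bits, max_overlap(bits))
```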

As for the tracing, it mostly added to my confusion. For example, here's 
another DMAR error:

DMAR: ERROR: DMA PTE for vPFN 0xfea00 already set (to fe43e9003 not bea00083)

And here's an arbitrary line (the last one) from the kernel DMA mapping 
dump that immediately follows the above:

radeon 0000:01:00.0: page idx 1023 P=968f0000 N=968f0 D=ff7fe000 L=1000 DMA_BIDIRECTIONAL dma map error checked

And here's an arbitrary line from the QEMU trace (minus timestamp):

vfio_listener_region_add_ram region_add [ram] 100000000 - 43fffffff [0x7f5560000000]

E.g. looking at address 0xfea00, I can find it in several of those 
region_add ranges, and it's in a live range when the DMAR error is 
triggered, which seems expected. But I can't find it in any of the 
ranges from the kernel's dump. Am I looking at it wrong? AFAICT the L 
value is the length in hex and the others are different kinds of 
starting addresses. 0xfea00 does not fall in any range of length L 
starting at any of the P, N, or D values. So where exactly is it 
"already set"? 0xfe43e9003 doesn't fall in any of those ranges either 
(and nor does 0xbea00083 but that seems expected), so I'm having trouble 
understanding what I should look for and where.
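One way to do this lookup mechanically is sketched below. It rests on assumptions that are not confirmed anywhere in this thread: that "vPFN" is a page frame number in 4 KiB units (so the address to search for would be the vPFN shifted left by 12), that D in the kernel dump is the device address, and that L is a length in hex. The helper names are made up for illustration.

```python
import re

PAGE_SHIFT = 12  # assumption: vPFN counts 4 KiB pages


def vpfn_to_iova(vpfn):
    # Turn the page frame number from the DMAR error into an address.
    return vpfn << PAGE_SHIFT


def parse_region_add(line):
    # Inclusive (start, end) from a vfio_listener_region_add_ram line.
    m = re.search(r"region_add \[ram\] ([0-9a-f]+) - ([0-9a-f]+)", line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def parse_dma_dump(line):
    # Inclusive (start, end) of D..D+L-1, assuming both fields are hex.
    m = re.search(r"D=([0-9a-f]+) L=([0-9a-f]+)", line)
    if m is None:
        return None
    start, length = int(m.group(1), 16), int(m.group(2), 16)
    return start, start + length - 1


def contains(rng, addr):
    return rng is not None and rng[0] <= addr <= rng[1]


qemu_line = ("vfio_listener_region_add_ram region_add [ram] "
             "100000000 - 43fffffff [0x7f5560000000]")
kernel_line = ("radeon 0000:01:00.0: page idx 1023 P=968f0000 N=968f0 "
               "D=ff7fe000 L=1000 DMA_BIDIRECTIONAL dma map error checked")

iova = vpfn_to_iova(0xfea00)
print(hex(iova))
print(contains(parse_region_add(qemu_line), iova))
print(contains(parse_dma_dump(kernel_line), iova))
```

Feeding every line of both logs through the two parsers, rather than just the two sample lines above, would show whether the shifted address turns up anywhere.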

Miscellaneous notes and findings, in no particular order:

* Whenever a DMAR error occurs, the values always seem to end in 3. To 
me, odd numbers like that seem strange for page table entries.

* Contrary to what I previously thought, my messing around with the 
Windows boot recovery-related settings doesn't seem to affect whether 
the VM gets as far as the Windows 10 logo (and associated spinner). 
What matters is that I boot once with "-vga std" all the way to the 
desktop and then do a proper shutdown: if I don't do that first, the VM 
hangs before the Windows logo shows up. Perhaps this means that the 
boot recovery settings screen is somehow problematic with passthrough.

* I disabled "above 4G decoding" in the host motherboard's UEFI settings 
to see if it changes anything. The only difference I've noticed is that 
the DMAR errors now always have a 32-bit value after the "not".

* There are lots of "SKIPPING" messages in the QEMU trace. It doesn't 
seem like they're intrinsically problematic, though.
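Regarding the values ending in 3: a possible reading is that the low bits of these values are flags rather than part of an address. The sketch below decodes them under that assumption, taking bit 0 as read, bit 1 as write and bit 7 as a large-page flag (as I understand the Intel IOMMU page table entry layout). Masking off the low 12 bits is a simplification, and none of this is confirmed by anything in this thread.

```python
# Assumed flag bits of a VT-d page table entry.
PTE_READ = 1 << 0
PTE_WRITE = 1 << 1
PTE_LARGE_PAGE = 1 << 7


def decode_pte(value):
    # Split a raw PTE value into an address part and the assumed flags.
    return {
        "addr": value & ~0xfff,  # simplification: drop the low 12 bits
        "read": bool(value & PTE_READ),
        "write": bool(value & PTE_WRITE),
        "large_page": bool(value & PTE_LARGE_PAGE),
    }


# The four values from the two DMAR errors quoted in this thread.
for raw in (0xc7d9c3003, 0x383feea00083, 0xfe43e9003, 0xbea00083):
    d = decode_pte(raw)
    print(hex(raw), hex(d["addr"]), d["read"], d["write"], d["large_page"])
```

Under this reading, a trailing 3 would just mean "readable and writable", and the 0x83 endings on the values after the "not" would additionally carry the large-page bit.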