[vfio-users] [BUG, HELP NEEDED] No signal (Invalid ROM) with some Asus Nvidia cards

Nicolas Roy-Renaud nicolas.roy-renaud.1 at ens.etsmtl.ca
Fri Feb 19 21:25:17 UTC 2016


On 2016-02-16 16:14, Ben J wrote:
>
> Are you running any graphics drivers on the host system at all? If so, 
> try blacklisting all of them and launching your VM from the system's 
> base graphics. I've had this error and it seems to be caused by the 
> driver for a different card somehow claiming it. Even if you have the 
> card claimed by vfio it's still possible for it to interfere.
>
> On Feb 16, 2016 2:37 PM, "Nicolas Roy-Renaud"
> <nicolas.roy-renaud.1 at ens.etsmtl.ca> wrote:
>
>     I've been wanting to report this one for a while, but I was hoping
>     I could gather some more information first to have a clear idea of
>     what's wrong. Hopefully some people on the mailing list will be
>     able to help us solve this.
>
>     The guest GPU I'm using on my current VFIO machine is an Asus-made
>     GTX 970 (GTX970-DC2OC-4GD5). Although the BIOS is supposed to
>     support UEFI (none of those on TechPowerUp do, but running GPU-Z
>     on a bare-metal Windows install proves mine actually does),
>     whenever I try to perform a firmware dump from Linux with the vfio
>     driver or boot a VM that should claim this card, I get an
>     "Invalid ROM contents" error. Last month, someone else came here
>     asking about a similar issue, and I realized after asking a few
>     questions that he was trying to pass an Asus-made GTX 750
>     (GTX750TI-PH-2GD5). As it turned out, he was experiencing the
>     exact same issue I had before and managed to work around it using
>     the same procedure.
>
>     As it turns out, the problem can be avoided when mounting the card
>     on a live VM, which somehow bypasses the ROM check and runs
>     without a hitch on my machine.
>
>     This leads me to believe that it might be a problem that's specific to
>     Asus cards (and perhaps only their Nvidia cards or even just their
>     Maxwell cards), but I'm not sure where the problem could possibly
>     come from and whether it's related to vfio itself or if it's an
>     issue on Asus' end. Either way, I'd need help from someone who's
>     more familiar with vfio than I am to help diagnose the issue, or
>     possibly someone else with the same issue so we could evaluate how
>     widespread the problem actually is.
>
>     _______________________________________________
>     vfio-users mailing list
>     vfio-users at redhat.com <mailto:vfio-users at redhat.com>
>     https://www.redhat.com/mailman/listinfo/vfio-users
>
Hi, sorry it took a while to reply.

I've just tried running the VM with nouveau blacklisted and with the
module removed from my kernel image so it doesn't get loaded at boot.
(I normally use nouveau to drive a GT210 for my host; I haven't
installed the blob on it since my last clean install.) As of now, none
of the screens connected to the computer turn on (the VM screen stays
frozen on the gummiboot menu once Linux starts, but that's normal), and
lspci reports this:

    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
         Subsystem: ASUSTeK Computer Inc. Device [1043:8508]
         Kernel driver in use: vfio-pci
         Kernel modules: nouveau
    01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
         Subsystem: ASUSTeK Computer Inc. Device [1043:8508]
         Kernel driver in use: vfio-pci
         Kernel modules: snd_hda_intel
    06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [GeForce G210] [10de:0a60] (rev a2)
         Subsystem: PC Partner Limited / Sapphire Technology Device [174b:2180]
         Kernel modules: nouveau
    06:00.1 Audio device [0403]: NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)
         Subsystem: PC Partner Limited / Sapphire Technology Device [174b:2180]
         Kernel driver in use: snd_hda_intel
         Kernel modules: snd_hda_intel

Nevertheless, this is what I get when I try booting up my VM:

    [   92.552660] tun: Universal TUN/TAP device driver, 1.6
    [   92.552664] tun: (C) 1999-2004 Max Krasnyansky <maxk at qualcomm.com>
    [   92.609290] device vnet0 entered promiscuous mode
    [   92.622575] br0: port 2(vnet0) entered forwarding state
    [   92.622586] br0: port 2(vnet0) entered forwarding state
    [   93.552472] vfio-pci 0000:01:00.1: enabling device (0000 -> 0002)
    [   93.569216] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1e at 0x258
    [   93.569226] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19 at 0x900
    [   95.396073] vfio-pci 0000:01:00.0: Invalid ROM contents
    [   95.396161] vfio-pci 0000:01:00.0: Invalid ROM contents
    [   97.021048] kvm: zapping shadow pages for mmio generation wraparound
    [   97.063786] kvm: zapping shadow pages for mmio generation wraparound

So that doesn't seem to be the problem.
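
The by-eye check on the lspci output above (is every function of the
guest card bound to vfio-pci, and is nothing else holding it?) can be
scripted. What follows is only a rough sketch: it assumes the text comes
from `lspci -nnk` in the format shown above, and the function names
(parse_lspci, not_bound_to) are made up for illustration.

```python
import re

# First line of a device entry as printed by `lspci -nnk`, e.g.
# "01:00.0 VGA compatible controller [0300]: NVIDIA ... (rev a1)".
# Assumption: no PCI domain prefix (no leading "0000:").
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.+)$")

# Trimmed copy of the output quoted in this post.
SAMPLE = """
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
     Subsystem: ASUSTeK Computer Inc. Device [1043:8508]
     Kernel driver in use: vfio-pci
     Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
     Kernel driver in use: vfio-pci
     Kernel modules: snd_hda_intel
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [GeForce G210] [10de:0a60] (rev a2)
     Kernel modules: nouveau
"""


def parse_lspci(text):
    """Map each slot to its description, driver in use, and candidate modules."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = DEVICE_RE.match(line)
        if match:
            current = {"desc": match.group(2), "driver": None, "modules": []}
            devices[match.group(1)] = current
        elif current is not None and line.startswith("Kernel driver in use:"):
            current["driver"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Kernel modules:"):
            names = line.split(":", 1)[1].split(",")
            current["modules"] = [name.strip() for name in names]
    return devices


def not_bound_to(devices, slots, wanted="vfio-pci"):
    """Return the slots from `slots` whose driver in use is not `wanted`."""
    return [s for s in slots if devices.get(s, {}).get("driver") != wanted]


if __name__ == "__main__":
    devs = parse_lspci(SAMPLE)
    # An empty list means both functions of the GTX 970 are held by vfio-pci.
    print(not_bound_to(devs, ["01:00.0", "01:00.1"]))
```

A device with no "Kernel driver in use" line, like the GT218 once nouveau
is blacklisted, shows up with a driver of None.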



On 2016-02-16 14:56, Hristo Iliev wrote:
> On 16.02.2016, at 20:24, Nicolas Roy-Renaud
> <nicolas.roy-renaud.1 at ens.etsmtl.ca> wrote:
>>
>> I've been wanting to report this one for a while, but I was hoping I 
>> could gather some more information first to have a clear idea of 
>> what's wrong. Hopefully some people on the mailing list will be able 
>> to help us solve this.
>>
>> The guest GPU I'm using on my current VFIO machine is an Asus-made 
>> GTX 970 (GTX970-DC2OC-4GD5). Although the BIOS is supposed to support 
>> UEFI (none of those on TechPowerUp do, but running GPU-Z on a 
>> bare-metal Windows install proves mine actually does), whenever I try
>> to perform a firmware dump from Linux with the vfio driver or boot a
>> VM that should claim this card, I get an "Invalid ROM contents"
>> error. Last month, someone else came here asking about a similar 
>> issue, and I realized after asking a few questions that he was trying 
>> to pass an Asus-made GTX 750 (GTX750TI-PH-2GD5). As it turned out, he 
>> was experiencing the exact same issue I had before and managed to 
>> work around it using the same procedure.
>>
>> As it turns out, the problem can be avoided when mounting the card on 
>> a live VM, which somehow bypasses the ROM check and runs without a 
>> hitch on my machine.
>>
>> This leads me to believe that it might be a problem that's specific to
>> Asus cards (and perhaps only their Nvidia cards or even just their 
>> Maxwell cards), but I'm not sure where the problem could possibly 
>> come from and whether it's related to vfio itself or if it's an issue 
>> on Asus' end. Either way, I'd need help from someone who's more 
>> familiar with vfio than I am to help diagnose the issue, or possibly 
>> someone else with the same issue so we could evaluate how widespread
>> the problem actually is.
>
> Hi Nicolas,
>
> My GPU is the same as yours - ASUS STRIX GTX970-DC2OC-4GD5. I’ve
> never experienced the problem you are describing. The card has been 
> working flawlessly with OVMF in both my old Win 8.1 VM and then with 
> my newer Win 10 VM. Both VMs are vanilla virt-manager creations, i.e. 
> no fancy VGA ROM files or special command-line parameters for qemu.
>
> Could it be due to e.g. a different GPU BIOS version? Have you 
> reflashed/upgraded yours?
>
> Cheers,
> Hristo
>

That's extremely curious. Might I ask what distro you're running? I was
using stock libvirt and qemu from the Arch repos, then I read somewhere
that a recent qemu patch lets you pass an argument that spoofs the
hypervisor information, so the Nvidia drivers don't notice they're
running under one and you don't have to disable any of the timers and
other performance improvements, so I switched to the git version. Those
don't seem related, though: the "Invalid ROM contents" error seems to
come from vfio-pci. The firmware I'm using is the stock one that came
with my card, but it doesn't seem to match any of those on TechPowerUp.
I'm attaching it here in case someone wants to diff it.
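
For anyone who would rather inspect a dump than diff it blindly, here is
a rough Python sketch that walks the image chain of a PCI option ROM. It
relies only on the standard expansion ROM layout (a 55 AA signature, a
pointer at offset 0x18 to a "PCIR" data structure, and an image length,
code type and last-image flag inside that structure). To my
understanding, the kernel prints "Invalid ROM contents" when the data it
reads does not start with 0x55, but treat that as an assumption rather
than a diagnosis. The function names are made up for illustration, and
nothing here is specific to vfio or to Nvidia.

```python
import struct
import sys

ROM_SIGNATURE = b"\x55\xaa"
PCIR_SIGNATURE = b"PCIR"
CODE_TYPES = {0x00: "x86 legacy BIOS", 0x01: "Open Firmware",
              0x02: "HP PA-RISC", 0x03: "EFI"}


def first_image_offset(data):
    """Return the offset of the first 55 AA header with a valid PCIR pointer.

    Useful when a dump tool has put extra data in front of the ROM image.
    Returns None if nothing plausible is found.
    """
    pos = data.find(ROM_SIGNATURE)
    while pos != -1:
        if pos + 0x1a <= len(data):
            (ptr,) = struct.unpack_from("<H", data, pos + 0x18)
            if data[pos + ptr:pos + ptr + 4] == PCIR_SIGNATURE:
                return pos
        pos = data.find(ROM_SIGNATURE, pos + 1)
    return None


def walk_rom_images(data, start=0):
    """Return one dict per image in the chain; raise ValueError if malformed."""
    images = []
    offset = start
    while True:
        if data[offset:offset + 2] != ROM_SIGNATURE:
            raise ValueError(f"no 55 AA signature at offset {offset:#x}")
        if offset + 0x1a > len(data):
            raise ValueError(f"truncated ROM header at offset {offset:#x}")
        (pcir_ptr,) = struct.unpack_from("<H", data, offset + 0x18)
        pcir = offset + pcir_ptr
        if data[pcir:pcir + 4] != PCIR_SIGNATURE:
            raise ValueError(f"no PCIR structure at offset {pcir:#x}")
        if pcir + 0x16 > len(data):
            raise ValueError(f"truncated PCIR structure at offset {pcir:#x}")
        vendor, device = struct.unpack_from("<HH", data, pcir + 0x04)
        (blocks,) = struct.unpack_from("<H", data, pcir + 0x10)
        code_type = data[pcir + 0x14]
        indicator = data[pcir + 0x15]
        images.append({
            "offset": offset,
            "vendor": vendor,
            "device": device,
            "length": blocks * 512,  # image length is stored in 512-byte units
            "code_type": code_type,
            "code_type_name": CODE_TYPES.get(code_type, "unknown"),
        })
        # Bit 7 of the indicator byte marks the last image in the ROM.
        if indicator & 0x80 or blocks == 0:
            return images
        offset += blocks * 512


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        rom = handle.read()
    begin = first_image_offset(rom)
    if begin is None:
        sys.exit("no valid ROM image found")
    for image in walk_rom_images(rom, begin):
        print(f"{image['offset']:#08x} {image['vendor']:04x}:{image['device']:04x} "
              f"{image['code_type_name']} ({image['length']} bytes)")
```

Run it with the path to a dump as the only argument. An image with code
type 0x03 in the chain would be consistent with the card carrying a UEFI
image; a chain that fails at offset 0 would at least show that the dump
itself is malformed.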
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ASUS-STRIX-970-STOCK-VGA.rom
Type: application/octet-stream
Size: 201216 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/vfio-users/attachments/20160219/bc6ff9f4/attachment.obj>