[vfio-users] Bus reset trouble with Titan-X (was Re: Welcome to the "vfio-users" mailing list (Digest mode))
kvasko at gmail.com
Tue Oct 18 04:10:53 UTC 2016
Thanks. I'm an idiot. I just replied directly to the subscription email
and wasn't paying attention. Thank you for correcting it.
I was originally running 3.13.0-86-generic and upgraded to 3.19 to try it
before I posted this, but got the same results. I'll try a newer
version of the kernel and see what happens.
Sorry to be dense but what do you mean by "retrain properly"? I assume you
mean that once it fails to reset it just never recovers?
We have 2 other machines that I've never seen this problem on, so what
you are saying makes sense. This system does have a slightly more
specialized PCI bus so that 8 cards can sit on a single bus (at least
that is my understanding). So at this point, either I'm hitting a bug that
has since been fixed in a newer kernel, or this PCI bus is not doing
something that vfio-pci expects (that would be my speculation).
I'll report back my findings tomorrow.
Thanks for the help.
On Mon, Oct 17, 2016 at 5:53 PM, Alex Williamson <alex.williamson at redhat.com> wrote:
> (generally a good idea to have a useful subject line)
> On Mon, 17 Oct 2016 16:26:15 -0500
> Kevin Vasko <kvasko at gmail.com> wrote:
> > Any suggestions on debugging a !!! Unknown header type 7f?
> This usually means that the device didn't come back from bus reset and
> re-reading the PCI config space where the device was just gives a -1
> response. lspci tries to interpret that bogus data and gives results
> like you see. You might try a newer kernel, we've probably fixed some
> things in the bus reset path since v3.19. It looks like you continue
> to see the bogus data once it gets into this state, so it's probably
> not a "simple" device coming out of reset too slowly problem. Possibly
> the PCIe link doesn't retrain properly sometimes after a bus reset. If
> a new kernel doesn't help, I could give you instructions for performing
> a bus reset with setpci and you could test how reliably you can reset
> the device and read config space after. Thanks,
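
For reference, here is a rough sketch of how the symptom described above could be checked from a script. It relies on two things from the standard PCI config header layout: the Vendor ID sits at offset 0x00, and the Header Type register sits at offset 0x0E with bit 7 as the multi-function flag. A device that returns -1 therefore reads 0xff there, and masking off bit 7 leaves the 7f that lspci complains about. The function names, the timeout values, and the device address in the usage note are all made up for illustration; this is not the setpci procedure mentioned above, only a way to watch config space after a reset.

```python
import time
from typing import Callable, Optional

HEADER_TYPE_OFFSET = 0x0E  # Header Type register in the standard PCI config header


def vendor_id(cfg: bytes) -> int:
    """Vendor ID is the little-endian 16-bit word at offset 0x00."""
    return int.from_bytes(cfg[0:2], "little")


def header_layout(cfg: bytes) -> int:
    """Low 7 bits of Header Type; bit 7 is the multi-function flag."""
    return cfg[HEADER_TYPE_OFFSET] & 0x7F


def looks_absent(cfg: bytes) -> bool:
    """True when the first 64 bytes are all 0xff, i.e. the -1 pattern."""
    return bool(cfg) and all(b == 0xFF for b in cfg[:64])


def read_sysfs_config(bdf: str) -> bytes:
    """Read the first 64 config bytes of a device through Linux sysfs."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)


def wait_until_present(
    read_cfg: Callable[[], bytes],
    timeout_s: float = 2.0,      # arbitrary illustrative timeout
    interval_s: float = 0.05,    # arbitrary illustrative poll interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll config space until it stops reading all-ones.

    Returns the elapsed seconds when real data shows up, or None if the
    device still reads as -1 once the timeout has passed.
    """
    start = clock()
    while True:
        if not looks_absent(read_cfg()):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Something like `wait_until_present(lambda: read_sysfs_config("0000:04:00.0"))` (address hypothetical) run right after a reset would help tell a device that is merely slow to come back, which returns a small number, from one that never recovers, which returns None.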