[vfio-users] Bus reset trouble with Titan-X (was Re: Welcome to the "vfio-users" mailing list (Digest mode))

Kevin Vasko kvasko at gmail.com
Tue Oct 18 04:10:53 UTC 2016

Thanks. I'm an idiot. I just replied to the email directly after the
subscription and wasn't paying attention. Thank you for correcting it.

I was originally running 3.13.0-86-generic and upgraded to 3.19 to try it
before I posted this, but got the same results. I'll try a newer kernel
and see what happens.

Sorry to be dense but what do you mean by "retrain properly"? I assume you
mean that once it fails to reset it just never recovers?

We have 2 other machines on which I've never seen this problem, so what
you are saying makes sense. This system does have a slightly more
specialized PCI bus to be able to stick 8 cards on a single bus (at least
that is my understanding). So at this point, my speculation is that either
I'm hitting a bug that has since been fixed in the kernel, or this PCI bus
is not doing something that vfio-pci expects.

I'll report back my findings tomorrow.

Thanks for the help.


On Mon, Oct 17, 2016 at 5:53 PM, Alex Williamson <alex.williamson at redhat.com> wrote:

> (generally a good idea to have a useful subject line)
> On Mon, 17 Oct 2016 16:26:15 -0500
> Kevin Vasko <kvasko at gmail.com> wrote:
> >
> > Any suggestions on debugging a !!! Unknown header type 7f?
> >
> This usually means that the device didn't come back from bus reset and
> re-reading the PCI config space where the device was just gives a -1
> response.  lspci tries to interpret that bogus data and gives results
> like you see.  You might try a newer kernel, we've probably fixed some
> things in the bus reset path since v3.19.  It looks like you continue
> to see the bogus data once it gets into this state, so it's probably
> not a "simple" device coming out of reset too slowly problem.  Possibly
> the PCIe link doesn't retrain properly sometimes after a bus reset.  If
> a new kernel doesn't help, I could give you instructions for performing
> a bus reset with setpci and you could test how reliably you can reset
> the device and read config space after.  Thanks,
> Alex
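
To make the "-1 response" point above concrete, here is a minimal Python
sketch (not something from this thread) that reads a device's config space
through sysfs and checks whether it is all ones. It assumes the standard
Linux sysfs file /sys/bus/pci/devices/<address>/config; the helper names
are hypothetical, made up for this illustration. A header type byte of 0xff
with the multi-function bit (0x80) masked off is 0x7f, which is where the
"Unknown header type 7f" message comes from.

```python
from pathlib import Path

# Standard PCI config space offsets (from the PCI specification).
PCI_VENDOR_ID = 0x00       # 16-bit, little-endian
PCI_HEADER_TYPE = 0x0E     # 8-bit; bit 7 is the multi-function flag
HEADER_TYPE_MASK = 0x7F    # drop the multi-function bit


def header_type(config: bytes) -> int:
    """Return the header type with the multi-function bit masked off."""
    return config[PCI_HEADER_TYPE] & HEADER_TYPE_MASK


def looks_absent(config: bytes) -> bool:
    """True if the vendor ID reads as 0xffff, i.e. nothing answered."""
    vendor = int.from_bytes(config[PCI_VENDOR_ID:PCI_VENDOR_ID + 2], "little")
    return vendor == 0xFFFF


def read_config(address: str) -> bytes:
    """Read the first 64 bytes of config space for a device via sysfs.

    `address` is a full PCI address such as "0000:04:00.0" (a placeholder,
    not a value taken from this thread).
    """
    path = Path("/sys/bus/pci/devices") / address / "config"
    return path.read_bytes()[:64]
```

For example, looks_absent(read_config("0000:04:00.0")) returning True
after a reset, and staying True on later reads, would match the state
described above where the device never comes back.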