[vfio-users] Bus reset trouble with Titan-X

Kevin Vasko kvasko at gmail.com
Wed Oct 19 16:46:21 UTC 2016


Alex,

Thanks, but no luck.

I ran:

#: setpci -s 3:00.0 82.w=8:8

checked:

#: lspci -vvvs 3:00.0

MRL- was the same.

#: setpci -s 3:00.0 78.w=20:20

checked:

#: lspci -vvs 3:00.0

MRL- was the same
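
For anyone following along, this is how the two addresses and masks above fall out of Alex's description. It is only a sketch: reg_addr and bit_mask are made-up helper names, and the 0x68 capability base is the value Alex gave for this particular bridge, so it may differ on other devices. The script only prints the commands; it does not touch any hardware.

```shell
#!/bin/sh
# Hypothetical helper: absolute config-space address of a register inside
# a capability, printed as bare hex (the form setpci takes).
reg_addr() {
    # $1 = capability base, $2 = register offset within the capability
    printf '%x\n' $(( $1 + $2 ))
}

# Hypothetical helper: single-bit "value:mask" pair in setpci's syntax.
bit_mask() {
    # $1 = bit number
    printf '%x:%x\n' $(( 1 << $1 )) $(( 1 << $1 ))
}

EXP_CAP=0x68   # Express capability base on this bridge (assumption: taken from Alex's mail)

# Slot Status (offset 0x1a), Presence Detect Changed (bit 3, write 1 to clear)
echo "setpci -s 3:00.0 $(reg_addr $EXP_CAP 0x1a).w=$(bit_mask 3)"
# Link Control (offset 0x10), Retrain Link (bit 5)
echo "setpci -s 3:00.0 $(reg_addr $EXP_CAP 0x10).w=$(bit_mask 5)"
```

The printed commands should match the two setpci lines I ran.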


LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
           Changed: MRL- PresDet+ LinkState-
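
As a cross-check on the lspci decoding, the raw register words can also be picked apart by hand. The following is a rough sketch under assumptions: decode_lnksta and decode_sltsta are made-up names, the bit positions are the PCIe Link Status and Slot Status layouts as I understand them, and the sample values 0x0001 and 0x0048 are reconstructed from the flags shown above rather than read from the hardware.

```shell
#!/bin/sh
# Hypothetical helper: decode a raw Link Status word (Express capability + 0x12).
decode_lnksta() {
    v=$(( $1 ))
    speed=$(( v & 0xf ))           # bits 3:0  speed code (1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s)
    width=$(( (v >> 4) & 0x3f ))   # bits 9:4  negotiated link width
    train=$(( (v >> 11) & 1 ))     # bit 11    link training in progress
    dlact=$(( (v >> 13) & 1 ))     # bit 13    data link layer active
    echo "speed_code=$speed width=x$width training=$train dlactive=$dlact"
}

# Hypothetical helper: decode the presence-detect bits of a raw Slot Status
# word (Express capability + 0x1a).
decode_sltsta() {
    v=$(( $1 ))
    echo "presdet_changed=$(( (v >> 3) & 1 )) presdet_state=$(( (v >> 6) & 1 ))"
}

# Sample values matching the flags pasted above (assumed, not read from the device).
decode_lnksta 0x0001
decode_sltsta 0x0048
```

On the real bridge the inputs would come from reading the registers, for example "0x$(setpci -s 3:00.0 7a.w)" for Link Status and "0x$(setpci -s 3:00.0 82.w)" for Slot Status, again assuming the capability sits at 0x68.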

Just for my own knowledge, what does "retrain" mean? I assume it means
resetting the bus and having it reconnect successfully?

Thanks again,

-Kevin

On Wed, Oct 19, 2016 at 10:50 AM, Alex Williamson <
alex.williamson at redhat.com> wrote:

> On Wed, 19 Oct 2016 10:00:57 -0500
> Kevin Vasko <kvasko at gmail.com> wrote:
>
> > Sure thing. I'm attaching all of the logs I have to let you get a bigger
> > picture (and anyone that might run into a similar issue). Hopefully I
> > didn't mess anything up.
> >
> ...
>
> Here's the bit I was curious about:
>
> > #showing parent bridge of a device that has a failed
> > #:lspci -vvvs 03:00
> > 03:00.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00
> > [Normal decode])
> ...
> > LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency
> > L0s <4us, L1 <8us
> > ClockPM- Surprise- LLActRep- BwNot-
> > LnkCtl: ASPM Disabled; Disabled- CommClk-
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt-
> > ABWMgmt-
>
>
> The Link Status shows that it's in Gen1 mode at x0 width, so the link
> failed to return to a working state after bus reset.  Maybe a hint is
> that the Slot Status register shows that the Presence Detect Changed bit
> got flipped, but the Presence Detect State bit remains 1, indicating
> that a card is present.  However Presence Detect Changed Enable is not
> set in the Slot Control register, so the OS doesn't get notified about
> this.
>
> I wonder what would happen if we cleared the Presence Detect Changed
> bit and tried to retrain the link.  The express capability is at 0x68,
> the slot status register is at 0x1a, bit 3 is the presence detect
> changed bit and it's RW1C (read, write 1 to clear).  Therefore to clear
> the bit we could do:
>
> setpci -s 3:00.0 82.w=8:8
>
> Recheck with lspci -vvvs 3:00.0 to check whether
>
> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>         Changed: MRL- PresDet+ LinkState-
>                       ^^^^^^^^
>
> still reports + or -, and possibly whether the link has decided to retrain.
> To force a retrain we need to poke bit 5 in the link control register,
> offset 0x10:
>
> setpci -s 3:00.0 78.w=20:20
>
> Recheck lspci to see if there's any progress.
>
> ...
> > #showing parent device that has a NON failed device
> > #: lspci -vvvs 03:08
> > 03:08.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00
> > [Normal decode])
> ...
> > LnkCap: Port #8, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency
> > L0s <4us, L1 <8us
> > ClockPM- Surprise- LLActRep- BwNot-
> > LnkCtl: ASPM Disabled; Disabled- CommClk-
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt-
> > ABWMgmt-
>
> In this case the link has retrained to Gen3 x16 and of course the
> downstream devices are accessible.  The Presence Detect Changed bit is
> set to - on this port.  Thanks,
>
> Alex
>