[vfio-users] VFIO and KSM (and maybe hugepages)

Colin Godsey crgodsey at gmail.com
Tue May 31 21:54:54 UTC 2016


Hmm, it would have been a v4.4 build. It could very well just be a coincidence
that I haven’t seen it again.

Either way, I appreciate the info! I’ve been wary of trying THP/KSM for a
while because of that, but this renews my faith. The clarification about
THP is also a relief. There are still so many articles/posts that list these
‘gotchas’ regarding VFIO (most of which have since been fixed) that it is
pretty easy to go on unneeded witch hunts =\

Regarding KSM though… as far as I can tell (in the 4.4 kernel) KSM doesn’t
use the normal GUP-style references (it uses get_user_pages_fast internally):
https://github.com/torvalds/linux/blob/v4.4/mm/ksm.c#L887

I’m not horribly familiar with KSM itself, but from what I gathered of its
history, this may be a mechanism it uses to allow shared pages to still go
to swap etc.

Also afaik KSM, THP and kswapd (@ v4.4) all manipulate the mm differently,
with KSM and THP having their own unique problems dealing with each other.
This could also have been some kind of perfect storm, using basically all
of the memory-mapping technologies modern Linux has, at the same time…
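
For anyone trying to narrow down which of these mechanisms is actually active
on a host, here is a rough sketch that reads the usual sysfs/procfs files for
KSM, THP and swap. The paths are the standard ones on mainline kernels, but
the helper names (ksm_running, thp_mode, swap_active, report) are made up for
this example, and it only reports state; it changes nothing.

```python
from pathlib import Path

# Standard locations on mainline kernels; a file may be missing if the
# feature is not compiled in, which read_or_none() below tolerates.
KSM_RUN = Path("/sys/kernel/mm/ksm/run")
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
PROC_SWAPS = Path("/proc/swaps")


def ksm_running(text: str) -> bool:
    """ksm/run holds 0 (stopped), 1 (running) or 2 (stopped, pages unmerged)."""
    return text.strip() == "1"


def thp_mode(text: str) -> str:
    """The active THP mode is the bracketed word, e.g. 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return "unknown"


def swap_active(text: str) -> bool:
    """/proc/swaps has one header line, then one line per active swap area."""
    return len(text.strip().splitlines()) > 1


def read_or_none(path: Path):
    """Return the file contents, or None if the file cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None


def report() -> dict:
    """Collect the current state; a value of None means 'could not read'."""
    ksm, thp, swaps = (read_or_none(p) for p in (KSM_RUN, THP_ENABLED, PROC_SWAPS))
    return {
        "ksm_running": None if ksm is None else ksm_running(ksm),
        "thp_mode": None if thp is None else thp_mode(thp),
        "swap_active": None if swaps is None else swap_active(swaps),
    }


if __name__ == "__main__":
    print(report())
```

Running it before and after toggling one feature at a time should make it
easier to tell which combination was in play when the artifacts showed up.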

I’ll need to step through the pre- and post-4.5 TLB changes and see if
anything looks similar to the current KSM mapping/ref-tracking.

Unfortunately I don’t have any other info at this time. I may try to run
some more isolated tests, but it was an intermittent issue that required a
few hours of gaming to flush out… so it might take a bit there =\

On Tue, May 31, 2016 at 3:34 PM Alex Williamson <alex.williamson at redhat.com>
wrote:

> On Tue, 31 May 2016 20:20:58 +0000
> Colin Godsey <crgodsey at gmail.com> wrote:
>
> > I had a few questions regarding general ‘page management’ and VFIO, mostly
> > related to kernel shared pages.
> >
> > I have a host running 2 virtual ‘gaming rigs’ with a single dedicated GPU
> > each. I had an intermittent problem where when gaming (on the same game)
> > with both rigs, one would receive graphic artifacts. Specifically I would
> > see triangle/geometry artifacts which usually indicate corrupt GPU RAM.
> >
> > The two cards are completely different models from different generations,
> > and one is really new, so I didn’t believe it was bad VRAM. Graphics drivers
> > do swap various buffers from system RAM to VRAM, so I figured it could also
> > be something related to system RAM.
> >
> > I disabled any kind of… alternative page management I could (swap, KSM,
> > huge pages, etc.) and it did fix it. Because the issue would only affect one
> > machine, and I only observed it when the same game was running on both, I
> > assumed maybe it was related to KSM.
> >
> > *Is there any possible way KSM could interfere with the DMAR in some way
> > where it tries to share/alter DMA regions?* And broader: what prevents
> > systems like khugepaged, kswapd, and ksmd from interfering with these regions
> > in the first place? I’ve read that transparent hugepages can interfere with
> > VFIO; is it safe to assume that other DMA issues could arise with other
> > types of page management?
>
> What kernel were you running where you saw this?  vfio uses
> get_user_pages to increase the reference count on pages mapped through
> the iommu.  This should prevent both ksm and transparent hugepages from
> being able to operate on the pages.  Kernel v4.5 had a bug (now fixed
> in v4.5.5) that did not honor the reference, allowing thp (maybe ksm
> too) to still operate on those pages.  So as long as you're not running
> v4.5.0 through v4.5.4 (or a v4.5-rc), I'm not aware of any issues with
> page pinning.  Thanks,
>
> Alex
>
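
As a quick way to apply the version range Alex gives above (v4.5-rc and
v4.5.0 through v4.5.4 affected, fixed in v4.5.5), here is a small sketch that
checks a kernel release string against it. The function name in_affected_range
is invented for this example. It only looks at the upstream-style version
number, so it cannot tell whether a distro kernel has backported the fix.

```python
import platform
import re


def in_affected_range(release: str) -> bool:
    """True if the release string falls in v4.5-rc or v4.5.0 through v4.5.4.

    Assumption: the string starts with an upstream-style version such as
    '4.5.3-1-ARCH' or '4.5.0-rc3'. Distro backports are not detected.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)  # '4.5-rc3' has no patch level, treat as 0
    return (major, minor) == (4, 5) and patch <= 4


if __name__ == "__main__":
    rel = platform.release()  # same string as `uname -r`
    print(rel, "affected" if in_affected_range(rel) else "outside the affected range")
```

A result of "outside the affected range" only means the version number is
not in the range quoted above; it says nothing about other kernels or bugs.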