[libvirt-users] dramatic performance slowdown due to THP allocation failure with full pagecache

Tue Nov 14 17:52:03 UTC 2017

Thanks for the reply, Daniel.

However, I think you slightly misunderstood the scenario...

On 14 November 2017 at 10:32, Daniel P. Berrange <berrange at redhat.com> wrote:
> IOW, if your application has a certain expectation of performance that can only
> be satisfied by having the KVM guest backed by huge pages, then you should
> really change to explicitly reserve huge pages for the guests, and not rely on
> THP which inherently can't provide any guarantee in this area.

We already do this. The problem is not hugepage backing of the guest,
it is THP allocation inside the guest (or indeed on a bare-metal
host). The issue in the HPC world is that we support so many different
applications (some of which are complete black-boxes) that explicit
hugepage allocation for application memory is generally not viable, so
we are reliant on THP to avoid TLB thrashing.

> The kernel can't predict the future usage pattern of processes so it is not at
> all clear cut that evicting the entire pagecache in order to allocate more
> huge pages is going to be beneficial for system performance as a whole.

Yet the default behaviour seems to be to stall on fault, then directly
reclaim and defrag in order to allocate a hugepage if at all possible.
In my test case there is almost no free memory, so some pagecache has
to be reclaimed for the process. What I don't understand is why the THP
allocation fails in this case but not when the pagecache is smaller.
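For anyone trying to reproduce this, a minimal diagnostic sketch follows,
assuming a Linux host that exposes the standard THP knobs under
/sys/kernel/mm/transparent_hugepage and THP/compaction counters in
/proc/vmstat (exact counter names vary by kernel version). It reports
which `enabled` and `defrag` modes are active (the bracketed entry in each
sysfs file) and an approximate fallback ratio, taking thp_fault_alloc
plus thp_fault_fallback as the number of THP fault attempts. The helper
names (active_choice, thp_counters, fallback_ratio) are hypothetical,
written only for this example.

```python
import os

THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def active_choice(sysfs_text):
    """Return the bracketed (active) entry of a sysfs choice file,
    e.g. 'always defer defer+madvise [madvise] never' -> 'madvise'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_counters(vmstat_text, prefixes=("thp_", "compact_")):
    """Parse /proc/vmstat-style 'name value' lines, keeping only the
    THP and compaction counters."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefixes):
            counters[parts[0]] = int(parts[1])
    return counters


def fallback_ratio(counters):
    """Approximate fraction of THP faults that fell back to 4 KiB pages,
    assuming attempts = thp_fault_alloc + thp_fault_fallback."""
    alloc = counters.get("thp_fault_alloc", 0)
    fallback = counters.get("thp_fault_fallback", 0)
    total = alloc + fallback
    return fallback / total if total else 0.0


def read_text(path):
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Only touch the real files if this kernel exposes them.
    if os.path.isdir(THP_DIR):
        for knob in ("enabled", "defrag"):
            path = os.path.join(THP_DIR, knob)
            print(knob, "=", active_choice(read_text(path)))
    if os.path.exists("/proc/vmstat"):
        c = thp_counters(read_text("/proc/vmstat"))
        for name in ("thp_fault_alloc", "thp_fault_fallback",
                     "compact_stall", "compact_fail", "compact_success"):
            print(name, "=", c.get(name))
        print("fallback ratio = %.3f" % fallback_ratio(c))
```

The counters are cumulative since boot, so the useful comparison is the
delta across a test run: snapshot before and after a run with a full
pagecache, then repeat after dropping caches, and compare how
thp_fault_fallback and the compaction counters move in each case.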

-- 
Cheers,
~Blairo