[dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE
Mikulas Patocka
mpatocka at redhat.com
Wed Mar 21 18:23:40 UTC 2018
On Wed, 21 Mar 2018, Matthew Wilcox wrote:
> On Wed, Mar 21, 2018 at 12:39:33PM -0500, Christopher Lameter wrote:
> > One other thought: If you want to improve the behavior for large scale
> > objects allocated through kmalloc/kmemcache then we would certainly be
> > glad to entertain those ideas.
> >
> > E.g. you could optimize the allocations > 2x PAGE_SIZE so that they do not
> > allocate powers of two pages. It would be relatively easy to make
> > kmalloc_large round the allocation to the next page size and then allocate
> > N consecutive pages via alloc_pages_exact() and free the remainder unused
> > pages or some such thing.
alloc_pages_exact() has O(n*log n) complexity with respect to the number
of requested pages. It would have to be reworked and optimized if it were
to be used for the dm-bufio cache. (It could be optimized down to O(log n)
if it didn't split the high-order page into many separate pages, but split
it into power-of-two clusters instead.)
> I don't know if that's a good idea. That will contribute to fragmentation
> if the allocation is held onto for a short-to-medium length of time.
> If the allocation is for a very long period of time then those pages
> would have been unavailable anyway, but if the user of the tail pages
> holds them beyond the lifetime of the large allocation, then this is
> probably a bad tradeoff to make.
The problem with alloc_pages_exact() is that it exhausts all the
high-order pages and leaves many free low-order pages around. So you'll
end up with a system that has a lot of free memory but no high-order
pages. Because there is plenty of free memory, the kswapd thread
would not be woken up to free some high-order pages.
I think that using slab with a high order is better, because at least it
doesn't leave many low-order pages behind.
> I do see Mikulas' use case as interesting, I just don't know whether it's
> worth changing slab/slub to support it. At first blush, other than the
> sheer size of the allocations, it's a good fit.
All I need is to increase the order of a specific slab cache. I think
it's better to implement an interface that allows doing that than to
duplicate the slab cache code.
BTW, it would be possible to open the file
"/sys/kernel/slab/<cache>/order" from the dm-bufio kernel driver and write
the requested value there, but it seems very dirty. It would be better to
have a kernel interface for that.
Mikulas