[dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

Mikulas Patocka mpatocka at redhat.com
Wed Mar 21 18:23:40 UTC 2018



On Wed, 21 Mar 2018, Matthew Wilcox wrote:

> On Wed, Mar 21, 2018 at 12:39:33PM -0500, Christopher Lameter wrote:
> > One other thought: If you want to improve the behavior for large scale
> > objects allocated through kmalloc/kmemcache then we would certainly be
> > glad to entertain those ideas.
> > 
> > F.e. you could optimize the allocations > 2x PAGE_SIZE so that they do not
> > allocate powers of two pages. It would be relatively easy to make
> > kmalloc_large round the allocation to the next page size and then allocate
> > N consecutive pages via alloc_pages_exact() and free the remainder unused
> > pages or some such thing.

alloc_pages_exact() has O(n*log n) complexity with respect to the number 
of requested pages. It would have to be reworked and optimized if it were 
to be used for the dm-bufio cache. (It could be optimized down to O(log n) 
if, instead of splitting the compound page into many separate pages, it 
split it into power-of-two clusters.)
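
To illustrate the power-of-two split mentioned above, here is a minimal 
userspace sketch in C. It is not kernel code and not how 
alloc_pages_exact() is implemented; the names order_for_pages(), 
tail_clusters() and struct cluster are hypothetical. It only shows that the 
unused tail of a 2^order block can be described by at most "order" 
naturally aligned clusters, so the number of free operations grows with 
log n rather than n.

```c
#include <assert.h>
#include <stddef.h>

/* One cluster of 2^order pages starting at page index 'start'. */
struct cluster {
	size_t start;
	unsigned int order;
};

/* Smallest order with (1 << order) >= nr_pages; nr_pages must be > 0. */
static unsigned int order_for_pages(size_t nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages > 0);
	while (((size_t)1 << order) < nr_pages)
		order++;
	return order;
}

/*
 * Decompose the unused tail [nr_pages, 1 << order) into naturally aligned
 * power-of-two clusters. Writes at most 'max' entries to 'out' and returns
 * the number of clusters written.
 */
static size_t tail_clusters(size_t nr_pages, unsigned int order,
			    struct cluster *out, size_t max)
{
	size_t end = (size_t)1 << order;
	size_t pos = nr_pages;
	size_t count = 0;

	assert(nr_pages > 0 && nr_pages <= end);
	while (pos < end && count < max) {
		/* Lowest set bit of pos: largest aligned chunk starting here. */
		size_t chunk = pos & (~pos + 1);
		unsigned int chunk_order = 0;

		while (((size_t)1 << chunk_order) < chunk)
			chunk_order++;
		out[count].start = pos;
		out[count].order = chunk_order;
		count++;
		pos += chunk;
	}
	return count;
}
```

For a 9-page request served from an order-4 block, the 7-page tail comes 
out as three clusters (1, 2 and 4 pages at indices 9, 10 and 12) instead 
of seven single pages.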

> I don't know if that's a good idea.  That will contribute to fragmentation
> if the allocation is held onto for a short-to-medium length of time.
> If the allocation is for a very long period of time then those pages
> would have been unavailable anyway, but if the user of the tail pages
> holds them beyond the lifetime of the large allocation, then this is
> probably a bad tradeoff to make.

The problem with alloc_pages_exact() is that it exhausts all the 
high-order pages and leaves many free low-order pages around. So you'll 
end up with a system that has a lot of free memory, but with all 
high-order pages missing. As there would be a lot of free memory, the 
kswapd thread would not be woken up to free some high-order pages.

I think that using slab with high order is better, because it at least 
doesn't leave many low-order pages behind.
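
As a rough illustration of why the slab order matters for objects whose 
size is not a power of two, the following userspace C sketch computes the 
bytes left over per slab for a given order and picks the order with the 
smallest wasted fraction. It assumes 4 KiB pages; slab_waste() and 
min_waste_order() are hypothetical names, and this is not necessarily the 
heuristic that slab/slub or the proposed flag would use.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Bytes left unused in a slab of 2^order pages holding obj_size objects. */
static unsigned long long slab_waste(size_t obj_size, unsigned int order)
{
	assert(obj_size > 0);
	return (SKETCH_PAGE_SIZE << order) % obj_size;
}

/*
 * Return the order in [0, max_order] with the smallest wasted fraction,
 * preferring the lower order on ties, or -1 if no order fits one object.
 */
static int min_waste_order(size_t obj_size, unsigned int max_order)
{
	int best = -1;
	unsigned long long best_waste = 0, best_bytes = 1;

	assert(obj_size > 0);
	for (unsigned int order = 0; order <= max_order; order++) {
		unsigned long long bytes = SKETCH_PAGE_SIZE << order;
		unsigned long long waste;

		if (bytes < obj_size)
			continue;	/* slab too small for a single object */
		waste = bytes % obj_size;
		/* Compare waste/bytes with best_waste/best_bytes exactly. */
		if (best < 0 || waste * best_bytes < best_waste * bytes) {
			best = (int)order;
			best_waste = waste;
			best_bytes = bytes;
		}
	}
	return best;
}
```

With 12288-byte (three-page) objects, an order-2 slab holds one object and 
wastes a quarter of the slab, while an order-4 slab holds five objects and 
wastes one sixteenth.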

> I do see Mikulas' use case as interesting, I just don't know whether it's
> worth changing slab/slub to support it.  At first blush, other than the
> sheer size of the allocations, it's a good fit.

All I need is to increase the order of a specific slab cache - I think 
it's better to implement an interface that allows doing it than to 
duplicate the slab cache code.

BTW, it would be possible to open the file 
"/sys/kernel/slab/<cache>/order" from the dm-bufio kernel driver and write 
the requested value there, but that seems very dirty. It would be better 
to have a kernel interface for that.
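
For comparison, this is roughly what the sysfs route looks like when done 
from userspace, which is where such a write normally belongs. It is a 
minimal sketch: write_slab_order() is a hypothetical helper, the caller 
supplies the full attribute path, and it assumes the attribute accepts a 
decimal order followed by a newline.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write 'order' as decimal text to the attribute file at 'attr_path'.
 * Returns 0 on success and -1 on any open, write or close error.
 */
static int write_slab_order(const char *attr_path, unsigned int order)
{
	FILE *f;
	int ok;

	assert(attr_path != NULL);
	f = fopen(attr_path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", order) > 0;
	/* Buffered write errors may only show up when the file is closed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Doing the same from inside a driver would mean opening a sysfs file from 
kernel context, which is the part that seems dirty.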

Mikulas



