[dm-devel] snapshot_ctr kcopy memory allocation problem and following kernel madness

Mon Jan 19 14:01:01 UTC 2004

On Sun, 18.01.2004 at 21:11, Christophe Saout wrote:

> I think I finally found out what killed my webserver the last time.
> I've installed a watchdog and now I have something in my log:
> 
> Jan 18 20:40:24 websrv lvcreate: page allocation failure. order:0, mode:0xd0
> Jan 18 20:40:24 websrv Call Trace:
> Jan 18 20:40:24 websrv [<c014082e>] __alloc_pages+0x2ee/0x350
> Jan 18 20:40:24 websrv [<c02d5261>] client_alloc_pages+0x31/0x80
> Jan 18 20:40:24 websrv [<c02d5c85>] kcopyd_client_create+0x55/0xb0

The problem seems to be that dm-ioctl-v4.c sets the PF_MEMALLOC flag for
the current process.

Looking at the memory allocator (__alloc_pages), this means that the VM
will think a memory allocation is already in progress (and that this is
a recursion), so it will not try to free or rebalance pages. It will
just dig into the reserves. So when kcopyd tries to allocate 1 Meg of
memory, the reserves get exhausted even though there is memory (cache or
something) that could simply be freed without doing I/O.

In the PF_MEMALLOC case the memory allocator even seems to ignore
__GFP_REPEAT and __GFP_NORETRY. So this is kind of a dangerous thing to
do.
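
To make the failure mode concrete, here is a toy user-space model of the
behaviour described above. It is not kernel code: struct toy_zone and the
toy_* functions are made-up names, the watermark logic is heavily
simplified, and the 256-page reserve assumes 4 kB pages
(min_free_kbytes = 1024).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel symbols. */
struct toy_zone {
    long free_pages;   /* pages currently free */
    long min_pages;    /* reserve watermark, e.g. 1024 kB / 4 kB = 256 */
    long reclaimable;  /* clean cache pages that can be freed without I/O */
};

/* Allocate a single page; returns true on success. */
static bool toy_alloc_page(struct toy_zone *z, bool pf_memalloc)
{
    /* Fast path: the allocation keeps us above the reserve watermark. */
    if (z->free_pages > z->min_pages) {
        z->free_pages--;
        return true;
    }

    if (pf_memalloc) {
        /* Treated as a recursive allocation: ignore the watermark,
         * never reclaim, fail once the reserve is empty. */
        if (z->free_pages > 0) {
            z->free_pages--;
            return true;
        }
        return false;
    }

    /* Normal caller: free cache pages until we are above the watermark. */
    while (z->free_pages <= z->min_pages && z->reclaimable > 0) {
        z->reclaimable--;
        z->free_pages++;
    }
    if (z->free_pages > z->min_pages) {
        z->free_pages--;
        return true;
    }
    return false;
}

/* Allocate up to n pages one at a time; returns how many succeeded. */
static long toy_alloc_many(struct toy_zone *z, long n, bool pf_memalloc)
{
    long got = 0;

    assert(n >= 0);
    while (got < n && toy_alloc_page(z, pf_memalloc))
        got++;
    return got;
}
```

Starting from 200 free pages and 10000 reclaimable ones, asking for 256
pages (1 Meg) with pf_memalloc set runs dry after 200 pages and never
touches the cache. The same request without the flag succeeds by
reclaiming.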

The whole thing seems to be about avoiding I/O since devices could be
suspended. Using PF_MEMALLOC seems bogus to me. Shouldn't we use
GFP_NOIO instead of GFP_KERNEL when allocating memory? This should
ensure dm never recurses into itself when allocating memory.
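
A small sketch of what the two masks mean in terms of flag bits. The
SK_* names are mine, and the bit values are what I remember from the
2.6-era include/linux/gfp.h, so please treat them as an assumption and
check your tree. They do match the "mode:0xd0" in the log above, which
would be GFP_KERNEL.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 2.6-era bit values; verify against include/linux/gfp.h. */
#define SK_GFP_WAIT 0x10u  /* caller may sleep */
#define SK_GFP_IO   0x40u  /* reclaim may start physical I/O */
#define SK_GFP_FS   0x80u  /* reclaim may call into filesystems */

#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOIO   (SK_GFP_WAIT)

/* True if an allocation with this mask is allowed to sleep. */
static bool sk_may_sleep(unsigned int mode)
{
    return (mode & SK_GFP_WAIT) != 0;
}

/* True if reclaim on behalf of this allocation may start I/O
 * or call back into filesystem code. */
static bool sk_may_start_io(unsigned int mode)
{
    return (mode & (SK_GFP_IO | SK_GFP_FS)) != 0;
}

/* Strip the bits that could make the allocation recurse into the
 * block layer, keeping everything else. */
static unsigned int sk_to_noio(unsigned int mode)
{
    unsigned int out = mode & ~(SK_GFP_IO | SK_GFP_FS);

    assert(!sk_may_start_io(out));
    return out;
}
```

With these values a GFP_NOIO allocation can still sleep and reclaim
clean pages, but it is not allowed to start I/O, which is the property
we want while a device may be suspended.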

As long as we don't need a lot of memory at once, PF_MEMALLOC seems to
work, since the reserves are big enough (min_free_kbytes is 1024) most
of the time, except when a large buffer allocation is done as well.

The other solution I could think of is to not allocate buffers up front
but to delay allocation, e.g. until they are required for the first
time. This sounds rather ugly, but I don't think we always need 1 Meg of
buffers around either. For the mempool allocation in dm-crypt, changing
this would be a problem (though I only allocate a reserve of 32 pages at
the moment).
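
Just to illustrate the delayed allocation idea, a minimal user-space
sketch. struct lazy_client and the lazy_client_* functions are
hypothetical names, and malloc() stands in for the kernel page
allocation. This is not how kcopyd is structured today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client that defers its buffer allocation. */
struct lazy_client {
    size_t nr_bytes;  /* buffer size wanted, e.g. 1 Meg */
    void *buf;        /* NULL until the buffer is first needed */
};

/* Creating the client records the size but allocates nothing. */
static void lazy_client_init(struct lazy_client *c, size_t nr_bytes)
{
    assert(nr_bytes > 0);
    c->nr_bytes = nr_bytes;
    c->buf = NULL;
}

/* Allocate on first use and hand back the same buffer afterwards.
 * Returns NULL if the allocation fails. */
static void *lazy_client_get(struct lazy_client *c)
{
    if (!c->buf)
        c->buf = malloc(c->nr_bytes);
    return c->buf;
}

/* Release the buffer if it was ever allocated. */
static void lazy_client_destroy(struct lazy_client *c)
{
    free(c->buf);
    c->buf = NULL;
}
```

The catch is that the first use then has to cope with an allocation
failure at a point where it may be harder to handle than at creation
time, which is part of why this feels ugly.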

Thoughts?

Another thing: As seen in the log LVM is somehow unable to restore the
original mapping for the failed snapshot origin. The processes accessing
the dm device are then somehow unkillable. When I then remove the device
using dmsetup there is no way to unlock the process at all. -> Reboot.
That's not very nice. I haven't investigated what could be done, I just
rebooted the machine. I'm so stupid. :)