[dm-devel] [PATCH v3] dm-crypt: fix deadlock when swapping to encrypted device
Mikulas Patocka
mpatocka at redhat.com
Fri Jan 8 17:07:47 UTC 2021
Hi
On Mon, 30 Nov 2020, John Dorminy wrote:
> I am the first to admit I don't understand all the subtleties of the
> block layer, but I'm worried about a deadlock in close analogy to that
> described in bio_alloc_bioset.
> https://elixir.bootlin.com/linux/latest/source/block/bio.c#L463
> Hopefully y'all can clear up my understanding...
>
> Specifically, if a segment of a target blocks in its map function, I
> believe that risks deadlock due to memory congestion. Such an action
> blocks the rest of the IO being submitted by the current thread, and
> the other IO currently being submitted may potentially be to other
> segments which may make progress and free memory.
All the request-based drivers block in their map function if there are
more than 128 requests in flight (the limit is tunable via
/sys/block/*/queue/nr_requests).
So, many of the device mapper targets block as well - for example, the
linear target just passes a bio down to an underlying device - so it will
block if the underlying device has more than 128 requests in flight.
If you don't block, kswapd will just send another bio - and that is the
problem.
> As an example, suppose something submits a bio to a device; the device
> submits three child bios in its map / submit_bio function, some with
> their own data allocated, to different devices, which add up to all
> the reclaimable memory on the system. The first of these bios gets
> blocked by the target (perhaps because it needs memory, or is at
> capacity). Because submit_bio_noacct() traverses bio submittals
> depth-first, the other two child bios will not be submitted to their,
> different, devices until the first of these bios finishes being
> submitted; although were those other two bios allowed to make
> progress, they might complete and free memory. (Admittedly, they might
> need to make further memory allocations to make progress. But in
> theory the mempool / bio_set reserve should allow forward progress on
> these other bios, I think.).
>
> Even this limit only reduces, not eliminates, the problem. Consider a
> machine where fewer than 32768 bios exhaust the available memory;
> additional IOs will be accepted, but will be blocked by trying to
> allocate memory; that memory may be necessary to service the requests
> already in progress, causing a similar starvation of other
> memory-requiring work on the machine. It may be less likely on a
> well-tuned machine for there to be less memory than needed for
> dm-crypt to service 32768 IOs, but it's still possible.
>
> I think it would be safer to stash excess bios on a list for future
> work, and, when congestion support exists again (I don't think it
But if you don't block, you'll get more bios - so the list would grow
without bound.
And an excessive number of bios in flight will reduce system
responsiveness.
> currently does, but I haven't kept good track), perhaps use that
> mechanism to signal when the device is at capacity. But I am probably
> being too paranoid and missing some subtlety above.
>
> Less major, and hopefully clearer, thoughts:
>
> dm-crypt already has a concept of the max number of pages allocatable
> for currently active IOs -- specifically DM_CRYPT_MEMORY_PERCENT per
> cpu. If you're trying to scale by amount of memory on the system,
> perhaps going off DM_CRYPT_MEMORY_PERCENT is safer. I'm somewhat
> baffled why that mechanism isn't enough for the observed problem, tbh.
Perhaps we could lower DM_CRYPT_MEMORY_PERCENT, or remove it and enforce
the limit with a semaphore instead.
> Perhaps it would be better to add a maximum allocatable objects
> mechanism, if it's safe, to mempool, dm, or bioset. If it were in
> bio_alloc_bioset, it could take advantage of the same rescuer
> workqueue to keep from blocking, potentially.
dm-crypt is currently doing it - see "ret = mempool_init(&cc->page_pool,
BIO_MAX_PAGES, crypt_alloc_page, crypt_free_page, cc);"
> Your patch format doesn't work when I try to apply them via git apply
> patchfile for testing, while most patches do... Not sure if it's you
> or me, but it seems most patches from git send-email / git
> format-patch have some stats about inserts/deletes after the ---
> marker, which marker seems missing from your messages. (Also,
> traditionally, I think the changes between patch versions go after the
> --- marker, so they don't go in the change description of the commit.)
I use quilt, not git, to work on my patches.
> Thanks!
>
> John Dorminy
Mikulas