[dm-devel] [PATCH v3] dm-crypt: fix deadlock when swapping to encrypted device
Mikulas Patocka
mpatocka at redhat.com
Fri Jan 8 17:07:47 UTC 2021
Hi
On Mon, 30 Nov 2020, John Dorminy wrote:
> I am the first to admit I don't understand all the subtleties of the
> block layer, but I'm worried about a deadlock in close analogy to that
> described in bio_alloc_bioset.
> https://elixir.bootlin.com/linux/latest/source/block/bio.c#L463
> Hopefully y'all can clear up my understanding...
>
> Specifically, if a segment of a target blocks in its map function, I
> believe that risks deadlock due to memory congestion. Such an action
> blocks the rest of the IO being submitted by the current thread, and
> the other IO currently being submitted may potentially be to other
> segments which may make progress and free memory.
All the request-based drivers block in their map function if there are
more than 128 requests in flight (the limit is tunable via
/sys/block/*/queue/nr_requests).
So, many of the device mapper targets block as well - for example, the
linear target just passes a bio down to an underlying device - so it will
block if the underlying device has more than 128 requests in flight.
If you don't block, kswapd will just send another bio - and that is the
problem.
> As an example, suppose something submits a bio to a device; the device
> submits three child bios in its map / submit_bio function, some with
> their own data allocated, to different devices, which add up to all
> the reclaimable memory on the system. The first of these bios gets
> blocked by the target (perhaps because it needs memory, or is at
> capacity). Because submit_bio_noacct() traverses bio submittals
> depth-first, the other two child bios will not be submitted to their,
> different, devices until the first of these bios finishes being
> submitted; although were those other two bios allowed to make
> progress, they might complete and free memory. (Admittedly, they might
> need to make further memory allocations to make progress. But in
> theory the mempool / bio_set reserve should allow forward progress on
> these other bios, I think.).
>
> Even this limit only reduces, not eliminates, the problem. Consider a
> machine where fewer than 32768 bios exhaust the available memory;
> additional IOs will be accepted, but will be blocked by trying to
> allocate memory; that memory may be necessary to service the requests
> already in progress, causing a similar starvation of other
> memory-requiring work on the machine. It may be less likely on a
> well-tuned machine for there to be less memory than needed for
> dm-crypt to service 32768 IOs, but it's still possible.
>
> I think it would be safer to stash excess bios on a list for future
> work, and, when congestion support exists again (I don't think it
But if you don't block, you'll get more bios - so the list would grow
without bound.
And an excessive number of bios in flight will reduce system
responsiveness.
> currently does, but I haven't kept good track), perhaps use that
> mechanism to signal when the device is at capacity. But I am probably
> being too paranoid and missing some subtlety above.
>
> Less major, and hopefully clearer, thoughts:
>
> dm-crypt already has a concept of the max number of pages allocatable
> for currently active IOs -- specifically DM_CRYPT_MEMORY_PERCENT per
> cpu. If you're trying to scale by amount of memory on the system,
> perhaps going off DM_CRYPT_MEMORY_PERCENT is safer. I'm somewhat
> baffled why that mechanism isn't enough for the observed problem, tbh.
Perhaps we could lower DM_CRYPT_MEMORY_PERCENT, or remove it and enforce
the limit with a semaphore instead.
> Perhaps it would be better to add a maximum allocatable objects
> mechanism, if it's safe, to mempool, dm, or bioset. If it were in
> bio_alloc_bioset, it could take advantage of the same rescuer
> workqueue to keep from blocking, potentially.
dm-crypt is currently doing it - see "ret = mempool_init(&cc->page_pool,
BIO_MAX_PAGES, crypt_alloc_page, crypt_free_page, cc);"
> Your patch format doesn't work when I try to apply them via git apply
> patchfile for testing, while most patches do... Not sure if it's you
> or me, but it seems most patches from git send-email / git
> format-patch have some stats about inserts/deletes after the ---
> marker, which marker seems missing from your messages. (Also,
> traditionally, I think the changes between patch versions go after the
> --- marker, so they don't go in the change description of the commit.)
I use quilt, not git, to work on my patches.
> Thanks!
>
> John Dorminy
Mikulas