[dm-devel] [4.4, 4.5, 4.6] Regression: encrypted swap (dm-crypt) freezes system while under memory pressure and swapping

Ondrej Kozina okozina at redhat.com
Thu May 5 15:54:27 UTC 2016


On 04/21/2016 09:48 AM, Matthias Dahl wrote:
> Hello @all,
>
> first of all, I sent this exact msg also to the lkml a few days ago but
> since I received no reaction, I thought this list might be a better
> place for this problem -- or I might at least reach the right persons to
> get this fixed/debugged/... . :-)
>
> Recently I started seeing freezes while compiling bigger packages that
> do require lots of memory (I use Gentoo).
>
> The freezes took the form that, while in Xorg, the system would just
> completely hang -- no magic sysrq keys, no mouse movement, nothing.
> While in a terminal, one could still issue a magic sysrq command but it
> would only echo the command itself but not execute it -- except for the
> reboot command. So there was no way to get a backtrace or states or
> anything alike.
>
> After debugging this further, it became clear that the system always
> froze when it started hitting the encrypted swap. It worked absolutely
> fine as soon as you took the encryption out of the picture.
>
> My setup then was: an 8 GiB swap on S/W-RAID5 for my 8 GiB physical RAM
> that was encrypted with dm-crypt and AES256-CBC-ESSIV.
>
> I debugged this further and changed my setup to several swap partitions
> on the physical disks w/o a RAID in-between to isolate the culprit. This
> made no difference -- neither did switching ciphers and so forth.
>
> Since this setup had worked for ages, I started looking into what had
> changed the weeks before and noticed I had done several kernel upgrades.
>
> To make a long story short, here are my findings:
>
> 4.3.0, 4.4.0-final, 4.5-rc1 to 4.5-rc2:
> No problems, except for the usual sluggishness with encrypted swap that
> has been there since forever (it is like the encryption has the highest
> priority and takes over the system, e.g. no terminal input is accepted
> on a different terminal while high memory pressure is going on which is
> in contrast with unencrypted swap, where this still works fine).
>
> 4.4.x, >= 4.5-rc3 (incl. 4.6-rcX and master):
> The system freezes under memory pressure as soon as it starts swapping
> out. 4.6 master is an exception here: it still responds to magic sysrq
> commands properly, but after some time it also freezes hard.
>
> I haven't had the time to test all 4.3.x and 4.4.x releases, I am afraid.
> What I can say though is that 4.4.6 is affected as well.
>
> A git bisect between 4.5-rc2 and 4.5-rc3 led me to the following commit:
>
> 564e81a57f9788b1475127012e0fd44e9049e342 is the first bad commit
> commit 564e81a57f9788b1475127012e0fd44e9049e342
> Author: Tetsuo Handa <penguin-kernel at i-love.sakura.ne.jp>
> Date:   Fri Feb 5 15:36:30 2016 -0800
>
>      mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
>
> This is obviously not the real culprit in my opinion but a trigger.
> Reverting that commit on 4.5.1 for example, makes the encrypted swap
> work flawlessly again (except for the usual system sluggishness).
>
> Reverting it on 4.6 master at c3b46c73264b03000d1e18b22f5caf63332547c9,
> does show a different picture though: The system freezes while the sysrq
> keys do still work and usually recovers after some while if the
> corresponding task that triggered the swapping in the first place, gets
> killed. It sometimes does a bit of swapping, and sometimes doesn't, while
> it hangs there -- while usually with the other kernels in the "frozen"
> state, the swapping stops completely.
>
> I managed to get a bit more information out of 4.6 master though since
> it sometimes recovers after quite some time and I can copy backtraces
> and such to the disk, which I have attached.
>
> I hope this helps in finding the real issue behind this. I am sorry I
> could not provide more information but this has been a rather time
> consuming task thus far. :-)
>
> If there is anything else I can do to help or test, please let me know
> and I will gladly do so.
>
> Thanks in advance.
>
> So long,
> Matthias
>

Hello,

I second the observation that something is wrong, and it doesn't seem to
be related to the dm-crypt target. My test setup is as follows:

2 CPUs
system ram: 1 GB
swap on top of dm-crypt: 2 GB

Whenever I start a workload that consumes more memory than system RAM but
much less than total memory including swap, I end up with the following
OOM message, which I consider premature and unexpected:
https://okozina.fedorapeople.org/bugs/swap_on_dmcrypt/vmlog-1462458369-00000/sample-00011/dmesg

The important snippet just before the OOM:

active_anon:4096kB inactive_anon:4636kB, writeback:4636kB

and also:

Free swap  = 2039832kB
Total swap = 2097148kB
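
To make the point of those two figures explicit, here is a minimal Python sketch that parses them and computes how much swap was actually in use when the OOM killer fired. The helper name parse_swap_kb is hypothetical and the regular expression only assumes the two-line format quoted above, not the full dmesg layout.

```python
import re

def parse_swap_kb(text):
    """Return {'free': kB, 'total': kB} from 'Free swap = NkB' style lines."""
    values = {}
    pattern = r"^(Free|Total) swap\s*=\s*(\d+)kB"
    for match in re.finditer(pattern, text, re.MULTILINE):
        values[match.group(1).lower()] = int(match.group(2))
    return values

# The two lines quoted from the OOM report above.
snippet = """Free swap  = 2039832kB
Total swap = 2097148kB"""

swap = parse_swap_kb(snippet)
used_kb = swap["total"] - swap["free"]        # swap actually in use
used_pct = 100.0 * used_kb / swap["total"]    # as a share of total swap

print(f"swap used: {used_kb} kB ({used_pct:.1f}% of total)")
```

In other words, only 57316 kB of the 2 GB swap was in use at that point, which is why the OOM looks premature.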

You can find more details in the sample-* directories located in:
https://okozina.fedorapeople.org/bugs/swap_on_dmcrypt/vmlog-1462458369-00000/

Each sample directory contains statistics collected approximately once
per second after I started the workload (a script from the
http://linux-mm.org/OOM site).
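
For readers who want to generate comparable memory pressure, the following is a rough Python stand-in, not the script from that site. The function name allocate_and_touch, the chunk size and the 4096-byte page size are all assumptions. It allocates anonymous memory in chunks and writes one byte per page so the kernel has to back every page.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def allocate_and_touch(total_mib, chunk_mib=16):
    """Allocate total_mib MiB in chunks and write one byte per page."""
    chunks = []
    remaining = total_mib * 1024 * 1024
    chunk_bytes = chunk_mib * 1024 * 1024
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        buf = bytearray(size)
        # Touch every page; untouched zero pages may never be backed.
        for offset in range(0, size, PAGE_SIZE):
            buf[offset] = 1
        chunks.append(buf)  # keep a reference so nothing is freed
        remaining -= size
    return chunks

if __name__ == "__main__":
    # Small demonstration size only; raise it deliberately to exceed RAM.
    held = allocate_and_touch(8, chunk_mib=4)
    print(f"holding {sum(len(c) for c in held) // (1024 * 1024)} MiB")
```

On a setup like the one above (1 GB RAM, 2 GB swap), a total somewhat above 1024 MiB but well below 3072 MiB would match the described workload size.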

For me, the OOM killer message can be observed starting with this commit:
commit f9054c70d28bc214b2857cf8db8269f4f45a5e23
Author: David Rientjes <rientjes at google.com>
Date:   Thu Mar 17 14:19:19 2016 -0700

     mm, mempool: only set __GFP_NOMEMALLOC if there are free elements

(...)

Kind regards
Ondrej



