[dm-devel] dm-crypt optimization

Tue Dec 20 09:41:16 UTC 2016

At a high level the goal is to maximize the size of the data blocks that get
passed to hardware accelerators, minimizing the overhead of setting up and
tearing down operations in the hardware. Currently dm-crypt itself is a big
blocker: it manually implements ESSIV and similar algorithms, which allow
per-block encryption of the data, so the low level operations from the crypto
API can only operate on a single block at a time. This is done because the
crypto API currently has no software implementations of these algorithms, so
dm-crypt can't rely on it to provide the functionality. The plan to address
this was to provide some software implementations in the crypto API, then
update dm-crypt to rely on those. Even for a pure software implementation
with no hardware acceleration this should provide a small optimization, as we
need to call into the crypto API less often, but the gain is likely to be
marginal given the overhead of the crypto itself. The real win would be on a
system that has an accelerator that can replace the software implementation.

Currently dm-crypt handles data only in single blocks. This means that it can't
make good use of hardware cryptography engines, since each transaction with the
engine carries an overhead and transfers must be split into block-sized
chunks. Allowing the transfer of larger blocks, e.g. an entire 'struct bio',
could mitigate these costs and improve performance on systems with encrypted
filesystems. Although Qualcomm chipsets support another device-mapper
variant, dm-req-crypt, it is not generic and not in a mainline-able state.
Also, it only supports the 'XTS-AES' mode of encryption and is not compatible
with the other modes supported by dm-crypt.

However, there are some challenges and a few possible ways to address them. I
would appreciate your suggestions on whether the points below make sense and
whether they could be done differently.

1. Move the 'real' IV generation algorithms to the crypto layer (e.g. essiv)
2. Increase the 'length' of the scatterlist nodes used in the crypto API. It
   can be made equal to the size of a main memory segment (as defined in
   'struct bio'), as these segments are physically contiguous.
3. Represent the multiple segments of a 'struct bio' as one scatterlist
   covering all of its segments.

4. Move the 'lmk' and 'tcw' algorithms (which are IVs combined with hacks to
   the cbc mode) into customized cbc algorithms, each implemented in a separate
   file (e.g. cbc_lmk.c/cbc_tcw.c). As Milan suggested, these can't be treated
   as real IVs, as they include hacks to the cbc mode (and directly manipulate
   encrypted data).

5. Move the key selection logic to user space, or always assume a keycount of
   '1' (see the dm-crypt param format below), so that key selection does not
   depend on the sector number. This is necessary because the key is
   otherwise selected based on the sector number:

   key_index = sector & (key_count - 1)

   If the block size for scatterlist nodes is increased beyond the sector
   boundary (which is what we plan to achieve, for performance), the key can
   no longer be changed at the sector level for every cipher operation.

   dm-crypt param format : cipher[:keycount]-mode-iv:ivopts
   Example               : aes:2-cbc-essiv:sha256

   Also as Milan suggested, it is not wise to move the key selection logic to
   the crypto layer as it will prevent any changes to the key structure later.

The following is a reference to an earlier patchset. It mixed the 'cbc'
cipher mode up with the IV algorithms, which is usually not the preferred way.

Reference:
https://lkml.org/lkml/2016/12/13/65
https://lkml.org/lkml/2016/12/13/66