[dm-devel] [RFC PATCH 1/1] dm crypt: change maximum sector size to PAGE_SIZE

Milan Broz gmazyland at gmail.com
Sat Nov 27 11:19:57 UTC 2021


On 11/25/21 17:28, Itai Handler wrote:
> On 14/11/2021 13:56, Itai Handler wrote:
>> On 11/11/2021 15:07, Milan Broz wrote:
>>> On 10/11/2021 18:43, Itai Handler wrote:
>>>> Maximum sector size of dm-crypt is currently limited to 4096 bytes.
>>>>
>>>> On systems where PAGE_SIZE is larger than 4096 bytes, using larger
>>>> sectors can be beneficial for performance reasons.
>>>
>>> The limit to 4096 was set because this is the smallest possible
>>> page size that all platform supports.
>>>
>>> If you allow a higher size here, the device cannot be activated on a
>>> platform
>>> with the smaller page size. (Encrypted sector size becomes
>>> atomic sector size for all upper layers - as you mention below, not
>>> all fs support bigger sectors.)
>>>
>>> For LUKS, this is not acceptable - the format is portable by definition.
>>>
>>> For specific dm-crypt device, I am not sure. I would better kept
>>> the 4096 page size limit here.
>>
>> I considered only plain dm-crypt since I am unfamiliar with LUKS.
>> Does LUKS assume that dm-crypt sector size is limited to 4K?
>> If so, maybe I'll be able to also patch LUKS regarding this issue.
>>
>>>
>>> It also depends on crypto API driver here (performance is usually
>>> optimized to 4k).
>>> What cipher and encryption mode did you use for test?
>>
>> The cipher I used for the test is not publicly available but I can say
>> that it's performance
>> is not optimized to 4k blocks.
>> I believe that this results from the high overhead of setting up DMA
>> transfers. (my
>> cipher uses DMA to transfer data between memory and programmable logic).
>> There are many additional ciphers that use DMA in the tree, but I
>> cannot run any
>> benchmark with them at the moment.
>>
>> I have performed some additional benchmarks using the ARM
>> Cryptographic Extensions
>> CPU ciphers and saw that increasing block size beyond 4K does increase
>> performance,
>> albeit the performance improvement isn't as large as I've seen when
>> using my cipher.
>>
>> Following are "tcrypt mode=600 sec=5 num_mb=512" results for
>> xts-aes-ce decryption
>> (ARM CPU Cryptographic Extensions cipher):
>>    testing speed of multibuffer xts(aes) (xts-aes-ce) decryption
>>    ...
>>    trcypt: test 5 (256 bit key, 4096 byte blocks): 801792 operations in
>> 5 seconds (3284140032 bytes)
>>    ...
>>    trcypt: test 9 (256 bit key, 65536 byte blocks): 63488 operations in
>> 5 seconds (4160749568 bytes)
>>
>> That translates to:
>>    657 MB/s for 4K byte blocks.
>>    832 MB/s for 64K blocks.
>>
>> That means that there is about 27 percents improvement when
>> transitioning to 64K sectors,
>> for the cipher alone (only tcrypt benchmark).
>>
>> This benchmark had been performed on an ARM Cortex A53 CPU.
>> (Note that in all of my benchmarks PAGE_SIZE=64K).
>>
>>> How the number looks for random access? Linear test is usually
>>> misleading.
>>> I expect there will be big performance problem if you write small
>>> data chunks,
>>> writes and encryption will be amplified to full big sectors here...)
>> I understand your concern.
>> However my patch does not force anyone to use large sectors - it only
>> opens up this
>> possibility for those interested in that option.
>> This is similarly to the option to format an ext4 filesystem with 64K
>> sectors.
>> By the way: when you do that, you get a warning saying that the
>> filesystem
>> will not be usable on most systems.
>>
>> Sometime users need to store mostly large files on a filesystem, for
>> example for
>> backup or for video files.
>> I think that in these cases random access time is not so important.
>> Some users may also be able to reserve a dedicated partition for
>> storing such
>> large files.
>>
>>>
>>> (Technical detail: such pat MUST increase dm-crypt minor version.)
>> Thanks for pointing that out. I believe that I can prepare a v2 patch
>> that will
>> address that issue.
>>>
>>> Milan
>>>
>>>>
>>>> This patch changes maximum sector size from 4096 bytes to PAGE_SIZE,
>>>> and in addition it changes the type of sector_size in
>>>> struct crypt_config from 'unsigned short int' to 'unsigned int', in
>>>> order to be able to represent larger values.
>>>>
>>>> On a prototype system which has PAGE_SIZE of 65536 bytes, I saw about
>>>> x2 performance improvement in sequential read throughput benchmark
>>>> while using only about half of the CPU usage, after simply increasing
>>>> sector size from 4096 to 65536 bytes.
>>>> I used ext4 filesystem for that benchmark, which supports 64KiB
>>>> sectors.
>>>>
>>>> Note: A small change should be made in cryptsetup in order to add
>>>> support for sectors larger than 4096 bytes.
>>>>
>>>> Signed-off-by: Itai Handler <itai.handler at gmail.com>
>>>> ---
>>>>    drivers/md/dm-crypt.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>>>> index 916b7da16de2..78c239443bd5 100644
>>>> --- a/drivers/md/dm-crypt.c
>>>> +++ b/drivers/md/dm-crypt.c
>>>> @@ -168,7 +168,7 @@ struct crypt_config {
>>>>           } iv_gen_private;
>>>>           u64 iv_offset;
>>>>           unsigned int iv_size;
>>>> -       unsigned short int sector_size;
>>>> +       unsigned int sector_size;
>>>>           unsigned char sector_shift;
>>>>
>>>>           union {
>>>> @@ -3115,9 +3115,9 @@ static int crypt_ctr_optional(struct dm_target
>>>> *ti, unsigned int argc, char **ar
>>>>                           cc->cipher_auth = kstrdup(sval, GFP_KERNEL);
>>>>                           if (!cc->cipher_auth)
>>>>                                   return -ENOMEM;
>>>> -               } else if (sscanf(opt_string, "sector_size:%hu%c",
>>>> &cc->sector_size, &dummy) == 1) {
>>>> +               } else if (sscanf(opt_string, "sector_size:%u%c",
>>>> &cc->sector_size, &dummy) == 1) {
>>>>                           if (cc->sector_size < (1 << SECTOR_SHIFT) ||
>>>> -                           cc->sector_size > 4096 ||
>>>> +                           cc->sector_size > PAGE_SIZE ||
>>>>                               (cc->sector_size & (cc->sector_size -
>>>> 1))) {
>>>>                                   ti->error = "Invalid feature value for
>>>> sector_size";
>>>>                                   return -EINVAL;
>>>>
>>>
>> I appreciate your valuable comments.
>>
>> Itai
>>
> Milan, can you comment on the above?

Hi,

well, if you want my opinion, let's summarize it:

- you tested it on a driver that is proprietary and is not in
the mainline kernel

- provided numbers are with tcrypt - I think this is
an internal crypto API test; it says nothing about performance
with your dm-crypt patch above it, right?

- no numbers for random access/small file access here (note,
if you need to reencrypt 64k sector vs. 4k of mostly unused data,
it WILL have a huge performance impact).

IMO, optimizing for big linear access is better on the fs layer
(and even years ago, I noticed some vendor's patches for setting
large fs sectors in video recording equipment).

The same applies to encryption - I think FDE is not the best layer
here if you need to work with large sectors (encryption of not-used
sectors is just huge waste of resources - fs encrypts only used space,
I hope... :).

I said that LUKS must remain multiplatform. Adding larger than
4k sectors is not an option; I will undoubtedly reject that patch
in cryptsetup.
So this patch can be used only for plain dm-crypt mapping,
where you need to handle your key management (that could be ok, though).

Sorry, but without any performance numbers that prove this really
helps and does not create huge performance problems...

NACK from me for this patch (but it is up to Mike to decide here).

Milan




More information about the dm-devel mailing list