[dm-devel] dm-crypt performance

Tue Mar 26 03:47:22 UTC 2013

Hi

I performed some dm-crypt performance tests as Mike suggested.

It turns out that unbound workqueue performance has improved somewhere 
between kernel 3.2 (when I made the dm-crypt patches) and 3.8, so the 
patches for hand-built dispatch are no longer needed.

For RAID-0 composed of two disks with total throughput 260MB/s, the 
unbound workqueue performs as well as the hand-built dispatch (both 
sustain the 260MB/s transfer rate).

For a ramdisk, the unbound workqueue performs better than the hand-built 
dispatch (620MB/s vs. 400MB/s). The unbound workqueue with the patch that 
Mike suggested (git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git) 
improves performance slightly on the ramdisk compared to 3.8 (700MB/s vs. 
620MB/s).

However, there is still the problem with request ordering. Milan found out 
that under some circumstances parallel dm-crypt has worse performance than 
the previous dm-crypt code. I found out that this is not caused by 
deficiencies in the code that distributes work to individual processors. 
The performance drop occurs because distributing write bios to 
multiple processors causes the encryption to finish out of order and the 
I/O scheduler is unable to merge these out-of-order bios.
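
To illustrate the effect, here is a toy model in Python (not the dm-crypt 
or block layer code). It assumes a scheduler that can merge a bio only 
with the request submitted immediately before it; real I/O schedulers are 
more capable than this, so the numbers only show the direction of the 
effect. The function name, the bio size of 8 sectors and the 
"swapped pairs" completion order are all assumptions made for the example.

```python
def count_requests(start_sectors, bio_size=8):
    """Count the requests issued under a toy back-merge-only model.

    A bio is merged into the current request only if it starts exactly
    where the previously submitted bio ended; otherwise a new request
    is started.
    """
    requests = 0
    prev_end = None
    for start in start_sectors:
        if start != prev_end:
            requests += 1          # not contiguous: a new request is needed
        prev_end = start + bio_size
    return requests


# 16 contiguous write bios in the order they were received.
in_order = [i * 8 for i in range(16)]

# Hypothetical completion order when two workers finish each pair of
# bios in swapped order.
out_of_order = []
for i in range(0, len(in_order), 2):
    out_of_order += [in_order[i + 1], in_order[i]]

print(count_requests(in_order))      # 1
print(count_requests(out_of_order))  # 16
```

In this model the in-order stream collapses into a single large request, 
while the reordered stream is issued as 16 small ones.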

The deadline and noop schedulers perform better (only a 50% slowdown 
compared to the old dm-crypt); CFQ performs very badly (an 8x slowdown).

If I sort the requests in dm-crypt so that they come out in the same order 
as they were received, there is no longer any slowdown: the new crypt 
performs as well as the old crypt. However, the last time I submitted the 
patches, people objected to sorting requests in dm-crypt, saying that the 
I/O scheduler should sort them. But it doesn't. This problem still 
persists in the current kernels.
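
One way to picture the sorting is a reorder buffer keyed by arrival 
sequence number. The Python sketch below is only an illustration of that 
idea, not the code from the patches; the class name, the method name and 
the use of a heap are assumptions. Each bio is tagged with a sequence 
number when it is received, and a finished bio is held back until every 
earlier bio has also finished.

```python
import heapq


class ReorderBuffer:
    """Release finished items in the order they were originally received."""

    def __init__(self):
        self.next_seq = 0    # sequence number of the next item to release
        self.pending = []    # min-heap of (seq, item) finished out of order

    def complete(self, seq, item):
        """Record that item number `seq` has finished encryption.

        Returns the list of items that can now be submitted, in order.
        The list is empty while an earlier item is still outstanding.
        """
        heapq.heappush(self.pending, (seq, item))
        released = []
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released


# Encryption finishes in the order 1, 0, 3, 2, but submission is in order.
buf = ReorderBuffer()
submitted = []
for seq in [1, 0, 3, 2]:
    submitted += buf.complete(seq, f"bio{seq}")
print(submitted)
```

The cost of this approach is that one slow bio delays all later ones, 
which is the trade-off behind the objection to sorting in dm-crypt.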

For best performance we could use the unbound workqueue implementation 
with request sorting, if people don't object to the request sorting being 
done in dm-crypt.

Mikulas