[dm-devel] Ideas to reuse filesystem's checksum to enhance dm-raid1/10/5/6?

Austin S. Hemmelgarn ahferroin7 at gmail.com
Fri Nov 17 12:22:08 UTC 2017


On 2017-11-16 20:30, Qu Wenruo wrote:
> 
> 
On 2017-11-17 00:47, Austin S. Hemmelgarn wrote:
> 
>>>
>>> This is at least less complicated than dm-integrity.
>>>
>>> Just a new hook for READ bios. And it can start with the easy parts,
>>> like dm-raid1 and support in other filesystems.
>> It's less complicated for end users (in theory, but cryptsetup devs are
>> working on that for dm-integrity), but significantly more complicated
>> for developers.
>>
>> It also brings up the question of what happens when you want some other
>> layer between the filesystem and the MD/DM RAID layer (say, running
>> bcache or dm-cache on top of the RAID array).  In the case of
>> dm-integrity, that's not an issue because dm-integrity is entirely
>> self-contained, it doesn't depend on other layers beyond the standard
>> block interface.
> 
> Each layer can choose to drop support for the extra verification.
> 
> If a layer does not modify the data, it can pass the hook down to the
> lower layer, just as is done with the integrity payload.
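> 
> A minimal sketch of what such a hook could look like, loosely modeled
> on the existing integrity payload (all names here are hypothetical,
> not existing kernel symbols):
> 
>     /* Hypothetical per-bio verification hook: the filesystem attaches
>      * it to a READ bio, and each dm target either uses it or passes it
>      * down unchanged, like the integrity payload. */
>     struct bio_verify_hook {
>             /* Return 0 if the data in @bio matches the fs checksum. */
>             int (*verify)(struct bio *bio, void *fs_private);
>             void *fs_private;       /* fs-specific checksum context */
>     };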
Which then makes things a bit more complicated in every other layer as 
well, in turn making things more complicated for all developers.
> 
>>
>> As I mentioned in my other reply on this thread, running with
>> dm-integrity _below_ the RAID layer instead of on top of it will provide
>> the same net effect, and in fact provide a stronger guarantee than what
>> you are proposing (because dm-integrity does real cryptographic
>> integrity verification, as opposed to just checking for bit-rot).
> 
> Although that comes with more CPU usage for each device, even when they
> contain the same data.
I never said it didn't have higher resource usage.
> 
>>>
>>>>
>>>> If your checksum is calculated and checked at the FS level, there is no
>>>> added value in spreading this logic to other layers.
>>>
>>> That's why I'm moving the checking part to a lower level, to get more
>>> value from the checksum.
>>>
>>>>
>>>> dm-integrity adds basic 'check-summing' to any filesystem without the
>>>> need to modify the fs itself
>>>
>>> Well, despite the fact that modern filesystems have already implemented
>>> their own metadata checksums.
>>>
>>>> - the price paid is that if there is a bug in passing data from the
>>>> 'fs' to 'dm-integrity', it cannot be captured.
>>>>
>>>> The advantage of having separate 'fs' and 'block' layers is the
>>>> separation and simplicity at each level.
>>>
>>> Totally agreed on this.
>>>
>>> But the idea here should not have that large an impact (compared to
>>> big things like ZFS/Btrfs):
>>>
>>> 1) It only affects READ bios.
>>> 2) Every dm target can choose whether to support the hook or just
>>>      pass it down.
>>>      There is no point in supporting it for RAID0, for example.
>>>      And for complex RAID like RAID5/6, there is no need to support it
>>>      from the very beginning.
>>> 3) The main part of the functionality is already implemented.
>>>      The core complexity has two parts:
>>>      a) Checksum calculation and checking.
>>>         Modern filesystems already do this, at least for metadata.
>>>      b) Recovery.
>>>         dm targets already implement this for the supported RAID
>>>         profiles (a sketch of the dm-raid1 side follows below).
>>>      All of this is already implemented; invoking it at a different
>>>      time should not require such a big modification, IIRC.
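>>>
>>> A rough sketch of how dm-raid1's read completion could use such a
>>> hook (raid1_end_read() and bio_get_verify_hook() are illustrative
>>> names, not existing symbols; the hook struct is the one sketched
>>> earlier):
>>>
>>>     static void raid1_end_read(struct bio *bio)
>>>     {
>>>             /* hypothetical accessor for the attached hook */
>>>             struct bio_verify_hook *vh = bio_get_verify_hook(bio);
>>>
>>>             if (!bio->bi_status && vh &&
>>>                 vh->verify(bio, vh->fs_private) != 0)
>>>                     /* Checksum mismatch: treat it like a media error
>>>                      * so the existing recovery path retries the read
>>>                      * from the other mirror. */
>>>                     bio->bi_status = BLK_STS_IOERR;
>>>
>>>             /* existing dm-raid1 completion/retry handling follows */
>>>     }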
>>>>
>>>> If you want an integrated solution, you are simply looking for btrfs,
>>>> where multiple layers are integrated together.
>>>
>>> With such a verification hook (along with something extra to handle
>>> scrub), btrfs chunk mapping could be re-implemented with device-mapper:
>>>
>>> In fact, the btrfs logical space is just a dm-linear device, and each
>>> chunk could be implemented by a corresponding dm-* module, like:
>>>
>>> dm-linear:       | btrfs chunk 1 | btrfs chunk 2 | ... | btrfs chunk n |
>>> and
>>> btrfs chunk 1: metadata, using dm-raid1 on diskA and diskB
>>> btrfs chunk 2: data, using dm-raid0 on disk A B C D
>>> ...
>>> btrfs chunk n: system, using dm-raid1 on disk A B
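>>>
>>> Expressed as a (purely illustrative) C sketch - none of these types
>>> or helpers exist, this just restates the mapping above:
>>>
>>>     /* Hypothetical chunk map: each entry covers one range of the
>>>      * linear btrfs logical space and names the dm target backing it. */
>>>     struct chunk_map_entry {
>>>             u64 logical_start;     /* offset in the linear space */
>>>             u64 length;
>>>             const char *dm_target; /* "raid1", "raid0", ... */
>>>     };
>>>
>>>     /* Find the chunk (and hence the dm target) backing a given
>>>      * btrfs logical offset - the dm-linear lookup described above. */
>>>     static struct chunk_map_entry *
>>>     chunk_lookup(struct chunk_map_entry *map, int n, u64 logical)
>>>     {
>>>             for (int i = 0; i < n; i++)
>>>                     if (logical >= map[i].logical_start &&
>>>                         logical - map[i].logical_start < map[i].length)
>>>                             return &map[i];
>>>             return NULL;
>>>     }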
>>>
>>> At least btrfs could take advantage of the simplicity of separate
>>> layers.
>>>
>>> And other filesystems would get a somewhat higher chance of recovering
>>> their metadata if built on dm-raid.
>> Again, just put dm-integrity below dm-raid.  The other filesystems
>> primarily have metadata checksums to catch data corruption, not repair
>> it,
> 
> Because they have no extra copy.
> If they had one, they would definitely use the extra copy to repair.
But they don't have those extra copies now, so that argument is moot 
(especially since they are unlikely to add data or metadata replication 
in the filesystem any time in the near future).
> 
>> and I severely doubt that you will manage to convince developers to
>> add support in their filesystem (especially XFS) because:
>> 1. It's a layering violation (yes, I know BTRFS is too, but that's a bit
>> less of an issue because it's a completely self-contained layering
>> violation, while this isn't).
> 
> If passing something along with a bio is a layering violation, then the
> integrity payload has already been doing that for a long time.
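> 
> For reference, attaching the existing integrity payload looks roughly
> like this today (a sketch from memory; csum_buf and csum_len are
> hypothetical, error handling omitted):
> 
>     struct bio_integrity_payload *bip;
> 
>     /* Allocate a payload with room for one vector and attach the
>      * buffer holding the checksums for this bio's data. */
>     bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
>     if (!IS_ERR(bip))
>             bio_integrity_add_page(bio, virt_to_page(csum_buf),
>                                    csum_len, offset_in_page(csum_buf));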
The block integrity layer is also interfacing directly with hardware and 
_needs_ to pass that data down.  Unless I'm mistaken, it also doesn't do 
any verification except in the filesystem layer, and doesn't pass down 
any complaints about the integrity of the data (it may try to re-read 
it, but that's not the same as what you're talking about).
> 
>> 2. There's no precedent in hardware (I challenge you to find a block
>> device that lets you respond to a read completing with 'Hey, this data
>> is bogus, give me the real data!').
>> 3. You can get the same net effect with a higher guarantee of security
>> using dm-integrity.
> 
> With more CPU and I/O overhead (journal mode writes the data twice: once
> for the journal and once for the real data).
If you're concerned about that, then the same argument could be made 
about having checksumming at all.  Yes, it's not cheap, but security and 
data safety almost never are.  CoW semantics in BTRFS are just as 
resource intensive (if not more so).
