[dm-devel] [PATCH] dm: remake of the verity target

Wed Mar 21 03:11:25 UTC 2012

On Tue, 20 Mar 2012, Mikulas Patocka wrote:

> > Changes:
> > 
> > * Salt is hashed before the block (it used to be hashed after). The reason 
> > is that if random salt is hashed before the block, it makes the process 
> > resilient to hash function collisions - so you can safely use md5, even if 
> > there's a collision attack for it.
> 
> I am not aware of any additional benefit to prepending the salt versus
> appending. Could you please provide a reference?
> 
> I would like to avoid breaking backward compatibility unless there is
> a real benefit.
> 
> Regards,
> Mandeep
> 

Here is a more detailed explanation of why I do it this way. The reason is 
to protect against collision attacks (such as the known attack on MD5 or 
possible future attacks against other hash functions).

"Preimage attack" means that you are given a hash value and you create a 
message that hashes to that hash value. There is no known preimage attack 
for currently used hash functions.

"Collision attack" means that you are able to create two messages that 
hash to the same value. A collision attack is currently known 
for MD5.

Suppose that I publish some software, calculate its MD5 digest and sign 
that digest. This is safe (despite the existing collision attack on MD5) 
because there is no preimage attack --- no one is able to create another 
file with the same MD5 hash.

However, it is still possible to break security with a collision attack, 
but the attacker must be able to get some of his own data included in the 
software that is signed with MD5.

Suppose, for example, that a software developer publishes "real_program" 
and signs it with MD5. The attacker inserts a security backdoor into the 
program and gets "insecure_program". The attacker takes two MD5 states --- 
the state as it was after hashing "real_program" and the state as it was 
after hashing "insecure_program" --- and with a collision attack, he is 
able to create two messages "m1" and "m2" such that they result in an MD5 
collision.

The result is that
MD5("real_program"+"m1") and MD5("insecure_program"+"m2") hash to the same 
value.
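
The step above can be sketched on a toy hash. This is only an illustration 
under stated assumptions: it does not attack MD5. It uses a hypothetical 
Merkle-Damgard hash with a 24-bit state (toy_hash, built from a truncated 
MD5 call) so that a collision can be found by brute force in well under a 
second. All names here (BLOCK, STATE, compress, absorb, toy_hash, 
chosen_prefix_collision) are made up for the example.

```python
import hashlib

BLOCK = 4   # toy block size in bytes
STATE = 3   # toy chaining state: 24 bits, small enough to brute-force
IV = b"\x00" * STATE

def compress(state, block):
    # Hypothetical toy compression function: truncated MD5 of state+block.
    return hashlib.md5(state + block).digest()[:STATE]

def absorb(state, data):
    # Feed block-aligned data through the compression function.
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(data):
    # Merkle-Damgard padding: 0x80, zeros up to a block boundary, then length.
    padded = data + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(data).to_bytes(BLOCK, "big")
    return absorb(IV, padded)

def chosen_prefix_collision(p1, p2):
    # Pad both prefixes to one common block-aligned length, so that the
    # final length padding is identical for both messages.
    n = max(len(p1), len(p2))
    n += -n % BLOCK
    fill1 = b"\x00" * (n - len(p1))
    fill2 = b"\x00" * (n - len(p2))
    s1 = absorb(IV, p1 + fill1)   # state after the first prefix
    s2 = absorb(IV, p2 + fill2)   # state after the second prefix
    # Birthday search: tabulate states reachable from s1, then probe from s2.
    table = {}
    for i in range(1 << 13):
        x = i.to_bytes(BLOCK, "big")
        table[compress(s1, x)] = x
    for j in range(1 << 32):
        y = j.to_bytes(BLOCK, "big")
        hit = table.get(compress(s2, y))
        if hit is not None:
            return fill1 + hit, fill2 + y
    raise RuntimeError("no collision found")

m1, m2 = chosen_prefix_collision(b"real_program", b"insecure_program")
signed = b"real_program" + m1
forged = b"insecure_program" + m2
print(signed != forged, toy_hash(signed) == toy_hash(forged))  # True True
```

The two files differ, yet they hash to the same value, because "m1" and 
"m2" were computed from the two known intermediate states.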

Now, to make the attack successful, the attacker must trick the software 
developer somehow into inserting "m1" into his program.

It is not trivial, but it is possible, to trick the software developer 
into inserting attacker-controlled data into the program. For example, the 
attacker can send him a file containing the string "m1" and claim that it 
is a Chinese localization of the program --- if the software developer has 
no knowledge of the Chinese writing system, he can't differentiate real 
Chinese text from a string of random characters --- so he inserts the file 
containing "m1" into his program and publishes it.

Now the attack is complete: the software developer has published and signed 
"real_program"+"m1" and there exists another file "insecure_program"+"m2" 
that hashes to the same MD5 value. So the attacker can pass off 
"insecure_program"+"m2" as the real one.

You can protect against this situation either by using a hash function 
without a known collision attack or by prepending some random data to the 
program to be hashed. If the developer signs "random_data"+"real_program" 
with MD5 in the above example, there is no way for the attacker to create 
a collision --- the attacker can still create "m1" and "m2" and trick the 
developer into including "m1" in the program --- but the developer uses 
a different "random_data", which the attacker cannot know in advance, each 
time he publishes a new version of the software, so there will be no MD5 
collision.

This is the reason why I changed the dm-verity hashing scheme. The salt is 
randomly generated when creating the hashes. When we hash a data block, we 
prepend the salt to the data. Consequently, the attacker can't exploit the 
collision attack as described above. If we appended the salt (as was done 
before), it would be possible to exploit the collision attack.
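
The difference between appending and prepending the salt can be 
illustrated with a small sketch. It makes several assumptions: it uses a 
hypothetical toy Merkle-Damgard hash with a 24-bit state (not MD5 and not 
the dm-verity code), the colliding blocks are found by brute force before 
any salt is known, and the salts are fixed stand-ins for random values so 
that the run is repeatable. All names are made up for the example.

```python
import hashlib

BLOCK = 4   # toy block size in bytes
STATE = 3   # toy chaining state: 24 bits, small enough to brute-force
IV = b"\x00" * STATE

def compress(state, block):
    # Hypothetical toy compression function: truncated MD5 of state+block.
    return hashlib.md5(state + block).digest()[:STATE]

def toy_hash(data):
    # Merkle-Damgard padding: 0x80, zeros up to a block boundary, then length.
    padded = data + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(data).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_colliding_blocks():
    # Birthday search for two distinct one-block messages that leave the
    # hash in the same internal state when hashed from the fixed IV.
    seen = {}
    for i in range(1 << 32):
        block = i.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
    raise RuntimeError("no collision found")

# The attacker prepares the collision before any salt has been chosen.
good, evil = find_colliding_blocks()

# Stand-ins for 64 independently generated 8-byte salts.
salts = [hashlib.sha256(bytes([i])).digest()[:8] for i in range(64)]

# Salt appended: the collision happens first, the identical salt follows,
# so the collision survives for every salt.
appended = sum(toy_hash(good + s) == toy_hash(evil + s) for s in salts)

# Salt prepended: the blocks are hashed from a salt-dependent state, so a
# collision prepared from the fixed IV is no longer expected to hold.
prepended = sum(toy_hash(s + good) == toy_hash(s + evil) for s in salts)

print(appended, prepended)
```

The first number is always 64. The second is expected to be 0, since each 
salt leaves only a 1 in 2^24 chance of an accidental match in this toy 
setting.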

Mikulas