Eternal 'good file hashes' list

Panu Matilainen pmatilai at laiskiainen.org
Tue Oct 20 11:18:03 UTC 2009


On Tue, 20 Oct 2009, Ralf Ertzinger wrote:

> Hi.
>
> On Tue, 20 Oct 2009 10:20:17 +0200, Tomas Mraz wrote:
>
>> What would this be good for?
>
> To expand on the motivation for this:
>
> The idea is to have a list of known good file hashes to test your local
> files against, if you have reason not to trust your local RPM database
> (which may have been compromised as well). The way I'd do that right now
> would involve getting the corresponding RPM files from a mirror (hoping
> there still is a mirror for that) and then... well, then it gets a bit
> fuzzy as I don't know how to check the checksum of a file against the
> metadata in an RPM file, but I'm sure it can be done somehow.
>
> So I thought that there may be an easier way to do this, so I'm asking,
> in a first step, for an estimate of the data size we're talking about,
> as I have no idea how much metadata each file in an RPM takes up, how
> many RPMs/files koji builds each day on average and so on.

Well, for example, here's the size of the file hashes (which are simply
arrays of hex strings in the headers) for the 2871 packages on
Fedora-11-x86_64-DVD.iso:
[pmatilai@localhost Packages]$ rpm -qap --qf "[%{filedigests}\n]" *.rpm |wc
  430716  373388 24141084

To make any use of that data you'll obviously need the file names too, so:
[pmatilai@localhost Packages]$ rpm -qap --qf "[%{filedigests} %{filenames}\n]" *.rpm |wc
  430716  804104 47467960
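
For a rough feel of what those numbers mean per file, here's a quick
sketch of the arithmetic. wc prints lines, words and bytes, so the first
column is the number of file entries and, in the digests-only run, the
second column is the number of entries that actually have a digest. The
variable names are just made up for illustration, and the averages are
integer-truncated:

```shell
# Figures copied from the two wc runs above (lines, words, bytes).
files=430716               # file entries, one per output line
digests=373388             # entries with a non-empty digest
digest_bytes=24141084      # digests only
digest_name_bytes=47467960 # digests plus flat file names

# Entries that carry no digest at all.
no_digest=$((files - digests))
# Average bytes per entry (integer division, so rounded down).
avg_digest=$((digest_bytes / files))
avg_with_name=$((digest_name_bytes / files))

echo "entries without a digest:           $no_digest"
echo "avg bytes per entry, digests only:  $avg_digest"
echo "avg bytes per entry, with names:    $avg_with_name"
```

So adding the flat file names roughly doubles the size of the list.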

...but rpm stores paths indexed by directory, so storing flat paths as
returned by %{filenames} wastes tonnes of space. Also note that
directories and symlinks don't have associated hashes. And of course
there's a whole lot more metadata that you need to take into account when
verifying: user+group, permissions, symlink targets etc.
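
To illustrate just the hash part of such a check, here's a minimal
sketch, not a real verification tool. It assumes a list file in the
"digest path" format produced by the second command above, assumes the
digests are SHA-256 (anything that isn't 64 hex digits long is simply
skipped), and relies on sha256sum from coreutils. The function name
check_hashes is made up for this example. It ignores ownership,
permissions and symlink targets entirely, and it will also flag config
files that were legitimately edited:

```shell
# check_hashes LIST: compare local files against "digest path" lines.
# Hypothetical helper; assumes SHA-256 digests and coreutils sha256sum.
check_hashes() {
    rc=0
    while read -r digest path; do
        # Entries without a digest (directories, symlinks) leave only
        # one field on the line, so "path" ends up empty: skip them.
        [ -n "$path" ] || continue
        # Only 64-character (SHA-256) digests are handled in this sketch.
        [ "${#digest}" -eq 64 ] || continue
        if [ ! -f "$path" ]; then
            echo "MISSING $path"
            rc=1
            continue
        fi
        actual=$(sha256sum "$path" | cut -d' ' -f1)
        if [ "$actual" != "$digest" ]; then
            echo "MISMATCH $path"
            rc=1
        fi
    done < "$1"
    return $rc
}
```

You'd redirect the output of the digests-plus-filenames query into a
file on a machine you trust, then run check_hashes on that file on the
suspect system. It prints one line per missing or differing file and
returns non-zero if it found any.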

 	- Panu -



