
Re: Eternal 'good file hashes' list

On Tue, 20 Oct 2009, Ralf Ertzinger wrote:


On Tue, 20 Oct 2009 10:20:17 +0200, Tomas Mraz wrote:

What would this be good for?

To expand on the motivation for this:

The idea is to have a list of known good file hashes to test your local
files against, if you have reason not to trust your local RPM database
(which may have been compromised as well). The way I'd do that right now
would involve getting the corresponding RPM files from a mirror (hoping
there still is a mirror for that) and then... well, then it gets a bit
fuzzy, as I don't know how to check the checksum of a file against the
metadata in an RPM file, but I'm sure it can be done somehow.

So I thought that there may be an easier way to do this, so I'm asking,
in a first step, for an estimate of the data size we're talking about,
as I have no idea how much metadata each file in an RPM takes up, how
many RPMs/files koji builds each day on average and so on.
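
A rough sketch of the check described above, comparing local files against a list of known good digests. It assumes the list is plain text with one "<hex digest> <path>" pair per line, and that the digest length identifies the algorithm; both are assumptions for illustration, and the names `parse_line`, `file_digest` and `verify` are hypothetical, not part of rpm.

```python
import hashlib
import os

# Assumption: the hex length tells us the algorithm. A real tool should
# take the algorithm from the package header rather than guess it.
ALGO_BY_HEX_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def parse_line(line):
    """Split one 'digest path' line; return None if there is no digest."""
    digest, sep, path = line.rstrip("\n").partition(" ")
    if not sep or not digest or not path:
        return None  # e.g. directories and symlinks carry no digest
    return digest.lower(), path


def file_digest(path, algo):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(lines, root="/"):
    """Yield (path, status) for every entry that has a digest.

    status is one of 'ok', 'mismatch', 'missing' or 'unknown-algo'.
    """
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        digest, path = entry
        algo = ALGO_BY_HEX_LENGTH.get(len(digest))
        local = os.path.join(root, path.lstrip("/"))
        if algo is None:
            yield path, "unknown-algo"
        elif not os.path.isfile(local):
            yield path, "missing"
        elif file_digest(local, algo) == digest:
            yield path, "ok"
        else:
            yield path, "mismatch"
```

Something like `for path, status in verify(open("hashes.txt")): ...` would then report each file, where hashes.txt is a hypothetical file name. This only covers file contents; ownership, permissions and symlink targets would need separate checks.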

Well, for example, here are the file hashes (which are simply arrays of hex strings in the headers) of the 2871 packages on Fedora-11-x86_64-DVD.iso:
[pmatilai localhost Packages]$ rpm -qap --qf "[%{filedigests}\n]" *.rpm |wc
 430716  373388 24141084

To make any use of that data you'll obviously need the file names too, so:
[pmatilai localhost Packages]$ rpm -qap --qf "[%{filedigests} %{filenames}\n]" *.rpm |wc
 430716  804104 47467960

...but rpm stores paths indexed by directory, so storing flat paths as returned by %{filenames} wastes tonnes of space. Also note that directories and symlinks don't have associated hashes. And of course there's a whole lot more metadata that you need to take into account when verifying: user+group, permissions, symlink targets etc.
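
To illustrate the directory-indexed layout: rpm headers keep file paths as three parallel arrays (dirnames, basenames, dirindexes), so each directory string is stored once and every file only carries its base name plus an index. The sketch below shows the idea on plain Python lists; `compress_paths` and `expand_paths` are hypothetical helper names, and this is not rpm's actual on-disk encoding.

```python
import posixpath


def compress_paths(paths):
    """Split flat paths into (dirnames, basenames, dirindexes)."""
    dirnames, basenames, dirindexes = [], [], []
    index_of = {}  # directory string -> position in dirnames
    for path in paths:
        head, tail = posixpath.split(path)
        # Keep a trailing slash so that dirname + basename gives the path.
        dirname = head if head.endswith("/") else head + "/"
        if dirname not in index_of:
            index_of[dirname] = len(dirnames)
            dirnames.append(dirname)
        basenames.append(tail)
        dirindexes.append(index_of[dirname])
    return dirnames, basenames, dirindexes


def expand_paths(dirnames, basenames, dirindexes):
    """Rebuild the flat path list from the three arrays."""
    return [dirnames[i] + base for base, i in zip(basenames, dirindexes)]
```

With many files per directory, the saving comes from each long directory prefix appearing once instead of once per file.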

	- Panu -
