Duplicated files in the pristine FC4t2 installation

Peter Jones pjones at redhat.com
Mon May 2 20:25:57 UTC 2005


On Mon, 2005-05-02 at 12:35 -0700, Roland McGrath wrote:
> > Roland McGrath wrote:
> > > I think what one clearly wants is for rpm to maintain an installed file
> > > index keyed by md5sum.  Then you can have a tool that just uses this
> > > database to identify duplicates (and doesn't take forever), or have rpm do
> > > so itself when installing new files.
> > > 
> > 
> > Hmm, what about hash collisions?  That would be really, really BAD.
> 
> If you are concerned about them you can still compare contents before
> declaring two files identical.  But using the hashes as the main detector
> makes it fast, since you only examine the data of files that are 99.999%
> likely to be identical.

And in the vast majority of cases, there's a simpler heuristic you can
use first: is the basename the same?
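
To make the ordering of those checks concrete, here is a rough sketch in
Python.  It is not how rpm works and it does not touch the rpm database:
it just walks a directory tree, and the names file_md5 and find_duplicates
are made up for illustration.  It groups files by basename first, then by
size and md5sum, and only byte-compares files that already match on all
of those, so a hash collision cannot produce a false positive.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical.

    Hypothetical helper: only files sharing a basename are considered,
    so duplicates stored under different names are deliberately missed.
    """
    # Cheap heuristic first: bucket regular files by basename.
    by_name = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_name[name].append(path)

    groups = []
    for paths in by_name.values():
        if len(paths) < 2:
            continue
        # Next: bucket by (size, md5sum) so only likely matches remain.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[(os.path.getsize(path), file_md5(path))].append(path)
        for candidates in by_hash.values():
            # Last: compare bytes, which rules out hash collisions.
            remaining = sorted(candidates)
            while remaining:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [
                    p for p in rest if filecmp.cmp(first, p, shallow=False)
                ]
                if len(same) > 1:
                    groups.append(same)
                remaining = [p for p in rest if p not in same]
    return groups
```

Calling find_duplicates("/usr/share/doc"), for example, would return one
list per set of identical same-named files.  The basename step is what
keeps the number of files that have to be hashed small.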

But really, this is 160MB of wasted space.  We don't support installing
onto USB, so from glancing at pricewatch, the smallest disk they list
that we support installing onto would appear to be an 18GB SCSI drive
for $23.  There are larger, cheaper drives, too.

So we're talking about saving just under 1% of the least-desirable
supported install target currently being sold.  Let's just stop?
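
For anyone who wants to check the "just under 1%" figure, the arithmetic
is below.  The 160MB and 18GB numbers are the ones quoted above; the only
assumption is whether a GB is counted as 1000 or 1024 MB, and the result
is under 1% either way.

```python
WASTED_MB = 160  # duplicated data, from the thread
DISK_GB = 18     # smallest supported drive mentioned above


def wasted_share(wasted_mb, disk_gb, mb_per_gb=1000):
    """Fraction of the disk taken up by the duplicated files."""
    return wasted_mb / (disk_gb * mb_per_gb)


decimal_share = wasted_share(WASTED_MB, DISK_GB)                 # 1 GB = 1000 MB
binary_share = wasted_share(WASTED_MB, DISK_GB, mb_per_gb=1024)  # 1 GB = 1024 MB

print(f"{decimal_share:.2%}")  # 0.89%
print(f"{binary_share:.2%}")   # 0.87%
```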

-- 
        Peter



