Request for Comments: updating RPMs using binary deltas.

seth vidal skvidal at phy.duke.edu
Thu Jan 8 16:15:46 UTC 2004


> What I am proposing:
> 
> 1.)	Use rsync or something similar to generate an incremental backup of the 
> patched, unpacked RPM against the original distributed RPM (how to do this 
> is not well known, but rsync can copy only the files that have changed: 
> see the O'Reilly book 'Linux Server Hacks', hack #42).  The delta must use 
> the original, pristine, as-distributed RPM as the baseline, or this becomes 
> unwieldy.  We would want the file deltas instead of the whole changed 
> files that an rsync incremental backup would provide.

This will beat up mirror/repository servers pretty badly.

What 'original distributed RPM'? There could be hundreds of iterations.
You'd need a pile of these files.
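
To put a rough number on that pile, here is a minimal sketch. It assumes a
package has shipped some number of errata after its original release;
`deltas_needed` is a hypothetical helper for illustration, not part of rpm,
up2date, or any existing tool.

```python
def deltas_needed(num_errata: int, any_baseline: bool) -> int:
    """Count the delta files a mirror would carry for one package.

    num_errata:   releases published after the original (an assumption).
    any_baseline: True if clients may hold any earlier release,
                  False if everyone is assumed to hold the original.
    """
    if num_errata < 0:
        raise ValueError("num_errata must be non-negative")
    if not any_baseline:
        # One delta per erratum, each generated against the original release.
        return num_errata
    # One delta for every (older, newer) pair among num_errata + 1 releases.
    releases = num_errata + 1
    return releases * (releases - 1) // 2
```

With ten errata that is 10 deltas if the original is the only baseline, and
55 if a delta is kept for every pair of releases.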





> 4.)	The pristine RPM is then 'patched' by the rpmdiff at a file-by-file level.  
> The headers from the rpmdiff are used to build the resulting complete RPM, 
> which should be identical to the full errata RPM that would have been 
> downloaded, except for the signature: the rpmdiff would be signed (and 
> checked by the update tools), whereas the reconstructed full RPM would not 
> be signed unless the full RPM's signature is transmitted as part of the 
> rpmdiff.  The key to saving space and bandwidth is the use of file-by-file 
> deltas.

How do you establish these 'pristine rpms'?

What if I've installed a local rpm of the same package name and I want
to update? The concept of 'pristine' is gonna bite you.

And if you can't guarantee that then the mirrors are still going to have
to carry all this data and you still lose.
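
A sketch of the check this implies, assuming the delta is published together
with a SHA-256 digest of the exact baseline RPM it was generated against.
`sha256_of` and `choose_download` are hypothetical names, and the digest
scheme is an assumption, not something the proposal specifies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_download(local_rpm: Path, expected_sha256: str) -> str:
    """Return 'delta' only when the local baseline is byte-identical to
    the one the delta was generated against; otherwise return 'full'."""
    if local_rpm.is_file() and sha256_of(local_rpm) == expected_sha256:
        return "delta"
    # Missing, rebuilt, or locally modified package: the delta cannot apply.
    return "full"
```

Any client that falls into the 'full' branch still needs the complete errata
RPM, which is why the mirrors cannot drop it.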


> 5.)	The resulting reconstructed errata RPM is then installed normally, using 
> up2date or whatever.
> 
> 6.)	There would need to be both a command line 'rpmdiff' tool as well as 
> up2date support for this to work.  Unless the support could be rolled into 
> rpmlib itself.
> 
> 7.)	The user then enjoys being able to download the updates over dialup and 
> having a chance to finish in less than five hours, in the case of the 
> kernel-source update.  Dialup users still exist, and they are not going away.  
> The minor inconvenience of having to insert their original CDs, or have all 
> the original RPMs stored someplace, is IMO far outweighed by the much larger 
> inconvenience and cost of the bandwidth problems.  If the updates are large 
> enough one could get compromised in the time it takes to download them.  Even 
> with a high-bandwidth pipe, like a T1 or good DSL, the larger updates, 
> especially during the testing phases, can take hours to download.  I remember 
> getting beta updates through up2date that took a very long time to download 
> (and even longer to install, since virtually every package in the whole 
> distribution had changed, many of them by very few bytes).

So you've got a dialup user who will have to go through N steps in order
to get updates that they may or may not need?
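
For scale, the download time under discussion is simple arithmetic. A
minimal sketch, assuming a nominal link rate with no protocol overhead
(real dialup throughput is lower); `download_hours` is a hypothetical
helper, and the sizes used with it are illustrative, not measured.

```python
def download_hours(size_bytes: int, link_bits_per_second: float) -> float:
    """Hours needed to move size_bytes over a link at its nominal rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 3600
```

At a nominal 56 kbit/s, five hours moves about 126 MB, so
`download_hours(126_000_000, 56_000)` comes out at five hours. A delta one
tenth that size would take one tenth of the time, before counting the extra
steps on the client.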


> 8.)	The updates repository enjoys being able to service many more users per 
> hour, since each user takes less time and less bandwidth.  And hundreds of GB 
> are no longer required for a full mirror of all the updates.

Yes, they would: you'd still need the original rpm or you'd never be
able to recreate all the data, not to mention you'd still want the srpm
around.



-sv






More information about the fedora-devel-list mailing list