Request for Comments: updating RPMs using binary deltas.

Lamar Owen lowen at pari.edu
Fri Jan 9 01:08:56 UTC 2004


On Thursday 08 January 2004 04:19 pm, Alan Cox wrote:
> Problem 1:  Gzip files don't rsync well. bzip2 files I'm not sure of the
> situation. Rusty did work with rsyncable dpkg files and along with Tridge
> hacked up a gzip library that generated slightly larger rsyncable files.
> That change was tested ages ago in rpm and broke stuff so went back out.
> I don't know if anyone ever sat down and debugged it in full

The difference in what I'm proposing is that the diff is generated on the build 
server, not on the update server.  The build server (or even a dedicated diff 
server) would keep a repository of originals, and as each fresh update package 
is produced, the diff against the unpacked original is generated.  The 
resulting diff is signed and checksummed.  Then it's uploaded to the update 
server, already diffed and signed.
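To make the build-server step concrete, here is a minimal Python sketch under 
stated assumptions.  It uses a naive block-matching delta (fixed 64-byte blocks 
of the original, matched by exact content), not rsync's rolling checksum and 
not any delta format rpm actually uses.  `make_delta`, `apply_delta` and 
`BLOCK` are hypothetical names.  A SHA-256 digest of the new package stands in 
for the checksum; GPG signing is assumed to happen separately and is omitted.

```python
import hashlib
from typing import List, Tuple, Union

BLOCK = 64  # hypothetical block size, chosen for illustration only

# An op is either ("copy", old_offset): copy BLOCK bytes from the original,
# or ("data", raw_bytes): insert these literal bytes.
Op = Tuple[str, Union[int, bytes]]


def make_delta(old: bytes, new: bytes) -> Tuple[List[Op], str]:
    """Run on the build server: describe `new` in terms of `old`."""
    # Index every aligned, full-size block of the original package.
    index = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        index.setdefault(old[off:off + BLOCK], off)

    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(new):
        window = new[i:i + BLOCK]
        off = index.get(window) if len(window) == BLOCK else None
        if off is not None:
            # Flush pending literal bytes, then reference the original block.
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", off))
            i += BLOCK
        else:
            # No match here: carry this byte as literal data.
            literal.append(new[i])
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    # Checksum of the expected result; signing is assumed to happen elsewhere.
    return ops, hashlib.sha256(new).hexdigest()


def apply_delta(old: bytes, ops: List[Op], digest: str) -> bytes:
    """Run on the user's machine against the locally available original."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += old[arg:arg + BLOCK]
        else:
            out += arg
    result = bytes(out)
    if hashlib.sha256(result).hexdigest() != digest:
        raise ValueError("reconstructed package does not match checksum")
    return result
```

For example, inserting eight bytes into a 2 KB original yields a delta made 
almost entirely of copy operations plus under a hundred literal bytes, and 
`apply_delta` refuses to return output whose checksum does not match.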

> Problem 2:  Where do you get the original package from ? The CD has been
> one suggestion but JBJ pointed out that you can assemble an approximation
> of the original package from the on disk data in most cases. The config
> files might be a little different but most of the content is basically the
> same.

Then you run afoul of the problem Jef brought up.  We can't make the user 
downgrade in order to upgrade; thus, we must have the original RPM available 
(or struggle with an unmanageable number of package permutations). 

Available can mean the install media.  It could mean a few GB of space on the 
user's hard disk, if they chose to keep a local repository of the RPMs they 
installed; that repository would either have to be updated as new RPMs are 
installed, or be a full copy.  It could mean downloading the original from the 
update server if the user simply cannot find the original RPM, in which case 
the advantage is negated.  They should learn to keep the CD or other local 
repository around anyway, so they can roll back errant updates.
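The lookup order described above might look like the following sketch, 
assuming the install media and the local repository are plain directories of 
RPM files.  `find_original_rpm` and any directory names passed to it are 
hypothetical.  A `None` result is the fall-back case where the original has to 
come from the update server.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_original_rpm(filename: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return a local copy of the originally installed RPM, or None.

    `search_dirs` is assumed to list places such as the mounted install
    media and a local repository on the user's disk, in priority order.
    """
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    # None means the caller must fetch the original (or the full update)
    # from the update server, which cancels the bandwidth saving.
    return None
```

Putting the install media first and the local repository second means the 
network is touched only when neither copy exists.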

> Problem 3:  Server resources. The rsync computation clobbers the server
> compared to the overhead of just spewing bits. Given people are running
> 3000 vsftp sessions in parallel off big servers that is a concern.

The diff would not be done in real time.  That would blow out all the 
advantages of doing the diffs in the first place.  For updates, the diff would 
be done at build time, not on the server at download time.

The reason I mentioned rsync at all is that it can produce an incremental local 
backup of changed files very easily, which can then be packaged and uploaded to 
the server.  I was not proposing the use of rsync as the wire protocol between 
the update server and each user.  Sorry if I misled anyone there.
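The file-level change set described above can be approximated in a few lines 
of Python.  This sketch is not rsync: it compares whole-file SHA-256 checksums 
between an unpacked original tree and an unpacked update tree, and packs only 
new or changed files into a gzipped tarball ready for upload.  Deleted files 
and symlinks are not recorded, and the function names are hypothetical.

```python
import hashlib
import os
import tarfile
from typing import Dict, List


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256; skips symlinks."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            rel = os.path.relpath(full, root)
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def changed_files(old_root: str, new_root: str) -> List[str]:
    """Files that are new or whose contents differ from the original tree."""
    old = tree_digests(old_root)
    new = tree_digests(new_root)
    return sorted(p for p, d in new.items() if old.get(p) != d)


def pack_changes(old_root: str, new_root: str, out_path: str) -> List[str]:
    """Bundle only the changed files into a compressed tarball for upload."""
    changed = changed_files(old_root, new_root)
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in changed:
            tar.add(os.path.join(new_root, rel), arcname=rel)
    return changed
```

Comparing whole-file checksums keeps the sketch simple; a real pipeline would 
likely also need to record deletions and permission changes.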
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu





More information about the fedora-devel-list mailing list