Request for Comments: updating RPMs using binary deltas.

Chris Adams cmadams at hiwaay.net
Thu Jan 8 21:37:56 UTC 2004


Once upon a time, Alan Cox <alan at redhat.com> said:
> Problem 3:  Server resources. The rsync computation clobbers the server
> compared to the overhead of just spewing bits. Given people are running
> 3000 vsftp sessions in parallel off big servers that is a concern.
> 
> Problem 3 seems to be the big unsolved today. In theory the other bits
> are solved.

I think you'd be much better off not trying to diff and then patch the
binary package (especially on the fly).  A better way would be something
that compares the exploded package and just ships the files that are
different (or maybe even diffs of the raw files), installs the changed
files, and updates the RPM database to say the whole package is updated.

So, if /usr/include/stdio.h is the only file that changed in the
glibc-devel package, you'd ship a new copy of stdio.h (or even just a
diff of the old and new stdio.h).  This is how the Tru64 patch system
works, for example: it ships a .tar.gz containing the changed files
(along with scripts to apply changes, merge local changes, etc.).
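
To make the "compare the exploded packages" step concrete, here is a
rough sketch in Python.  It assumes both package versions have already
been unpacked into two directory trees (how they get unpacked is left
out), and the names file_digests and diff_trees are made up for
illustration; this is not an existing tool.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; only real file contents are compared.
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(old_root, new_root):
    """Return (changed_or_added, removed) relative paths between two trees."""
    old = file_digests(old_root)
    new = file_digests(new_root)
    changed = sorted(p for p, d in new.items() if old.get(p) != d)
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

Only the paths in the first list would need to go into the patch; the
second list tells the install tool what to delete.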

With this approach, you need some form of cumulative patches.  If
pkg-1.0-1 was in the distribution, and updates pkg-1.1-1 and pkg-1.1-2
have been released, you either keep the 1.0-1=>1.1-1 patch around
forever, or the 1.1-2 patch includes patches against both 1.0-1 and
1.1-1.  The second way would not be too hard with a smart tool (i.e.
take the 1.0-1=>1.1-1 and 1.1-1=>1.1-2 patches and combine them to patch
either 1.0-1 or 1.1-1 to 1.1-2), so the master site wouldn't have to
keep every incremental release around just for patch updates, and people
could still update if they missed a patch.
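
The combining step could look roughly like the sketch below.  It
assumes whole-file patches represented as plain dictionaries (a made-up
layout: "from" and "to" release strings, plus a "files" map from path
to new contents, with None meaning "remove this file"); a real patch
format would obviously carry more than that.

```python
def compose(first, second):
    """Combine patch A=>B and patch B=>C into a single patch A=>C."""
    if first["to"] != second["from"]:
        raise ValueError(
            "patches do not chain: %s vs %s" % (first["to"], second["from"])
        )
    files = dict(first["files"])
    files.update(second["files"])  # the later patch wins for any shared path
    return {"from": first["from"], "to": second["to"], "files": files}


def cumulative(patches):
    """Given chained patches (oldest first), return one patch per older
    release, each of them ending at the newest release."""
    result = []
    combined = None
    for patch in reversed(patches):
        combined = patch if combined is None else compose(patch, combined)
        result.append(combined)
    return list(reversed(result))
```

Given the 1.0-1=>1.1-1 and 1.1-1=>1.1-2 patches, cumulative() returns a
1.0-1=>1.1-2 patch and the unchanged 1.1-1=>1.1-2 patch.  Because whole
files are shipped, "later patch wins" is enough; combining real diffs
would take more care.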

The hard part of shipping just a diff of a file is that if the original
file has been changed locally (and no copy was saved in a "known"
place), you're stuck.  Tru64 has ".proto" versions of all config files
that are expected to be edited (even if they are typically only edited
by a program); the real file is copied from the .proto at install time,
and then patches are distributed as diffs that get merged (and you can
get .rej type files to merge manually if your changes are too weird).
Doing that would require changing a whole lot of packages right now
(although I suppose the RPM library could be modified so that on install
any %config file gets copied away to a "safe" place, handling this
behind the scenes).  It would be a lot easier to ship only whole files
and use the normal "if it has changed and is not %config do .rpmnew"
rules.
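
As a rough illustration of the whole-file route, here is a simplified
sketch of "replace the file unless it was changed locally, otherwise
drop the new copy next to it".  This is my own simplification, not a
description of what rpm does internally: the function name is made up,
and recorded_digest is assumed to be the digest of the file as
originally packaged, obtained from somewhere like the package database.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def install_whole_file(target, new_bytes, recorded_digest):
    """Install new_bytes at target, or beside it if target was edited.

    recorded_digest is the digest of the file as originally packaged
    (hypothetical input, assumed to come from the package database).
    Returns the path that was actually written.
    """
    target = Path(target)
    if target.exists() and sha256_of(target) != recorded_digest:
        # Locally modified: leave it alone and write the new copy alongside.
        dest = target.with_name(target.name + ".rpmnew")
    else:
        dest = target
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(new_bytes)
    return dest
```

No merging and no .proto copies are needed here, which is the appeal of
shipping whole files.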

I think this is very doable, and it doesn't require any extra resources
on the part of distribution sites or mirrors (except some more disk
space, of course).  It requires a program to compare two RPMs and build
the patch "RPM" (PRPM?), and a tool to install the patch RPM (including
running %pre/%post scripts and updating the RPM database).  The patch
RPM should include all meta information, such as timestamps, and the
install tool should update all of that (even when the file contents
didn't change), so that the net result is identical to installing the
updated RPM.
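
The metadata part might be as small as the sketch below.  It assumes
the patch carries a manifest mapping every file in the new package
(changed or not) to its mode and modification time; the manifest layout
and the name apply_metadata are invented for illustration, and
ownership, scripts and the RPM database update are left out.

```python
import os


def apply_metadata(root, manifest):
    """Set mode and mtime for every file listed, whether or not its
    contents were touched by the patch."""
    for rel_path, meta in manifest.items():
        path = os.path.join(root, rel_path)
        os.chmod(path, meta["mode"])
        # utime takes (atime, mtime); both are set to the packaged mtime here.
        os.utime(path, (meta["mtime"], meta["mtime"]))
```

Running this over the full file list after the changed files are
installed is what would make the result match a normal install of the
updated RPM, at least as far as modes and timestamps go.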

-- 
Chris Adams <cmadams at hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.





More information about the fedora-devel-list mailing list