Request for Comments: updating RPMs using binary deltas.

Bill Rugolsky Jr. brugolsky at telemetry-investments.com
Thu Jan 8 22:07:44 UTC 2004


On Thu, Jan 08, 2004 at 04:44:00PM -0500, Colin Walters wrote:
> It seems to me that the solution to this is to precompute the block
> checksums on the server.  We'd have a command like
> "rsync --gen-checksums FILE1" which would create a FILE1.rsync-checksums
> file that the server could read.  If you wanted to do it recursively for
> a whole tree, just rsync -r --gen-checksums /path/to/dir.
> 
> Given a beefy enough server that can keep all the relevant files in the
> dentry cache, I would think that would take care of most of the
> overhead...

In the rsync algorithm, the recipient sends checksums of what it has to
the sender, and the sender calculates the rolling checksum of its copy of
the file, comparing against the table of checksums from the client.

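Obviously, this does not scale: the sender has to run the rolling
checksum over the file anew for every client, so checksums precomputed
on the server do not save it that work.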
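
To make the per-client cost concrete, here is a rough Python sketch of the
sender-side scan. It uses an rsync-style weak checksum (two 16-bit sums);
the function names and the set-based checksum table are my own choices for
illustration, not rsync's actual code.

```python
M = 1 << 16  # both sums are kept modulo 2^16

def weak_checksum(block):
    """Compute the (a, b) weak checksum of one block from scratch."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def sender_scan(data, block_len, client_sums):
    """Return offsets in data whose window checksum is in client_sums.

    client_sums is the set of (a, b) pairs the client sent (assumed format).
    """
    hits = []
    if len(data) < block_len:
        return hits
    a, b = weak_checksum(data[:block_len])
    off = 0
    while True:
        if (a, b) in client_sums:
            hits.append(off)
        if off + block_len >= len(data):
            break
        a, b = roll(a, b, data[off], data[off + block_len], block_len)
        off += 1
    return hits
```

The scan touches every byte of the sender's file and depends on the
client's table, so it has to be repeated per client. A weak match would
also have to be confirmed with a strong checksum, which the sketch omits.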

Reverse rsync does as you propose: the sender makes block checksums
available to the client, which then roots around to see whether it has
those segments of the file.  The client then requests only the missing
pieces.  Hence the whole thing can be done with nothing more than
pipelined http range requests.

It has the additional benefit that the client can choose how much or
little of his filesystem to examine for matches.
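
A rough Python sketch of the client side of that scheme follows. The
choice of SHA-256, the fixed block size, and all function names are
assumptions for illustration; a real client would use a rolling weak
checksum rather than hashing at every offset.

```python
import hashlib

def block_hashes(data, block_len):
    """What the server would precompute and publish next to the file."""
    return [hashlib.sha256(data[i:i + block_len]).hexdigest()
            for i in range(0, len(data), block_len)]

def local_blocks(candidates, block_len, wanted):
    """Search the local data the client chose to examine for listed blocks."""
    found = {}
    for data in candidates:
        for off in range(0, len(data) - block_len + 1):
            chunk = data[off:off + block_len]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in wanted and digest not in found:
                found[digest] = chunk
    return found

def missing_ranges(hashes, found, block_len, total_len):
    """Merge missing blocks into inclusive (start, end) byte ranges."""
    ranges = []
    for idx, digest in enumerate(hashes):
        if digest in found:
            continue
        start = idx * block_len
        end = min(start + block_len, total_len) - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges

def range_header(ranges):
    """Format the ranges as the value of an HTTP Range header."""
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in ranges)
```

Passing more or fewer local files in `candidates` is how the client
decides how much of its filesystem to search; the server only ever sees
ordinary range requests.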

Taking this to the next level, one can envision a filesystem like
Plan 9's Venti that addresses file blocks by their hash, and missing
blocks can just be pulled from a remote server.
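
As a toy illustration of that idea (not Venti's actual interface), a
hash-addressed block store with a remote fallback might look like this in
Python. The class, its methods, and the `fetch_remote` callback are all
hypothetical; SHA-1 is used because Venti addresses blocks by SHA-1 hash.

```python
import hashlib

class BlockStore:
    """Blocks keyed by the hex SHA-1 of their contents."""

    def __init__(self, fetch_remote=None):
        self.blocks = {}
        self.fetch_remote = fetch_remote  # callable: key -> bytes, or None

    def put(self, block):
        key = hashlib.sha1(block).hexdigest()
        self.blocks[key] = block
        return key

    def get(self, key):
        if key not in self.blocks:
            if self.fetch_remote is None:
                raise KeyError(key)
            block = self.fetch_remote(key)
            # The address is the hash, so a fetched block can be verified.
            if hashlib.sha1(block).hexdigest() != key:
                raise ValueError("remote block does not match its hash")
            self.blocks[key] = block
        return self.blocks[key]

def store_file(store, data, block_len):
    """Split data into blocks and return the list of block addresses."""
    return [store.put(data[i:i + block_len])
            for i in range(0, len(data), block_len)]

def read_file(store, keys):
    """Reassemble a file, pulling any missing blocks through the store."""
    return b"".join(store.get(k) for k in keys)
```

Because a block's address is its hash, the client can check anything it
pulls, whichever machine it came from.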

Make the hash distributed, and there is no need for big servers. :-)

Regards,

	Bill Rugolsky




