Better repodata performance (was: redhat abe)

Axel Thimm Axel.Thimm at ATrpms.net
Sat Jan 29 21:22:06 UTC 2005


On Sat, Jan 29, 2005 at 02:23:25PM -0500, seth vidal wrote:
> On Sat, 2005-01-29 at 17:11 -0200, Alexandre Oliva wrote:
> > On Jan 29, 2005, Axel Thimm <Axel.Thimm at ATrpms.net> wrote:
> > > How about having multiple repodatas, the base one and small
> > > incremental ones, the incremental ones containing also package
> > > cancelations? As a side effect this would also reduce download
> > > bandwidth and thus make even clients/users happy (not only repo
> > > maintainers).
> > 
> > +1.  Heck, make it +2.
> > 
> > I agree with Axel, for a change :-)
> > 
> 
> How would it reduce bandwidth - you'd have to download and parse
> multiple entries and you'd STILL have to do just as much work on the
> repo-side b/c you'd have to check all the packages for changes.

For N packages a balanced layout uses about log_2 N bins. Adding M
packages touches only about log_2 M of them. Bin i holds at most 2^i
packages, where i runs from 0 to log_2 N - 1. And the good news is
that you touch only the bins with i up to about log_2 M, i.e. the
small ones.

The statistical net effect is that for M package additions to
arbitrary N you get about log_2 M downloads totalling roughly 2M
packages.

In relevant numbers:

o N~=4000, log_2 N~=12
  You have 12 bins.
o 10 security/bug fix updates: (statistically) only bins 0 to 4 are
  changed, amounting to at most 31 packages (1+2+4+8+16).
  Clients download only 5 files, worth at most 31 packages in size.

Compare with the current situation, where clients need to fetch the
metadata for the whole lot of N packages after each update.
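
To make the bin arithmetic concrete, here is a minimal sketch in
Python. It assumes the bins behave like the digits of a binary
counter (bin i is either empty or holds exactly 2^i packages, and a
full bin is merged upwards); that layout and the names add_package
and simulate are my illustration, nothing yum or createrepo provides.

```python
def add_package(bins, pkg):
    """Insert one package and return the indices of the bins rewritten.

    bins[i] is either an empty list or a list of exactly 2**i entries,
    like the digits of a binary counter (an assumed layout).
    """
    carry = [pkg]
    touched = set()
    i = 0
    while True:
        if i == len(bins):
            bins.append([])
        touched.add(i)
        if not bins[i]:
            # Free slot: the carried packages become the new bin i.
            bins[i] = carry
            return touched
        # Bin i is full: merge it into the carry and empty it.
        carry = bins[i] + carry
        bins[i] = []
        i += 1


def simulate(n_existing, m_new):
    """Build a repo of n_existing packages, then add m_new updates.

    Returns (bins, touched, download), where download is the number of
    package entries in the bins a synced client has to re-fetch.
    """
    bins = []
    for k in range(n_existing):
        add_package(bins, f"pkg-{k}")
    touched = set()
    for k in range(m_new):
        touched |= add_package(bins, f"update-{k}")
    download = sum(len(bins[i]) for i in touched)
    return bins, touched, download


if __name__ == "__main__":
    for n in (4000, 4007):
        bins, touched, download = simulate(n, 10)
        print(n, len(bins), sorted(touched), download)
```

With N=4000 this gives 12 bins, and 10 updates rewrite bins 0 to 3
holding 10 package entries; with N=4007 they rewrite bins 0 to 4
holding 17 entries. An unlucky N just below a power of two (e.g.
4095) cascades through every bin, which is why the estimate above is
a statistical one.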

For this to work you need to

o introduce package cancelation (anti-packages ;)
o introduce multiple repodata components
o keep a manifest of the last state and feed the repo creation system
  with the differences (packages lost, packages gained).
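
The manifest step can be sketched as follows, again in Python and
again only as an illustration: manifests are assumed to be plain sets
of name-version-release strings, and the "cancel"/"add" entry format
and all function names are hypothetical, not an existing repodata
format.

```python
def diff_manifest(previous, current):
    """Return (gained, lost) between two sets of package identifiers."""
    gained = sorted(current - previous)
    lost = sorted(previous - current)
    return gained, lost


def incremental_entries(previous, current):
    """Build the entries for one incremental repodata component.

    Lost packages become cancelation entries (anti-packages), gained
    packages become ordinary additions.
    """
    gained, lost = diff_manifest(previous, current)
    return [("cancel", p) for p in lost] + [("add", p) for p in gained]


def apply_entries(state, entries):
    """Client side: replay an incremental component on the known state."""
    result = set(state)
    for action, pkg in entries:
        if action == "cancel":
            result.discard(pkg)
        else:
            result.add(pkg)
    return result


if __name__ == "__main__":
    previous = {"foo-1.0-1", "bar-2.0-1"}
    current = {"foo-1.0-2", "bar-2.0-1", "baz-0.1-1"}
    entries = incremental_entries(previous, current)
    print(entries)
    print(apply_entries(previous, entries) == current)
```

The repo side only has to look at the difference between the two
manifests, and a client that replays the entries on its last known
state ends up with the current package set.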

This kind of binning is rather common and very efficient in
high-performance statistics of large sums.
-- 
Axel.Thimm at ATrpms.net