Better repodata performance (was: redhat abe)

Axel Thimm Axel.Thimm at ATrpms.net
Sat Jan 29 11:39:33 UTC 2005


On Sat, Jan 29, 2005 at 02:36:09AM -0500, seth vidal wrote:
> > The exercise is to find a method that saves the computation of md5
> > or sha1 sums, as this is one of the time-consuming steps of createrepo.
> > In a 100k package repository the saving would be (100,000 - N) *
> > Time(sum_calc), where N is the number of packages that actually *need*
> > new sums generated. A list of package names passed into createrepo as
> > a parameter would be sufficient to determine which packages make up N.
> > An external process, such as a Manifest list, would then be used to
> > mitigate a set of packages through the entire build process.  Apt uses
> > an md5sum cache, but having fine-grained control of the process would
> > be more stable and directed. This is how much saving you'd get for #2.
> 
> Let me know when you've figured it out but as it stands I don't think
> incrementally updating the metadata is very feasible.
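
To make the quoted saving of (100,000 - N) * Time(sum_calc) concrete, here
is a minimal sketch of a checksum cache. It is not createrepo code: the
names file_sha1 and checksums_with_cache are hypothetical, and using file
size plus mtime as the "unchanged" test is an assumption made for
illustration. Only the N changed packages get re-hashed.

```python
import hashlib
import os


def file_sha1(path, chunk_size=1 << 20):
    """Compute the sha1 of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksums_with_cache(paths, cache):
    """Return ({path: sha1}, number_of_files_actually_hashed).

    cache maps path -> ((size, mtime_ns), sha1) and is updated in place.
    Assumption: a file with unchanged size and mtime is unchanged.
    """
    sums = {}
    hashed = 0
    for path in paths:
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        entry = cache.get(path)
        if entry is not None and entry[0] == key:
            sums[path] = entry[1]      # cache hit, no hashing needed
        else:
            digest = file_sha1(path)   # one of the N packages that need a sum
            cache[path] = (key, digest)
            sums[path] = digest
            hashed += 1
    return sums, hashed
```

On a second run over an unchanged tree the returned count is 0, so the
whole hashing cost is skipped.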

How about having multiple repodatas: a base one plus small
incremental ones, with the incremental ones also containing package
cancellations? As a side effect this would reduce download
bandwidth and thus make clients/users happy as well (not only repo
maintainers).
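
A rough sketch of what a client would do with such a layout, assuming each
repodata can be modelled as a dict from package name to metadata and each
incremental one as a pair (added, cancelled). The function name
apply_increments and this data model are hypothetical, not an existing
repodata format.

```python
def apply_increments(base, increments):
    """Combine a base repodata with incremental ones, oldest first.

    base:       dict mapping package name -> metadata
    increments: list of (added, cancelled) pairs, where added is a dict
                like base and cancelled is a set of package names
    """
    merged = dict(base)  # leave the base untouched
    for added, cancelled in increments:
        for name in cancelled:
            merged.pop(name, None)   # cancellation of an unknown name is a no-op
        merged.update(added)         # new or updated packages win
    return merged
```

A client that already holds the base only has to download the small
increments, which is where the bandwidth saving comes from.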

The base repodata and the incremental ones would be merged from time
to time, ideally with a binary merging scheme like the one used when
summing large data sets in statistics (for 100K packages you would
need at most 17 files, since 2^17 > 100,000).
-- 
Axel.Thimm at ATrpms.net