Better repodata performance

Alexandre Oliva aoliva at redhat.com
Mon Jan 31 03:08:23 UTC 2005


[added rpm-metadata list]

On Jan 30, 2005, Jeff Johnson <n3npq at nc.rr.com> wrote:

> Look, repos currently change daily, perhaps twice daily. Trying to
> optimize incremental updates for something that changes perhaps
> twice a day is fluff.

Generating an xdelta from the previous versions of the .xml.gz files
to the current versions doesn't sound like such a difficult or
wasteful thing to do.  Alongside the deltas, we'd publish an alternate
repomd.xml that describes them, modified to indicate that they're
deltas between two given timestamps, and record its relative location
in the main repomd.xml.  That would grow repomd.xml by a few bytes:

  <data type="delta-chain">
    <location href="repodata/$P-repodata.xml"/>
    <checksum type="sha">...</checksum>
  </data>


where $P stands for a prefix used to denote the previous generation of
the repository.  It could be a number, a timestamp, whatever, it
doesn't matter.  $P-repodata.xml could contain data such as:

<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="delta-other">
    <location href="repodata/$P-other.xdelta"/>
    <checksum type="sha">...</checksum>
    <timestamp>...</timestamp>
    <open-checksum type="sha">...</open-checksum>
  </data>
  <data type="delta-filelists">
    <location href="repodata/$P-filelists.xdelta"/>
    <checksum type="sha">...</checksum>
    <timestamp>...</timestamp>
    <open-checksum type="sha">...</open-checksum>
  </data>
  <data type="delta-primary">
    <location href="repodata/$P-primary.xdelta"/>
    <checksum type="sha">...</checksum>
    <timestamp>...</timestamp>
    <open-checksum type="sha">...</open-checksum>
  </data>
  <data type="delta-chain">
    <location href="repodata/$PP-repodata.xml"/>
    <checksum type="sha">...</checksum>
  </data>
</repomd>

The timestamps would be the same as those in the original repomd.xml
file.  The checksums would let one verify that (i) the delta file was
downloaded correctly, and that (ii) the expanded original .xml.gz file
handed to xdelta as the base for producing the newer .xml file matches
what the delta expects (although IIRC xdelta already performs this
check itself).
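To make the two checks concrete, here is a minimal Python sketch.  It
assumes that type="sha" means SHA-1, and that open-checksum covers the
uncompressed base file the delta applies to; both are my reading of the
proposal rather than something it spells out.  The function names and
the dict layout are made up for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest (assumed meaning of type="sha")."""
    return hashlib.sha1(data).hexdigest()


def delta_is_usable(delta_bytes: bytes, base_xml_bytes: bytes, entry: dict) -> bool:
    """Run both checks for one delta entry.

    `entry` is a hypothetical dict holding the <checksum> and
    <open-checksum> values of one <data type="delta-..."> element.
    """
    # (i) the delta file itself was downloaded correctly
    if sha1_hex(delta_bytes) != entry["checksum"]:
        return False
    # (ii) the uncompressed base file is the one the delta expects
    return sha1_hex(base_xml_bytes) == entry["open-checksum"]
```

If either check fails, the client would simply fall back to
downloading the full .xml.gz file.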

In this file, $PP would be the prefix of the generation that preceded
$P, i.e., whatever previous version was available when $P was
current, so that the files form a linked list.

So anyone can walk the list until they find a delta whose timestamp
matches that of the version they have, and then start downloading the
xdeltas and applying them until they reach the current version.
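The walk could look roughly like the Python sketch below.  It only
plans which deltas to fetch; it does not download or apply anything.
`fetch` is a hypothetical callable that returns the text of a chain
file given its href, and `plan_deltas` and `parse_chain_file` are
names invented for this example.  It follows one data type at a time
(e.g. "delta-primary") and compares timestamps as plain strings.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://linux.duke.edu/metadata/repo"}


def parse_chain_file(xml_text):
    """Return ({data type: (timestamp, delta href)}, older chain href or None)."""
    root = ET.fromstring(xml_text)
    deltas, older = {}, None
    for data in root.findall("r:data", NS):
        kind = data.get("type")
        href = data.find("r:location", NS).get("href")
        if kind == "delta-chain":
            older = href  # link to the next older generation
        else:
            deltas[kind] = (data.find("r:timestamp", NS).text, href)
    return deltas, older


def plan_deltas(fetch, first_href, kind, local_timestamp):
    """Walk the chain newest to oldest.

    Returns the delta hrefs to apply, oldest first, or None when no
    generation matches the local timestamp (fall back to a full download).
    """
    pending, href, seen = [], first_href, set()
    while href is not None and href not in seen:  # `seen` guards against loops
        seen.add(href)
        deltas, older = parse_chain_file(fetch(href))
        if kind not in deltas:
            return None  # chain is broken for this data type
        timestamp, delta_href = deltas[kind]
        pending.append(delta_href)
        if timestamp == local_timestamp:
            return pending[::-1]
        href = older
    return None
```

A client would start from the href in the delta-chain entry of the
current repomd.xml, and apply the returned deltas in order.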


This extension would be fully backward-compatible, since you're free
to not follow the delta-chain if you like.  And, if you do, it might
turn out that you don't find a timestamp that matches the files you
have, which would be unfortunate, but then, you'll only have
downloaded a bunch of small .xml files, no big deal.

-- 
Alexandre Oliva             http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}



