yum differential updates

Rudolf Kastl che666 at gmail.com
Mon Apr 10 22:49:19 UTC 2006


2006/4/10, Jon Burgess <jburgess at uklinux.net>:
> On Mon, 2006-04-10 at 13:35 -0400, Jesse Keating wrote:
> > On Mon, 2006-04-10 at 19:19 +0200, Rudolf Kastl wrote:
> > >
> > > well the reason for being able to mirror the update repos with a
> > > permanently updated torrent would be simply that people are able to
> > > share the bandwidth they have open. rsync causes lots of server load
> > > afaik. mirroring is useful for a variety of reasons.
> >
> > But does torrent offer the ability that rsync does, to only grab the
> > differences?  If you're re-torrenting the whole thing every day that
> > seems less than optimal.
>
> It only grabs the differences, but my understanding is that every
> operation is done in units of one "piece". The pieces are all of a fixed
> size which is set when the torrent file is created, e.g. 256kB in
> bordeaux-DVD-i386.torrent.
>
> I think it would work as follows:-
>
> 1) RH create a torrent with all current updates and publish on tracker
> and start seeding.
>
> 2) A user starts off with no updates and downloads the torrent, then
> downloads all the update files from the seed and from other users (who
> have been doing the same thing).
>
> 3) Some time later, RH publish a new torrent which has a mixture of some
> of the old files, with some added and some removed.
>
> 4) User downloads new torrent. The user adds this to his torrent
> program, making sure to select the same location as the previous
> download (this is key).
>
> 5) The torrent software will go through every file listed in the new
> torrent, some of which will be found and some will not.
>
> 6) Every "piece" in the new torrent will be part of one or more files.
> If the user has all the bytes contained in the piece then the software
> will checksum them to ensure they are correct and then note that this
> piece is already downloaded.
>
> 7) For pieces which have missing data, e.g. a piece containing data
> from a file which the user doesn't have, the software will ignore the
> current contents of the piece and put it in the list of pieces which
> need to be downloaded.
>
> 8) The software proceeds to exchange pieces with other users and the
> seeds to collect all pieces of the torrent. As each is received it
> verifies the checksum and writes the contents out to the appropriate
> files.
>
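The scan in steps 5 to 7 can be sketched roughly as below. This is only
an illustration, not how any particular client does it: the function
names, the `files` list of (name, size) pairs and the `piece_hashes`
list of raw SHA-1 digests are assumptions standing in for what a client
would read from the torrent file.

```python
import hashlib
import os


def read_span(root, files, start, length):
    """Read `length` bytes at torrent offset `start`.

    Returns None if any file covering the span is missing or too short.
    `files` is a list of (relative_name, size) in torrent order.
    """
    chunks = []
    offset = 0
    end = start + length
    for name, size in files:
        file_start, file_end = offset, offset + size
        offset = file_end
        # Overlap between this file and the requested span.
        lo, hi = max(start, file_start), min(end, file_end)
        if lo >= hi:
            continue
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            return None
        with open(path, "rb") as fh:
            fh.seek(lo - file_start)
            data = fh.read(hi - lo)
        if len(data) != hi - lo:
            return None
        chunks.append(data)
    return b"".join(chunks)


def missing_pieces(root, files, piece_length, piece_hashes):
    """Return the indexes of pieces that still have to be downloaded."""
    total = sum(size for _, size in files)
    missing = []
    for index, expected in enumerate(piece_hashes):
        start = index * piece_length
        length = min(piece_length, total - start)  # last piece may be short
        data = read_span(root, files, start, length)
        if data is None or hashlib.sha1(data).digest() != expected:
            missing.append(index)
    return missing
```

A piece that straddles an old file and a new one is reported as missing,
so a little of the old file's data is fetched again along with the new
file.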
> A long list of observations and thoughts:-
>
> - The user must keep downloading to the same location to gain the
> benefit (/var/cache/yum/update/packages might be good).
>
> - The downloads are not as efficient as a delta-RPM since the torrent
> will still need to download the complete contents of any new RPM. It
> does, however, reduce the load on the mirror system.
>
> - The torrent will only exchange data with users running exactly the
> same torrent file, so if you are the first one to download a new RH
> torrent then there will be no-one else to get data from (except the
> initial seed).
>
> - Due to the problem above, it probably makes sense to only update the
> torrent infrequently (maybe once per week). The user should probably
> rely on using the normal yum mechanisms to download the very latest
> updates. Provided these get done to the same location as the torrent
> download and are cached then they won't be downloaded again once the
> torrent is updated (the user will immediately act as a seed for these
> once he gets the updated torrent).
>
> - Nothing will automatically remove old files from the user's download
> location. "yum clean packages" would remove the downloaded files, but
> the torrent would then have to download all the current updates again.
>
> - The user may need to make available several GB of storage to hold all
> the updates even though he might never install some of these RPMs on his
> system.
>
> - It would probably make sense to create separate torrents for the
> normal and debug RPMs. I guess there should be 2 torrents per
> ARCH/Release pair, plus maybe a SRPM torrent.
>
> - Some users will be unable to use the torrent since they are behind
> corporate firewalls which block it, so it isn't a replacement for yum.
>
> - The new "LAN peer mode" in Azureus may enable clients to exchange
> pieces on a local network at high speeds minimising the need to download
> from the Internet. This would be a useful addition to the current yum
> behaviour.
>
> - Yum may need to be a little smarter about making certain that RPMs
> have the right checksum before using them. I know RPM does verify
> checksums, but I don't think yum does right now. The torrent will
> typically create all the new RPMs with 0 length and then use sparse
> writes to reconstitute each file one piece at a time whenever it
> receives some data. If the torrent download is interrupted then many
> files will not contain the complete data (even though the length may be
> correct). Yum might need to delete such a file and re-download it.
>
> - There might be scope for a specialised torrent client to automate some
> of the behaviour above, e.g. pro-actively downloading new torrents,
> perhaps only downloading "pieces" which contain data relevant to
> updates of RPMs currently installed (a client doesn't have to download
> and store the complete torrent), and deleting files which are no longer
> present in the latest torrent.
>
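The checksum and cleanup points in the list above could look something
like the following. It is a sketch under assumptions: `expected` is a
hypothetical mapping of RPM file name to hex digest (taken from whatever
trusted metadata is available), and `clean_cache` is not an existing yum
function.

```python
import hashlib
import os


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large RPMs are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clean_cache(cache_dir, expected, algorithm="sha1"):
    """Delete RPMs that are unlisted or whose digest does not match.

    Returns (stale, corrupt): names removed because they are not in
    `expected`, and names removed because their contents are wrong
    (for example a partially written file of the right length).
    """
    stale, corrupt = [], []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path) or not name.endswith(".rpm"):
            continue
        if name not in expected:
            stale.append(name)
        elif file_digest(path, algorithm) != expected[name]:
            corrupt.append(name)
        else:
            continue  # file is listed and intact, keep it
        os.remove(path)
    return stale, corrupt
```

Running this against the shared download location before yum installs
anything would mean a half-finished torrent download is re-fetched
rather than trusted because its length looks right.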
>         Jon
>

a few thoughts on how one could seriously implement this:

a small commandline utility (python?) that fetches the latest torrent
from a location and checks that it's signed with the correct fedora key.
after that it looks whether an update seed service is running: if it's
running, restart it; if it's dead, start it if configured so.

it (the commandline util to fetch the torrent) is invoked by a
crontabbed task (implementation similar to the nightly yum update).

after that it would start downloading the torrent and seed it.
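
a rough sketch of what such a utility might look like. everything named
here is an assumption for illustration: the url, the detached ".asc"
signature next to the torrent, the keyring path holding only the fedora
key, the "yum-torrent-seed" service name, and the convention that
"service NAME status" exits 0 when the service is running.

```python
import os
import subprocess
import urllib.request

# Hypothetical values, none of these exist today.
TORRENT_URL = "https://example.invalid/updates/updates.torrent"
KEYRING = "/etc/yum-torrent/fedora.gpg"
SERVICE = "yum-torrent-seed"


def signature_ok(torrent_path, sig_path, keyring=KEYRING):
    """Check a detached signature against a keyring holding only the trusted key."""
    result = subprocess.run(
        ["gpg", "--no-default-keyring", "--keyring", keyring,
         "--verify", sig_path, torrent_path],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def next_action(sig_ok, service_running, autostart):
    """Decide what to do with the seed service after fetching a torrent."""
    if not sig_ok:
        return "reject"
    if service_running:
        return "restart"
    return "start" if autostart else "none"


def main(dest="/var/cache/yum-torrent/updates.torrent", autostart=True):
    """Entry point meant to be called from a cron job."""
    tmp, sig = dest + ".new", dest + ".asc"
    urllib.request.urlretrieve(TORRENT_URL, tmp)
    urllib.request.urlretrieve(TORRENT_URL + ".asc", sig)
    running = subprocess.run(
        ["service", SERVICE, "status"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode == 0
    action = next_action(signature_ok(tmp, sig), running, autostart)
    if action == "reject":
        os.remove(tmp)          # never hand an unsigned torrent to the seeder
        return action
    os.replace(tmp, dest)       # only a verified torrent replaces the old one
    if action in ("start", "restart"):
        subprocess.run(["service", SERVICE, action], check=True)
    return action
```

the seed service itself would then pick up the new torrent file from the
same location and continue in the existing download directory.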

regards,
rudolf kastl
