Jigdo - A Professional Letter to Mike McGrath

Jonathan Steffan jonathansteffan at gmail.com
Thu Dec 6 21:28:13 UTC 2007

Ah, what a wonderful scholastic assignment. It's also great that Mike
is going to have to open the letter once it actually makes it to him.

Dear Mr. McGrath,

As long-time Fedora developers, we have both watched and lived through
the growing pains of Fedora releases. With the merge of Core and Extras,
we have successfully created the infrastructure on which even basic
Linux users can create their own Fedora-based derivative distribution.
Along with this achievement comes the task of providing hosting
infrastructure so that an average user/developer can share his/her
creation with the world. An additional benefit of being directly
involved in the sharing of the "Spins" is the ability to data mine what
packages people are including in custom spins, so as to give our
developer base more focus. To fulfill this task, please consider
requesting additional resources to archive signed updates so that they
can be publicly accessed for the duration of a release life-cycle.

The long road of Fedora development has been bumpy and, at times, not
even enjoyable. Throughout the Fedora 7 release, we have made some
amazing changes that helped to open Fedora up to
the masses. One of the most community-based features is the ability to
Re-Mix and Re-Spin the distribution as the end-user sees fit. Being the
server team lead for Fedora Unity, I've had to come up with unique ways
to share the Re-Spins we compose. After trying many approaches with
different technologies, we have settled on using Jigdo, the jigsaw
downloader. Debian has used Jigdo for some time now and, after poking
at their system for a while, I've seen that they keep every update they
push. I do understand that Debian is considered a stable distribution
much in the way RHEL is considered stable. This makes the updates tree
less active, and it is easier to find mirrors to carry the full data set
for an extended period of time. We, however, need to find a way to offer
this same extended life for signed packages that have hit the Fedora
updates tree.

The amount of storage and bandwidth that can be saved can be illustrated
by a simple comparison between chopping up a 3.4GB iso9660 file system
arbitrarily (by a static chunk size) and chopping up the same
file system based on contents (file by file). For BitTorrent,
Fedora's current choice for sharing Spins, the hosted data is only
valid for a given chunk on a single ISO. This data's footprint (equal
to the combined chunk sizes of the entire torrent) can be used for
nothing but this Spin. Hosting 5 Spins composed from similar trees via
BitTorrent gives us a footprint of 17GB (5 x 3.4GB), not to mention that
"seeders" have to run BitTorrent software to contribute to the swarm.
Alternatively, Jigdo can be used to reduce the footprint of these 5
Spins to about 4GB. The additional data that needs to be hosted for each
Spin, beyond what is already pushed to the mirrors, is about 150MB per
ISO with anaconda and about 200KB for ISOs without the installer bits.
To help illustrate the efficiency of using
Jigdo vs BitTorrent, the footprint for 250 Spins is 850GB for
BitTorrent and about 40GB for Jigdo. Additionally, a reduction in
overhead can be achieved by removing the need for the BitTorrent
tracker and all related network traffic without requiring any
additional work on the part of mirror administrators.
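
The footprint figures above can be reproduced with a rough sketch. The
3.4GB ISO size and the 150MB per-Spin figure come from this letter; the
function names, the use of 1000MB per GB, and the choice to count the
shared package tree once at roughly one ISO's worth are assumptions made
for illustration only.

```python
# Rough footprint comparison using the figures quoted in the letter.
ISO_SIZE_GB = 3.4         # size of one Spin ISO (from the letter)
JIGDO_PER_SPIN_MB = 150   # extra data per Spin with anaconda (from the letter)


def bittorrent_footprint_gb(spins):
    """Each torrent's chunks are only valid for its own ISO, so nothing is shared."""
    return spins * ISO_SIZE_GB


def jigdo_footprint_gb(spins, shared_tree_gb=ISO_SIZE_GB):
    """Assumption: the shared packages are stored once (about one ISO's worth),
    and each Spin adds only its own template and installer data."""
    return shared_tree_gb + spins * JIGDO_PER_SPIN_MB / 1000


if __name__ == "__main__":
    for spins in (5, 250):
        bt = bittorrent_footprint_gb(spins)
        jd = jigdo_footprint_gb(spins)
        print(f"{spins} Spins: BitTorrent {bt:.1f}GB, Jigdo {jd:.1f}GB")
```

Under these assumptions the sketch lands close to the rounded numbers
quoted above for both 5 and 250 Spins.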

The current updates system is getting better each release, but I think
we should adjust our policies to also have an “updates-archive”
repository. This repository would include all signed updates that
once lived in the updates repo, for the duration of the release's
life-cycle. I don't expect all mirrors would want to carry this extra
data, so making the new repo optional is a must. With the new
MirrorManager, we will be able to effectively point users at mirrors
that have been willing to take on the extra footprint. These requests
to MirrorManager could be used to compose reports on what packages the
community is utilizing most, allowing us to better focus our efforts.
By providing a unified point of entry for data, we will also be able
to log requests for package data found in in-house and unofficial
spins. By utilizing the abilities of MirrorManager to return specific
mirrors for pre-defined IP blocks, we will enable end-users and
companies to download from on-site mirrors while maintaining complete
transparency. I hope that with advancements in Jigdo client software we
will be able to look at using Jigdo to host our official images. Please
let me know your thoughts on this matter. 
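
To make the idea of returning specific mirrors for pre-defined IP blocks
concrete, here is a minimal sketch. It is not MirrorManager's actual
code or API: the table, the URLs, and the pick_mirror function are all
hypothetical, and the address blocks are reserved documentation ranges.

```python
import ipaddress

# Hypothetical table mapping pre-defined IP blocks to on-site mirrors.
SITE_MIRRORS = {
    ipaddress.ip_network("192.0.2.0/24"): "http://mirror.site-a.example/fedora/",
    ipaddress.ip_network("198.51.100.0/24"): "http://mirror.site-b.example/fedora/",
}

# Hypothetical public mirror used when no block matches.
DEFAULT_MIRROR = "http://mirror.example.org/fedora/"


def pick_mirror(client_ip):
    """Return the mirror for the most specific block containing client_ip."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in SITE_MIRRORS if addr in net]
    if not matches:
        return DEFAULT_MIRROR
    # Prefer the longest prefix so a narrow block overrides a broad one.
    best = max(matches, key=lambda net: net.prefixlen)
    return SITE_MIRRORS[best]


if __name__ == "__main__":
    print(pick_mirror("192.0.2.17"))
    print(pick_mirror("203.0.113.5"))
```

A client inside one of the listed blocks is sent to its on-site mirror,
and everyone else falls through to the public one, so the same request
works unchanged from any network.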

As always, thanks, 

-- 
Jonathan Steffan
GPG Fingerprint: 93A2 3E2F DC26 5570 3472 5B16 AD12 6CE7 0D86 AF59


More information about the Fedora-infrastructure-list mailing list