Supporting EPEL Builds in Koji

Thu Jul 10 19:49:00 UTC 2008

On Thu, 2008-07-10 at 19:12 +0200, Jeroen van Meeuwen wrote:
> Mike Bonnet wrote:
> > Hi.  I've written up a proposal for a way to support EPEL builds in
> > Koji.  It's not the only way we could do this, but I think it's doable
> > with a reasonable amount of effort, and has the side-effect of greatly
> > simplifying the Koji setup process for a lot of people (by removing the
> > need to bootstrap/import an entire distro of packages into your private
> > Koji instance).  You can view the proposal here:
> > 
> > http://fedoraproject.org/wiki/Koji/EPELSupport
> > 
> > It's fairly detailed regarding the data model changes necessary, so if
> > you're not familiar with the Koji codebase you can skip those parts.
> > Questions and comments welcome.
> > 
> 
> Hi Mike,
> 
> good to see you've spent some time on this whereas I have been lazy in 
> Littleton (holiday).
> 
> I'd like to share a few thoughts on the Wiki page, which is a great start:
> 
>  From the Wiki page: "There is a strong feeling that if a package exists 
> in the Koji-managed local repo (whose contents the Koji admin has full 
> control over) it should always be preferred over the external repo 
> (whose contents the Koji admin may have little or no control over)."
> 
> The preference Koji will have (for which package is used in the
> buildroot) might introduce a problem: a custom-built package foo-1.0 is
> used in the buildroot, and upstream updates to foo-1.1. The running
> nodes would update to foo-1.1 whereas the buildroot still uses the
> custom foo-1.0...

Yes, it's up to the Koji admin to monitor the remote repo, and take
appropriate action when their custom local packages are superseded by
packages in the remote repo.  That may be untagging or blocking the
package locally so the newer version can be pulled down from the remote
repo.  Or it may be rebuilding the custom package based on the updated
sources.  The point is that the build environment doesn't change unless
the Koji admin takes some action to change it.

> The point being, that these updates have to be managed as they are 
> released. The updates need to be managed on the side where said packages 
> are being mashed into a repository (infra side) or applied (client side).
> 
> You can see the duplicate effort when the updates are managed on either 
> side (infra or client), _and_ in koji, separately.

There is duplicate effort either way.  The difference is that, if
highest-nvr-wins is used, and a remote repo updates to a later version
of a package that you have a custom build of, there is *no way* for you
to revert your build environment to that lower-nvr version without
bumping your version higher than their version (without actually
changing the source at all) and rebuilding.  It encourages this Cold War
arms-race of version numbers between your custom packages and the remote
repo's packages, and results in the admin having to fake higher version
numbers and rebuild constantly *without any source changes* just to keep
their custom packages in their build environment.

Alternately, if first-match-wins is used (where the first repo is the
locally-managed Koji repo), and a remote repo updates to a later version
of a package you have a custom version of, nothing happens to your build
environment.  If you decide you want the newer version from the remote
repo, you untag your local package and let it get pulled in from the
remote repo.  If that newer version has problems, retag your custom
version and it will then be available in the build environment again.
There is no unnecessary building of packages, no faking version numbers,
and no unexpected changes to your build environment.  It's the
"principle of least surprise", which is why I think it's the right
policy to use in a managed build environment like Koji.
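
As a rough illustration of the difference between the two policies, here
is a minimal sketch.  It is not Koji or repomerge code: the Pkg type, the
function names, and the remote URL are hypothetical, versions are
simplified to integer tuples rather than real RPM version comparison, and
each repo is assumed to carry at most one build per package name.

```python
from typing import NamedTuple


class Pkg(NamedTuple):
    name: str
    version: tuple  # simplified stand-in for an RPM version-release
    origin: str     # "local" or a remote repo URL (hypothetical field)


def first_match_wins(repos):
    """Search repos in order; the first repo carrying a name supplies it."""
    chosen = {}
    for repo in repos:
        for pkg in repo:
            chosen.setdefault(pkg.name, pkg)
    return chosen


def highest_nvr_wins(repos):
    """The highest version of each name wins, whichever repo it is in."""
    chosen = {}
    for repo in repos:
        for pkg in repo:
            current = chosen.get(pkg.name)
            if current is None or pkg.version > current.version:
                chosen[pkg.name] = pkg
    return chosen


REMOTE_URL = "http://remote.example/repo"  # made-up URL for illustration

local = [Pkg("foo", (1, 0), "local")]
remote = [Pkg("foo", (1, 1), REMOTE_URL), Pkg("bar", (2, 0), REMOTE_URL)]

# Local repo listed first: the custom foo stays in the build environment.
stable = first_match_wins([local, remote])

# Highest version wins: the remote foo silently replaces the custom one.
surprised = highest_nvr_wins([local, remote])

# "Untagging" the local foo lets the remote build through on request.
untagged = first_match_wins([[], remote])
```

With first-match-wins the custom foo is kept until the admin removes it
from the local repo; with highest-nvr-wins the remote update takes over
without any action on the admin's part.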

> I would like to suggest that the Koji development team make the priority 
> setting Koji is going to use a configurable item. Compared to the bigger 
> picture this isn't much of a priority, just something to think about.

I strongly feel that this isn't something that needs to be configurable,
and that first-match-wins is the correct behavior.  But if other people
agree that there is a valid use-case for making it configurable, and
Seth and/or James can make the logic in repomerge configurable, then we
can add a switch for it to Koji.

> Additionally, I'd like to comment on / ask about the proposed database 
> changes for the tag_config table; In an attempt to show you what I was 
> thinking, here's a number of questions;
> 
>  From the Wiki page: "At repo creation time, the repodata will be 
> retrieved from the processed url and merged with the local repodata as 
> described above. This single repo will then be used for subsequent 
> builds against the tag"
> 
> Do I understand correctly one can only give one single repository URL to 
> a certain tag? Does this mean that a tag is created for (example) 
> "dist-el5" with a remote repository URL, and then "dist-el5-updates" 
> with another remote repository URL? This means for the build target used 
> to have dist-el5-updates inherit dist-el5, right? Which then implies 
> either metadata needs to be imported for dist-el5-updates or inheritance 
> can only be applied during build-time... right?
> 
> The question I guess is basically; how does koji handle tags with a 
> combination of remote urls & inheritance?

Originally you were correct; the proposal only allowed for a single
remote repo to be configured.  This was mandated by the desire to track
packages back to their repository of origin, and the lack of repository
data in the rpmdb.  jkeating convinced me that this wasn't a very useful
implementation, and suggested that we could get information about the
origin of a given rpm from the baseurl in the repodata.

I've updated the wiki page with a new implementation proposal that will
allow for multiple remote repos while still tracking package origin, and
specifies how remote repos will interact with the tag inheritance tree.
Please take a look and let me know what you think.

>  From the Wiki page: "Right now that (rpminfo) table enforces uniqueness 
> of (name, version, release, arch)."
> 
> I see that Koji does not store the complete package NEVRA, which may 
> become a problem if duplicate NVRAs occur. That is quite likely when 
> rebuilding packages with the release number bumped, which might collide 
> with upstream doing a release bump. This is where the epoch is often 
> used, as upstream has clear guidelines for epoch bumps which (hopefully) 
> make them occur in special circumstances only, and thus greatly reduce 
> the chance of a colliding NEVRA. I like the proposed uniqueness of 
> NVRA-namespaces as well, don't get me wrong ;-)

Koji intentionally ignores epoch when enforcing uniqueness.  For better
or worse, the epoch is mostly hidden from users, and does not show up in
the filename.  Having packages with the same NVRA but different epochs
was considered harmful when Koji was being designed, and it will prevent
this from happening.  Note that Koji does *store* the epoch, it just
doesn't use it when enforcing uniqueness.

In the proposal, local packages exist in one NVRA namespace, and each
remote repo (differentiated by URL) exists in a different NVRA
namespace.  So NVRA must be unique within each repo (local or remote)
but not across repos.  So NVRA collisions between your local Koji
instance and a remote repo will not cause problems at the data model
level.  Which package gets selected and made available in the buildroots
will be handled by the (possibly configurable) package selection policy
of createrepo/mergerepo.
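
The uniqueness rule described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
Koji schema: the RpmIndex class, its methods, the LOCAL marker, and the
remote URL are all hypothetical.  The key is (origin, name, version,
release, arch); the epoch is stored but is not part of the key.

```python
LOCAL = None  # assumed marker for packages built in the local Koji instance
REMOTE = "http://remote.example/el5/"  # made-up remote repo URL


class RpmIndex:
    """Toy in-memory stand-in for the uniqueness constraint."""

    def __init__(self):
        # (origin, name, version, release, arch) -> epoch
        self._epochs = {}

    def add(self, name, version, release, arch, epoch=None, origin=LOCAL):
        key = (origin, name, version, release, arch)
        if key in self._epochs:
            # Same NVRA in the same namespace is rejected, whatever the epoch.
            raise ValueError(
                "duplicate NVRA %s-%s-%s.%s in origin %r"
                % (name, version, release, arch, origin)
            )
        self._epochs[key] = epoch

    def epoch_of(self, name, version, release, arch, origin=LOCAL):
        return self._epochs[(origin, name, version, release, arch)]


index = RpmIndex()
index.add("foo", "1.0", "1", "x86_64")                          # local build
index.add("foo", "1.0", "1", "x86_64", epoch=1, origin=REMOTE)  # same NVRA, other namespace
```

Adding a second local foo-1.0-1.x86_64 with a different epoch would raise
ValueError, while the identical NVRA from the remote repo is accepted
because it lives in its own namespace.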

> The other thing (and probably the last thing for now) I'd like to share 
> is that, for reproducibility purposes, how viable would it be to have 
> koji automatically import the remote RPM (the file and all the data) as 
> it is used from the remote repository? This may or may not be a 
> configurable option, saves work for admins compared to the situation 
> now, and preserves reproducibility under all circumstances, adding the 
> automatically imported RPM to the appropriate tags, storing them for 
> reproducibility whereas upstream only keeps two versions in the 
> repository... Though I understand it 1) consumes space and 2) isn't 
> helpful for the EPEL case, I think this is particularly useful for 
> long-term supported appliance software. Just wondering here ;-)

This sounds much more like the secondary-arch approach, and is separate
from what we're trying to accomplish here.  I had requested that the
secondary-arch daemon support a "same-arch-downstream" mode where it
would download and import (rather than rebuild) builds from an upstream
Koji as they were completed.  However, this is a lot more complicated
and requires more detailed policy.  If this is a requirement for you, I
suggest you take a look at the secondary-arch work.