[Pulp-dev] versioned repositories

Dennis Kliban dkliban at redhat.com
Wed May 24 15:26:46 UTC 2017

I noticed that the REST API examples don't mention anything about deleting
a particular version of a repository. This is a use case that we need to


On Wed, May 17, 2017 at 10:03 PM, Michael Hrivnak <mhrivnak at redhat.com> wrote:

> We've discussed versioned repositories and their merits in the past, but
> I'd like to propose a specific direction, and inclusion in 3.0. As a recap
> of goals, versions can help us answer two important questions about the
> history of a repository:
> 1) What set of content is in a specific version of a repository?
> 2) What changed between two arbitrary versions of a repository?
> I am proposing a model where Pulp creates a new version of a repository
> for every operation that changes that repo's content. For example, a sync
> task would create a single new version.
> Basic Example
> -----------
> - You create repository "foo".
> - You sync repository "foo", which produces version 1 of that repo.
> - You sync once per day for some period of time, automatically creating a
> new version each time.
> - You publish repo "foo", which defaults to publishing the most recent
> version.
> - You don't like something that's new in the repo, so you roll back by
> publishing a previous version.
> Data Model Basics
> -----------
> In the past we've stored the relationship between a content unit and a
> repo as a standard many-to-many through table. There's a reference to a
> unit, and a reference to a repo.
> The version scheme I'm pitching adds two new fields to that through table:
> vadded - a foreign key to the repo version in which this content unit was
> added
> vremoved - a foreign key to the repo version in which this content unit
> was removed. This can be null.
> Multiple entries can exist for the same content unit and repo, so long as
> a new one is not added until the previous one's "vremoved" field is set.
> With this structure, it is easy to query the database to answer both
> questions we started with.
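
As a rough sketch of how the two questions could be answered against such a through table: this uses an in-memory SQLite database, and the table name `repo_content`, the `repo` and `unit` columns, and both function names are assumptions made for illustration. Versions are plain integers here rather than foreign keys, and none of this is taken from the PoC branch.

```python
import sqlite3

# Hypothetical schema: only vadded and vremoved come from the proposal.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repo_content ("
    " repo TEXT NOT NULL,"
    " unit TEXT NOT NULL,"
    " vadded INTEGER NOT NULL,"
    " vremoved INTEGER)"  # NULL means the unit has not been removed
)
conn.executemany(
    "INSERT INTO repo_content VALUES (?, ?, ?, ?)",
    [
        ("foo", "a", 1, None),  # added in version 1, never removed
        ("foo", "b", 1, 3),     # added in version 1, removed in version 3
        ("foo", "c", 2, None),  # added in version 2
        ("foo", "b", 4, None),  # re-added in version 4 after its removal
    ],
)


def content_at(repo, version):
    """Question 1: which units are in the given version of the repo?"""
    rows = conn.execute(
        "SELECT unit FROM repo_content"
        " WHERE repo = ? AND vadded <= ?"
        " AND (vremoved IS NULL OR vremoved > ?)",
        (repo, version, version),
    )
    return {unit for (unit,) in rows}


def changes_between(repo, old, new):
    """Question 2: which units were added and removed going from old to new?"""
    before, after = content_at(repo, old), content_at(repo, new)
    return after - before, before - after
```

Here a unit counts as part of a version when it was added at or before that version and either never removed or removed in a later one.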
> REST API
> ----------
> Some endpoint will be made that gives access to the versions of a specific
> repository. Ideally we would have a nested endpoint like this:
> /api/v3/repositories/foo/versions/
> But nested views have been a problem for us with DRF (django rest
> framework). If we aren't able to make that happen, I've gotten this to work
> in my PoC branch:
> /api/v3/repositoryversions/?repository=foo
> It's not yet clear how best to represent content through the REST API. A
> nested endpoint within the repo version object would be ideal.
> /api/v3/repositories/foo/versions/4/content/
> Operations on a repo where a version could be chosen, such as a publish,
> should default to the latest version. It's an open question how best to
> represent that, and perhaps it takes the form of two endpoints:
> default to latest: POST /api/v3/repositories/foo/distributors/bar/publish
> specify a version: POST /api/v3/repositories/foo/versions/4/publish
> But that's just one idea. Much about our REST API layout has yet to be
> written in stone, and we have flexibility.
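
Whichever endpoint layout is chosen, the "default to latest" rule could be handled in one small helper. A minimal sketch, where `resolve_version` and its parameters are hypothetical names and not part of any existing Pulp API:

```python
def resolve_version(available, requested=None):
    """Pick the version number a publish should operate on.

    `available` holds the version numbers that exist for a repo and
    `requested` is an optional explicit choice (both names are hypothetical).
    """
    if not available:
        raise ValueError("repository has no versions yet")
    if requested is None:
        return max(available)  # default to the most recent version
    if requested not in available:
        raise ValueError(f"version {requested} does not exist")
    return requested
```

A rollback as in the basic example would then just be a publish with an explicit, older version number.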
> Orphans
> ---------
> Notice that this changes the orphan workflow. Removing a content unit from
> a repo doesn't make it an orphan. This helps reduce the need to run an
> orphan cleanup task, which in turn helps avoid the inherent race condition
> that task can introduce.
> Trim History
> ---------
> But you may not want to keep history forever, so a valuable feature will
> be the ability to trim history. I think this would just be an operation
> that squashes a bunch of versions together, and it could optionally take
> that opportunity to immediately delete a content unit that becomes an
> orphan.
> Illustrating the workflow, if you wanted to squash history prior to
> version 10, the task would:
> - delete all of a repo's relationships in the through table where vremoved
> is a version <= 10
> - optionally check if each content unit is now an orphan and remove if so
> - update all remaining entries where vadded < 10 by setting vadded to 10
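
The three steps above could be sketched roughly as follows. This works on plain Python objects instead of database rows; `Row`, `squash`, and the `units_in_other_repos` parameter are illustrative names, and the orphan check assumes a unit is an orphan when no remaining relationship in this or any other repo refers to it.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Row:
    """One through-table entry for a single repo (hypothetical shape)."""
    unit: str
    vadded: int
    vremoved: Optional[int] = None


def squash(rows, cutoff, units_in_other_repos=frozenset()):
    """Squash history prior to `cutoff`; return (kept_rows, orphaned_units)."""
    # Step 1: drop relationships that ended at or before the cutoff.
    kept = [r for r in rows if r.vremoved is None or r.vremoved > cutoff]
    # Step 2 (optional in the proposal): units with no remaining relationship
    # here, and none in another repo, are now orphans.
    gone = {r.unit for r in rows} - {r.unit for r in kept}
    orphans = gone - set(units_in_other_repos)
    # Step 3: entries added before the cutoff now appear as added at the cutoff.
    kept = [replace(r, vadded=cutoff) if r.vadded < cutoff else r for r in kept]
    return kept, orphans
```

For example, squashing at 10 drops a unit that was removed in version 3, while a unit added in version 2 and removed in version 12 survives with vadded set to 10.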
> PoC
> --------
> I have a branch with proof-of-concept code here:
> https://github.com/pulp/pulp/compare/3.0-dev...mhrivnak:versioned-repos?expand=1
> The models are the most interesting place to look. In particular, I'm very
> pleased with how simple the "content()" method is, which returns a QuerySet
> matching all the content in a given version.
> The rest is REST ;) API stuff mostly, which isn't all that interesting
> except to demonstrate how the data could potentially be exposed. You can
> run the included tests (which I made just for dev purposes; I'm not sure
> they deserve a long-term home), which are found in the root of the git repo,
> and that loads some data into the database. Then you can hit this endpoint
> as an example:
> http://yourhost:8000/api/v3/repositoryversions/?repository=r1
> Obviously this code is rough, so please consider it for directional and
> conceptual purposes only. Assume major additions and improvements if we
> follow through on this concept.
> Value
> -------
> Tracking history in this way opens up great possibilities. Some examples:
> Promotion could become a matter of having two publishers on a repo with
> different settings, one for "testing" and one for "production", and just
> publishing whichever version you like with each. Multiple repos and copy
> operations are no longer needed for promotion. Austin suggested that the
> ability to tag versions with arbitrary key:value pairs could enhance this
> use case.
> An added concept, which could come post-3.0, is tracking publications more
> explicitly and associating each with a version, although I could see a case
> for laying this groundwork now before the API is locked down. Promotion
> could become more about making a publication available in a different
> location, rather than re-creating it. We'd also know which content is part
> of a publication, and guarantee that content doesn't get removed before the
> publication does. This is a deficiency we have in Pulp 2.
> Pulp-to-pulp sync could become very efficient since they could easily
> replicate only the changes since the last sync.
> Incremental exports become more concrete. Rather than depending on a
> timestamp, you can know with certainty which version you have in the remote
> location, and thus which newer versions need to be exported.
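
To illustrate why replication and incremental export could be cheap under this model: only entries whose vadded or vremoved falls after the version the remote side already has need to be examined. A minimal sketch, assuming rows are `(unit, vadded, vremoved)` tuples with integer versions and that a unit is never removed and re-added within the same version; `delta_since` is a hypothetical name.

```python
def delta_since(rows, have, want):
    """Return (to_add, to_remove) for moving a replica from `have` to `want`.

    Only rows touched after `have` are considered.
    """
    # Rows added after `have` that are still present at `want`.
    added = {
        unit for unit, vadded, vremoved in rows
        if have < vadded <= want and (vremoved is None or vremoved > want)
    }
    # Rows that were present at `have` and removed by `want`.
    removed = {
        unit for unit, vadded, vremoved in rows
        if vadded <= have and vremoved is not None and have < vremoved <= want
    }
    # A unit removed and later re-added in the range is a net no-op.
    return added - removed, removed - added


rows = [
    ("a", 1, None),
    ("b", 1, 3),
    ("c", 2, None),
    ("b", 4, None),
    ("d", 3, 5),
]
replica = {"a", "b"}  # the remote side holds version 1
to_add, to_remove = delta_since(rows, have=1, want=3)
replica = (replica - to_remove) | to_add  # replica now matches version 3
```

Because the remote side's version is known exactly, no timestamp comparison is involved.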
> We could add a "finalized" boolean or similar to a version, and use that
> to know if it was successfully completed. If not, for example if a sync
> task stopped abruptly, the incomplete version could easily be recognized
> and removed.
> Feedback Please
> ----------
> Please ask questions, provide feedback, add ideas, suggest alternatives,
> etc. I'm perfectly happy even throwing this PoC away if we come up with
> something better.
> Thanks!
> --
> Michael Hrivnak
> Principal Software Engineer, RHCE
> Red Hat