[Pulp-dev] repository versions update

Brian Bouterse bbouters at redhat.com
Fri Dec 1 22:16:54 UTC 2017


Thank you all for such great discussion!

To recap some of the discussion we had today: we are going to look at the
versioned repos use cases at an upcoming MVP call (probably 12/8). Look for
the pulp-list announcement. If you have use cases you want to share, you can
add them in red in the Versioned Repos section of the MVP here:
https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product/#Versioned-Repositories

Once the use cases are known, we can look at the PR and see if it fulfills
them. From the discussion today, the general consensus is that the gap will
be relatively small, which makes including it in Pulp 3 feasible.

@misa, providing those types of features may be possible. Imagine an
optional attribute on a repo version named 'frozen' that defaults to True.
While the latest repo_version for a repo has frozen=False, any action that
would normally create a new repo version (copy, add/remove, delete, etc.)
would act on the existing repo version and *not* create a new one. Then the
user can set frozen=True on that repo version whenever they want, which
commits the transaction as a repo version. I don't think this would be too
hard to implement.
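
To make the 'frozen' idea concrete, here is a rough sketch in plain Python.
It is not taken from the PR; the class names, the modify() and
open_version() helpers, and the use of frozensets for content are all
assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class RepoVersion:
    number: int
    content: frozenset = frozenset()
    frozen: bool = True  # default proposed in this email


@dataclass
class Repository:
    # Hypothetical in-memory stand-in for a repo and its version history.
    versions: list = field(default_factory=lambda: [RepoVersion(number=0)])

    @property
    def latest(self):
        return self.versions[-1]

    def modify(self, add=(), remove=()):
        """Apply a content change, honoring 'frozen' on the latest version."""
        new_content = (self.latest.content | frozenset(add)) - frozenset(remove)
        if self.latest.frozen:
            # Normal case: the change produces a new, frozen version.
            self.versions.append(RepoVersion(self.latest.number + 1, new_content))
        else:
            # Unfrozen latest version: change it in place, no new version.
            self.latest.content = new_content
        return self.latest

    def open_version(self):
        """Start an unfrozen version that accumulates later changes."""
        self.versions.append(
            RepoVersion(self.latest.number + 1, self.latest.content, frozen=False)
        )
        return self.latest


repo = Repository()
repo.modify(add={"cowsay-3.03"})       # creates version 1
pending = repo.open_version()          # version 2, frozen=False
repo.modify(add={"glibc-2.26"})        # still version 2
repo.modify(remove={"cowsay-3.03"})    # still version 2
pending.frozen = True                  # "commit" the accumulated changes
```

After the last line, any further modify() call would create version 3 as
usual.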


On Thu, Nov 30, 2017 at 3:20 PM, Michael Hrivnak <mhrivnak at redhat.com>
wrote:

>
>
> On Thu, Nov 30, 2017 at 11:43 AM, Mihai Ibanescu <mihai.ibanescu at gmail.com> wrote:
>
>> I am late to the thread, so I apologize if I repeat things that have been
>> discussed already.
>>
>> Is it a meaningful use case to publish an older version of the repo? Once
>> published, do you keep track of which version got published, and how do you
>> decide which version to push next? This seems like a complication to me.
>>
>>
> A publication will have a reference to the version that it was created
> from. To illustrate how that would get used: Your CTO calls early on a
> Saturday morning and says "I read in the news about a major security flaw
> in cowsay, and I know our applications depend heavily on it. What version
> do we have deployed right now???!!!" You can concretely determine which
> publications are being currently "distributed" to your infrastructure, and
> from there see their exact content sets by virtue of the repo version.
>
> Then there is the promotion workflow, which in Pulp 2 requires a lot of
> copying and re-publishing. With repo versions, you'll have a sequence of
> versions of course. Let's say there's 1, 2 and 3. Version 1 is deployed
> now, version 2 is undergoing testing, and version 3 got created last night
> by the weekly sync job you set up. You would have two different distributors
> that make these publications available to clients: one for production, and
> one for testing. "Promotion" becomes just the act of updating the reference
> on a distribution to a different publication. When testing on version 2 is
> done, assuming it passes, you can update the production distribution to
> make it use version 2.
>
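
The promotion workflow described above can be sketched in a few lines of
Python. This is only an illustration: Publication, Distribution, and
promote() are hypothetical names chosen here, not the Pulp 3 models or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    repo_version: int  # the repo version this publication was created from
    publisher: str


@dataclass
class Distribution:
    name: str
    publication: Publication


def promote(distribution, publication):
    """Point a distribution at a different publication; nothing is copied."""
    distribution.publication = publication
    return distribution


pub_v1 = Publication(repo_version=1, publisher="yum")
pub_v2 = Publication(repo_version=2, publisher="yum")

production = Distribution("production", pub_v1)
testing = Distribution("testing", pub_v2)

# Testing on version 2 passed, so production moves to the same publication.
promote(production, pub_v2)

# "What version do we have deployed right now?" becomes a simple lookup.
deployed_version = production.publication.repo_version
```

Because a publication records the repo version it came from, the deployed
content set can be traced without re-publishing or copying anything.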
> There are a few use cases for publishing an old version.
>
> One is: I want to publish the same exact content set two different ways,
> with two different publishers. If the contents change between publishes, I
> want a guarantee that it won't cause the second publish to use different
> content than the first.
>
> Second: I like the state of the content in a repo as it is right now. I
> want to publish that exact content set. If any changes happen to the
> content in that repo between now and when my publish task gets run by a
> worker, I don't want those changes to affect the publish I'm requesting
> right now.
>
> Third: I want the ability to roll back from a bad content set to a
> known-good one. How many publications must I keep around to have confidence
> that if I need to roll back some distance, that publication will still be
> available? It's valuable to know I can re-publish an older version any time
> I need it.
>
> Fourth: In some cases you may decide after-the-fact that you need to
> publish the same content set a different way. Maybe you went to kickstart
> from a yum repo and then remembered that (this is a true story) one version
> of your installer is too old to know about sha256 checksums, so you have to
> go re-publish the same content set with different settings for how the
> metadata gets generated.
>
> Otherwise, just as reproducible builds of software are very valuable,
> reproducible publishes of repositories are valuable for similar
> reasons.
>
>
>
>> As a user / content developer, it seems more useful to me to always
>> publish the latest (i.e. don't have an optional version for publishing),
>> but have the ability to copy from a specific version of a repo into another
>> repo (or the same repo, effectively reverting the content of latest).
>>
>> So I would shift the discussion away from the REST API (for now), and
>> more into the expected behavior for manipulating content within pulp. The
>> operations I am aware of are: syncing units, importing units,
>> copying/deleting units, and I am seeking clarification on how versioning
>> will work for each.
>>
>> Syncing is probably the easiest, because it can handle all the changes
>> internally and create a new version at the end.
>>
>> For importing, if you don't want to create unnecessary intermediate
>> versions that are meaningless, I would want the ability to upload more than
>> one unit and associate it to the repo, and then create a version. In other
>> words, a transactional multi-upload.
>>
>
> Indeed. We want to have a behavior in Pulp 3 anyway that lets you
> arbitrarily add and remove multiple content units in one operation. That's
> one of the more notable missing features from Pulp 2. As Brian has pointed
> out, one option is to let a user directly POST to a "versions" endpoint and
> express what content they want to add/remove. Even without repo versions,
> we'd still want an API that lets you bulk add/remove.
>
>
>> For copying, as suggested above, I want to optionally specify the version.
>>
>> Deleting by itself is not hard: it does what it needs to do and then
>> creates a version.
>>
>> The more complicated use case would be: what if I wanted to change the
>> contents of repoA:
>> * add 3 packages from repo1 version 1
>> * add 4 packages from repo2 (latest)
>> * delete 5 packages
>>
>> and at the end have a single version change for repoA.
>>
>> Or, for the same repoA:
>> * delete all units of type "rpm" and name "glibc"
>> * copy unit type "rpm" and name "glibc" from two versions ago
>>
>>
>> If you wanted this use case, then you need a new resource type, somewhat
>> similar to a Task, let's call it Transaction. It is tied to the repository
>> it operates on (repoA in the example above), and locks it from further
>> changes until the transaction is committed or aborted. It could be
>> implemented internally as a repository. You start with the current contents
>> of repoA, and you perform whatever operations you need to do (including
>> changing repo metadata). When you "commit" the Transaction, it becomes
>> *the* new version of the repository and unlocks repoA.
>>
>
> Yep, we're on the same page with the use case I think. The other option is
> to let you as a user query for whatever content you care about adding and
> removing; find it however you see fit. Then use the bulk add/remove feature
> to carry that out in one operation.
>
> I do like the idea of persistently storing a Transaction as you call it,
> and possibly even letting a user build one explicitly. Even just as an
> implementation detail, any bulk add/remove endpoint may need to store the
> requested changes temporarily in the database as a means to get the input
> from the web handler to a celery worker. We probably don't want to stuff
> 10k+ content references into an AMQP message and pass them all in as an
> argument to the task. And if we're going to store them in the DB, maybe it
> would make sense to expose that to the user and let them create a
> Transaction directly.
>
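
A minimal sketch of the Transaction idea discussed above, assuming a repo
version is just a set of content units and the version history is a list.
The Transaction class and commit_transaction() are hypothetical, and the
locking of repoA that Mihai describes is left out to keep it short.

```python
class Transaction:
    """Collects add/remove requests and applies them as one new version."""

    def __init__(self, base_content):
        self._base = frozenset(base_content)
        self._to_add = set()
        self._to_remove = set()

    def add(self, units):
        units = set(units)
        self._to_add |= units
        self._to_remove -= units  # a later add overrides an earlier remove
        return self

    def remove(self, units):
        units = set(units)
        self._to_remove |= units
        self._to_add -= units  # a later remove overrides an earlier add
        return self

    def commit(self):
        """Return the resulting content set without touching the base."""
        return (self._base - self._to_remove) | self._to_add


def commit_transaction(versions, transaction):
    """Append the committed content as exactly one new version."""
    versions.append(transaction.commit())
    return len(versions) - 1  # number of the new version


# Version history of a hypothetical repoA; the list index is the version number.
versions = [
    frozenset({"glibc-2.25", "bash-4.4"}),
    frozenset({"glibc-2.26", "bash-4.4"}),
    frozenset({"glibc-2.26", "bash-4.4", "cowsay-3.03"}),
]

# Delete every glibc unit, then copy glibc back from two versions ago.
txn = Transaction(versions[-1])
txn.remove(u for u in versions[-1] if u.startswith("glibc"))
txn.add(u for u in versions[-3] if u.startswith("glibc"))
new_number = commit_transaction(versions, txn)
```

However many add and remove steps go into the transaction, the history
grows by a single version, which is the behavior requested for the repoA
examples.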
>
>> Whether a Version is a full copy of the repo or a delta is an
>> implementation detail. I would argue for full copy, otherwise you run into
>> the inefficiencies of cvs which had to apply patches in reverse order just
>> to get to a version in the past. I would find it more useful to have a repo
>> diff resource (diff version 1 with version 3, or repo1 version 1 with repo2
>> latest).
>>
>
> Agreed that it's an implementation detail. In the case of cvs and similar,
> all changes had to be applied sequentially in order to construct a final
> product. When you're only tracking set membership, querying becomes MUCH
> simpler and is very efficient.
>
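
To illustrate why tracking set membership keeps this simple, here is a
small Python sketch of the "repo diff" Mihai asks for. diff_versions() is
a hypothetical helper, and each version is assumed to be available as a
set of unit identifiers.

```python
def diff_versions(old, new):
    """Compare two content sets; no intermediate versions are replayed."""
    old, new = frozenset(old), frozenset(new)
    return {"added": new - old, "removed": old - new}


repo1_v1 = {"glibc-2.25", "bash-4.4"}
repo1_v3 = {"glibc-2.26", "bash-4.4", "cowsay-3.03"}

changes = diff_versions(repo1_v1, repo1_v3)
```

The same function works for two versions of one repo or for versions of
two different repos, since it only looks at the two membership sets.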
>
>>
>> Unfortunately, it is a rather large paradigm shift, and not one that you
>> can push in a 3.0 -> 3.1 transition. Parts of it will need to land in 3.0
>> proper; determining what can be left out is an exercise for the reader who
>> managed to keep up with my long emails.
>>
>> Hey, a man can dream.
>>
>
> I'm dreaming with you! (and also likely putting people to sleep with my
> own long emails) I also think this is a hallmark behavior that is important
> to get right conceptually, and very important to a variety of stakeholders.
>
> Thanks a lot for sharing your insight! If you have more thoughts on these
> use cases, please keep them coming.
>