[Pulp-dev] repository versions update
dkliban at redhat.com
Mon Dec 4 15:11:20 UTC 2017
I am looking forward to discussing the use cases. I hope we can get
versioned repositories into 3.0. Thanks everyone for the discussion so far.
On Fri, Dec 1, 2017 at 5:16 PM, Brian Bouterse <bbouters at redhat.com> wrote:
> Thank you all for such great discussion!
> To recap some discussion we had today. We are going to look at the
> versioned repos use cases at an upcoming MVP call in the near future
> (probably 12/8). Look for the pulp-list announcement. If you have use cases
> you want to share, you can add them in red in the Versioned Repos section
> of the MVP here: https://pulp.plan.io/projects/
> Once the use cases are known, we can look at the PR and see if it fulfills
> them. From the discussion today, the general consensus is that the gap will be
> relatively small, which makes including it in Pulp3 feasible.
> @misa, providing those types of features may be possible. Imagine an
> optional attribute on a repo version named 'frozen' that defaults to True.
> While the latest repo_version for a repo has frozen=False, any action that
> would normally create a new repo version (copy, add/remove, delete, etc.)
> would act on the existing repo version and *not* create a new one. Then the
> user can update the frozen attribute of the repo version when they want,
> which commits the transaction as a repo version. I don't think this would
> be too hard to implement.
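A minimal Python sketch of the 'frozen' idea, using a toy in-memory model. The class names, the `open_version()` and `commit()` helpers, and the use of a plain set for content are hypothetical and are not Pulp's API. When the latest version is frozen, each action creates a new version. When it is unfrozen, it absorbs changes until it is committed.

```python
from dataclasses import dataclass, field


@dataclass
class RepoVersion:
    number: int
    content: set = field(default_factory=set)
    # Hypothetical flag from the proposal: an unfrozen version still accepts changes.
    frozen: bool = True


@dataclass
class Repository:
    versions: list = field(default_factory=lambda: [RepoVersion(0)])

    @property
    def latest(self):
        return self.versions[-1]

    def open_version(self):
        """Hypothetical helper: start a new, unfrozen version for batching changes."""
        self.versions.append(
            RepoVersion(self.latest.number + 1, set(self.latest.content), frozen=False)
        )

    def commit(self):
        """Freezing the open version 'commits the transaction' as a repo version."""
        self.latest.frozen = True

    def _target(self):
        if not self.latest.frozen:
            return self.latest  # act on the existing, still-open version
        # Default behaviour: every modifying action creates a new version.
        new = RepoVersion(self.latest.number + 1, set(self.latest.content))
        self.versions.append(new)
        return new

    def add(self, *units):
        self._target().content.update(units)

    def remove(self, *units):
        self._target().content.difference_update(units)


repo = Repository()
repo.add("a")          # version 1, created immediately
repo.open_version()    # version 2, unfrozen
repo.add("b", "c")
repo.remove("a")
repo.commit()          # version 2 is now frozen
```

Here the single `add("a")` produces version 1 by itself. The three changes made after `open_version()` all land in version 2.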
> On Thu, Nov 30, 2017 at 3:20 PM, Michael Hrivnak <mhrivnak at redhat.com> wrote:
>> On Thu, Nov 30, 2017 at 11:43 AM, Mihai Ibanescu <
>> mihai.ibanescu at gmail.com> wrote:
>>> I am late to the thread, so I apologize if I repeat things that have
>>> been discussed already.
>>> Is it a meaningful use case to publish an older version of the repo?
>>> Once published, do you keep track of which version got published, and how
>>> do you decide which version to push next? This seems like a complication to
>> A publication will have a reference to the version that it was created
>> from. To illustrate how that would get used: Your CTO calls early on a
>> Saturday morning and says "I read in the news about a major security flaw
>> in cowsay, and I know our applications depend heavily on it. What version
>> do we have deployed right now???!!!" You can concretely determine which
>> publications are being currently "distributed" to your infrastructure, and
>> from there see their exact content sets by virtue of the repo version.
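The lookup chain described above (distribution, then publication, then repo version, then content set) can be sketched in a few lines. This is a hypothetical in-memory model. The classes, the `deployed()` function, and the package versions are illustrative assumptions, not Pulp's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoVersion:
    number: int
    content: frozenset  # (package name, package version) pairs


@dataclass(frozen=True)
class Publication:
    repo_version: RepoVersion  # the version this publication was created from


@dataclass
class Distribution:
    name: str
    publication: Publication  # what clients of this distribution are served


def deployed(distributions, package):
    """Answer 'what version of <package> is deployed?' for each distribution."""
    return {
        d.name: sorted(ver for name, ver in d.publication.repo_version.content if name == package)
        for d in distributions
    }


# Made-up data for illustration.
v1 = RepoVersion(1, frozenset({("cowsay", "3.03"), ("bash", "4.4")}))
v2 = RepoVersion(2, frozenset({("cowsay", "3.04"), ("bash", "4.4")}))
dists = [Distribution("production", Publication(v1)), Distribution("testing", Publication(v2))]
print(deployed(dists, "cowsay"))  # {'production': ['3.03'], 'testing': ['3.04']}
```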
>> Then there is the promotion workflow, which in Pulp 2 requires a lot of
>> copying and re-publishing. With repo versions, you'll have a sequence of
>> versions of course. Let's say there's 1, 2 and 3. Version 1 is deployed
>> now, version 2 is undergoing testing, and version 3 got created last night
>> by the weekly sync job you set up. You would have two different distributions
>> that make these publications available to clients: one for production, and
>> one for testing. "Promotion" becomes just the act of updating the reference
>> on a distribution to a different publication. When testing on version 2 is
>> done, assuming it passes, you can update the production distribution to
>> make it use version 2.
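A minimal sketch of promotion as a pure reference update, assuming a toy model in which a publication only records its repo version number. The `promote()` helper and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    repo_version: int  # number of the repo version it was created from


@dataclass
class Distribution:
    name: str
    publication: Publication


def promote(target: Distribution, publication: Publication) -> None:
    """Promotion only updates a reference: no content is copied or re-published."""
    target.publication = publication


# Versions 1, 2 and 3 have each been published once.
publications = {n: Publication(n) for n in (1, 2, 3)}
production = Distribution("production", publications[1])
testing = Distribution("testing", publications[2])

# Testing of version 2 passed, so production starts serving the same publication.
promote(production, testing.publication)
# Version 3, from last night's sync, moves into testing.
promote(testing, publications[3])
```

In this model, rolling back uses the same call with an older publication, for example `promote(production, publications[1])`.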
>> There are a few use cases for publishing an old version.
>> First: I want to publish the same exact content set two different ways,
>> with two different publishers. If the contents change between publishes, I
>> want a guarantee that it won't cause the second publish to use different
>> content than the first.
>> Second: I like the state of the content in a repo as it is right now. I
>> want to publish that exact content set. If any changes happen to the
>> content in that repo between now and when my publish task gets run by a
>> worker, I don't want those changes to affect the publish I'm requesting
>> right now.
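The second use case depends on resolving "latest" to a concrete, immutable version when the request is made, not when a worker picks it up. This is a sketch under stated assumptions: `queue.Queue` stands in for the real task queue, and all class and function names are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoVersion:
    number: int
    content: frozenset  # immutable: a later change creates a new version instead


class Repository:
    def __init__(self):
        self.versions = [RepoVersion(0, frozenset())]

    @property
    def latest(self):
        return self.versions[-1]

    def add(self, *units):
        self.versions.append(RepoVersion(self.latest.number + 1, self.latest.content | set(units)))


tasks = queue.Queue()  # stand-in for the real task queue


def request_publish(repo):
    # Resolve "latest" at request time and queue that concrete, immutable version.
    tasks.put(repo.latest)


def worker_publish():
    version = tasks.get()
    return version.number, sorted(version.content)


repo = Repository()
repo.add("a", "b")       # version 1
request_publish(repo)
repo.add("c")            # version 2 arrives before the worker runs
print(worker_publish())  # (1, ['a', 'b'])
```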
>> Third: I want the ability to roll back from a bad content set to a
>> known-good one. How many publications must I keep around to have confidence
>> that if I need to roll back some distance, that publication will still be
>> available? It's valuable to know I can re-publish an older version any time
>> I need it.
>> Fourth: In some cases you may decide after-the-fact that you need to
>> publish the same content set a different way. Maybe you went to kickstart
>> from a yum repo and then remembered that (this is a true story) one version
>> of your installer is too old to know about sha256 checksums, so you have to
>> go re-publish the same content set with different settings for how the
>> metadata gets generated.
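The fourth use case can be illustrated with a toy publisher that generates checksum metadata for one fixed content set. The content, the `publish()` function, and the choice of `sha1` for the older installer are hypothetical. Only `hashlib.new()` is a real standard-library call. Because the input set never changes, re-running either publish yields identical output.

```python
import hashlib

# Hypothetical, immutable content of one repo version: file name -> file bytes.
VERSION_3 = {"bash.rpm": b"bash payload", "cowsay.rpm": b"cowsay payload"}


def publish(content, checksum_type):
    """Toy publisher: checksum metadata for a fixed content set.

    checksum_type is any name hashlib.new() accepts, e.g. "sha1" or "sha256".
    """
    return {
        name: hashlib.new(checksum_type, data).hexdigest()
        for name, data in sorted(content.items())
    }


# The same version published two ways: one for an installer that predates sha256,
# and one for everything else.
legacy = publish(VERSION_3, "sha1")
modern = publish(VERSION_3, "sha256")
```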
>> Otherwise, just as reproducible builds of software are a very valuable
>> trait, reproducible publishes of repositories are valuable for similar reasons.
>>> As a user / content developer, it seems more useful to me to always
>>> publish the latest (i.e. don't have an optional version for publishing),
>>> but have the ability to copy from a specific version of a repo into another
>>> repo (or the same repo, effectively reverting the content of latest).
>>> So I would shift the discussion away from the REST API (for now), and
>>> more into the expected behavior for manipulating content within pulp. The
>>> operations I am aware of are: syncing units, importing units,
>>> copying/deleting units, and I am seeking clarification on how versioning
>>> will work for each.
>>> Syncing is probably the easiest, because it can handle all the changes
>>> internally and create a new version at the end.
>>> For importing, if you don't want to create unnecessary intermediate
>>> versions that are meaningless, I would want the ability to upload more than
>>> one unit and associate it to the repo, and then create a version. In other
>>> words, a transactional multi-upload.
>> Indeed. We want to have a behavior in Pulp 3 anyway that lets you
>> arbitrarily add and remove multiple content units in one operation. That's
>> one of the more notable missing features from Pulp 2. As Brian has pointed
>> out, one option is to let a user directly POST to a "versions" endpoint and
>> express what content they want to add/remove. Even without repo versions,
>> we'd still want an API that lets you bulk add/remove.
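The semantics of a bulk add/remove that yields exactly one new version can be sketched with plain sets. This is a hypothetical in-memory model, not the proposed REST endpoint. `create_version()` and the list-of-frozensets history are assumptions. In this sketch a unit that appears in both `add` and `remove` ends up removed.

```python
def create_version(versions, add=(), remove=()):
    """Apply a bulk add/remove to the latest version, recording exactly one new version.

    `versions` is a hypothetical in-memory history: a list of frozensets where the
    index is the version number.
    """
    new_content = (versions[-1] | frozenset(add)) - frozenset(remove)
    versions.append(new_content)
    return len(versions) - 1


history = [frozenset({"a", "b"})]
number = create_version(history, add={"c", "d", "e"}, remove={"a"})
```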
>>> For copying, as suggested above, I want to optionally specify the
>>> Deleting by itself is not hard, it does what it needs to do and then
>>> creates a version.
>>> The more complicated use case would be: what if I wanted to change the
>>> contents of repoA:
>>> * add 3 packages from repo1 version 1
>>> * add 4 packages from repo2 (latest)
>>> * delete 5 packages
>>> and at the end have a single version change for repoA.
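The first composite change in that list can be sketched under the same toy model. All requested changes are gathered from their source versions first and then applied together, so repoA gains only one new version. The repo histories, unit names, and the `select()` and `create_version()` helpers are hypothetical.

```python
def select(versions, number, units):
    """Return `units` if they all exist in version `number` of a repo, else raise KeyError."""
    missing = set(units) - versions[number]
    if missing:
        raise KeyError(f"not in version {number}: {sorted(missing)}")
    return frozenset(units)


def create_version(versions, add, remove):
    """Record one new version from a combined set of additions and removals."""
    versions.append((versions[-1] | add) - remove)
    return len(versions) - 1


# Hypothetical histories: index == version number, each entry is a content set.
repo1 = [frozenset(), frozenset({"a1", "a2", "a3", "a4"})]
repo2 = [frozenset({"b1", "b2", "b3", "b4", "b5"})]
repoA = [frozenset({"x1", "x2", "x3", "x4", "x5", "x6"})]

# 3 packages from repo1 version 1, 4 from the latest repo2, 5 deletions.
to_add = select(repo1, 1, {"a1", "a2", "a3"}) | select(repo2, -1, {"b1", "b2", "b3", "b4"})
to_remove = frozenset({"x1", "x2", "x3", "x4", "x5"})
new_number = create_version(repoA, to_add, to_remove)
```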
>>> Or, for the same repoA:
>>> * delete all units of type "rpm" and name "glibc"
>>> * copy unit type "rpm" and name "glibc" from two versions ago
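The glibc example can be sketched the same way. Both steps are computed against existing versions and then recorded as a single new version. Units are hypothetical `(type, name, version)` tuples, and the history is made up.

```python
def is_glibc_rpm(unit):
    unit_type, name, _ = unit
    return unit_type == "rpm" and name == "glibc"


# Hypothetical history of repoA (versions 0..2).
repoA = [
    frozenset({("rpm", "glibc", "2.17"), ("rpm", "bash", "4.2")}),
    frozenset({("rpm", "glibc", "2.18"), ("rpm", "bash", "4.2")}),
    frozenset({("rpm", "glibc", "2.19"), ("rpm", "bash", "4.3")}),
]

latest = repoA[-1]
two_ago = repoA[-3]  # two versions before the latest one

# Drop every glibc rpm from latest, bring back glibc from two versions ago,
# and record the result as ONE new version.
new_content = {u for u in latest if not is_glibc_rpm(u)} | {u for u in two_ago if is_glibc_rpm(u)}
repoA.append(frozenset(new_content))
```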
>>> If you want to support this use case, you need a new resource type, somewhat
>>> similar to a Task, let's call it Transaction. It is tied to the repository
>>> it operates on (repoA in the example above), and locks it from further
>>> changes until the transaction is committed or aborted. It could be
>>> implemented internally as a repository. You start with the current contents
>>> of repoA, and you perform whatever operations you need to do (including
>>> changing repo metadata). When you "commit" the Transaction, it becomes
>>> *the* new version of the repository and unlocks repoA.
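A minimal sketch of the proposed Transaction resource, assuming a single process. `threading.Lock` stands in for whatever repository locking a real deployment would use. The class and method names are hypothetical, and changes to repo metadata are not modeled.

```python
import threading


class Repository:
    def __init__(self, content=()):
        self.versions = [frozenset(content)]  # index == version number
        self.lock = threading.Lock()          # in-process stand-in for a real repo lock


class Transaction:
    """Hypothetical Transaction: stage changes to one repo, then commit or abort."""

    def __init__(self, repo):
        if not repo.lock.acquire(blocking=False):
            raise RuntimeError("repository is locked by another transaction")
        self.repo = repo
        self.staged = set(repo.versions[-1])  # start from the repo's current contents
        self.open = True

    def _require_open(self):
        if not self.open:
            raise RuntimeError("transaction already closed")

    def add(self, *units):
        self._require_open()
        self.staged.update(units)

    def remove(self, *units):
        self._require_open()
        self.staged.difference_update(units)

    def commit(self):
        """The staged contents become *the* new version, and the repo is unlocked."""
        self._require_open()
        self.repo.versions.append(frozenset(self.staged))
        self._close()

    def abort(self):
        """Discard the staged changes and unlock the repo."""
        self._require_open()
        self._close()

    def _close(self):
        self.open = False
        self.repo.lock.release()
```

While a transaction is open, a second `Transaction(repo)` is refused. After `commit()` or `abort()`, the repo can be locked again.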
>> Yep, we're on the same page with the use case I think. The other option
>> is to let you as a user query for whatever content you care about adding
>> and removing; find it however you see fit. Then use the bulk add/remove
>> feature to carry that out in one operation.
>> I do like the idea of persistently storing a Transaction as you call it,
>> and possibly even letting a user build one explicitly. Even just as an
>> implementation detail, any bulk add/remove endpoint may need to store the
>> requested changes temporarily in the database as a means to get the input
>> from the web handler to a celery worker. We probably don't want to stuff
>> 10k+ content references into an AMQP message and pass them all in as an
>> argument to the task. And if we're going to store them in the DB, maybe it
>> would make sense to expose that to the user and let them create a
>> Transaction directly.
>>> Whether a Version is a full copy of the repo or a delta is an
>>> implementation detail. I would argue for full copy, otherwise you run into
>>> the inefficiencies of CVS, which had to apply patches in reverse order just
>>> to get to a version in the past. I would find it more useful to have a repo
>>> diff resource (diff version 1 with version 3, or repo1 version 1 with repo2
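If each version is stored as a full content set, a repo diff resource reduces to two set differences with no patch replay. The `diff()` function and the data below are hypothetical.

```python
def diff(old, new):
    """Hypothetical repo diff: compare any two content sets, from one repo or two."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Full-copy versions keyed by version number; the data is made up.
repo1 = {1: frozenset({"a", "b"}), 2: frozenset({"a", "b", "c"}), 3: frozenset({"b", "c", "d"})}
repo2 = {1: frozenset({"b", "x"})}

print(diff(repo1[1], repo1[3]))  # {'added': ['c', 'd'], 'removed': ['a']}
print(diff(repo1[1], repo2[1]))  # {'added': ['x'], 'removed': ['a']}
```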
>> Agreed that it's an implementation detail. In the case of CVS and
>> similar, all changes had to be applied sequentially in order to construct a
>> final product. When you're only tracking set membership, querying becomes
>> MUCH simpler and is very efficient.
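One way to track set membership stores, for each content unit, the version in which it was added and the version in which it was removed. This schema is an assumption, not necessarily the one Pulp adopts. With it, the contents at any version are a single filter, which in SQL would be one WHERE clause, rather than a sequential replay of changes. Field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Membership:
    content: str
    version_added: int
    version_removed: Optional[int] = None  # None: still present in the latest version


def content_at(memberships, version):
    """Contents of the repo at `version`: one filter, no replay of earlier changes."""
    return {
        m.content
        for m in memberships
        if m.version_added <= version
        and (m.version_removed is None or m.version_removed > version)
    }


rows = [
    Membership("a", 1, 3),  # added in version 1, removed in version 3
    Membership("b", 1),
    Membership("c", 2),
]
```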
>>> Unfortunately, it is a rather large paradigm shift, and not one that you
>>> can push in a 3.0 -> 3.1 transition. Parts of it will need to land in 3.0
>>> proper; determining what can be left out is an exercise for the reader who
>>> managed to keep up with my long emails.
>>> Hey, a man can dream.
>> I'm dreaming with you! (and also likely putting people to sleep with my
>> own long emails) I also think this is a hallmark behavior that is important
>> to get right conceptually, and very important to a variety of stakeholders.
>> Thanks a lot for sharing your insight! If you have more thoughts on these
>> use cases, please keep them coming.
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com