[Pulp-dev] Content types which are not compatible with the normal pulp workflow

Mon May 28 12:01:33 UTC 2018

On Sat, May 26, 2018 at 2:23 AM, Daniel Alley <dalley at redhat.com> wrote:
> @Brian
>
> I agree with a lot of those points, but I would say that we're not just
> competing against hodgepodge collections of "scripts", but also against
> writing small microservice-y Flask apps that only implement the API for one
> content type.
>
> Also, rollback is not something Pulp would necessarily be able to offer with
> respect to history-sensitive content and metadata, like git repositories, or
> the Cargo example I provided.  It's still something the plugin writer would
> have to implement themselves in this case.
>
> @Jeff
>
>> perhaps a new component of a Publication like PublishedDirectory that
>> references an OSTree/Git repository created in /var/lib/pulp/published.
>
>
> I like the idea generally, but I don't think it would be able to be a
> component of a Publication.  I think it would need to be an alternative to a
> Publication which fulfills a similar function.
>
> The fundamental problem is this scenario:
>
> 1. You upload a git repository with a git repository plugin.
> 2. You publish and distribute version 1 of the git repository.
> 3. You publish and distribute version 2 of the git repository.
> 4. A client downloads the git repository.
> 5. You notice a problem and decide to roll back to version 1.  A publication
>    of version 1 already exists, which you distribute.
> 6. Clients have a broken git history.  New clients can download the old
>    version, but anyone who has already downloaded version 2 will not be able
>    to roll back to version 1 by pulling from Pulp.

Just trying to understand the situation:
Is that because the rollback actually creates a version #3 that's
"newer" but lacks the rolled-back commits?
So is there a "merge" conflict if folks who cloned #2 want to
pull from version #3, but their branch contains a commit the origin
now lacks?
Or rather that the published bits of version #2 don't exist
anymore at all?
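
To make the question concrete, here is a minimal sketch of the scenario. It assumes a deliberately simplified model in which a history is a linear list of commit ids (real git histories are graphs), and the function and variable names are made up for illustration. It only checks whether a client that already has version 2 could reach the served state by fast-forwarding.

```python
def can_fast_forward(client_history, served_history):
    """Return True if the served history merely extends the client's history.

    Simplified model: a history is a list of commit ids, oldest first.
    """
    return served_history[:len(client_history)] == client_history


version_1 = ["a", "b"]
version_2 = ["a", "b", "c"]
# Reverting with git adds a new commit on top instead of dropping "c".
version_3 = version_2 + ["revert-c"]

client = list(version_2)  # a client that already downloaded version 2

# Re-distributing the old publication serves version 1 again.
rollback_ok = can_fast_forward(client, version_1)
# Distributing a new version that contains a revert commit.
revert_ok = can_fast_forward(client, version_3)
```

In this model the client cannot get to version 1 by pulling, because its own history is longer than what is served, while the revert-based version 3 is a plain extension of what it already has.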

>
> We need to prevent step 5 from happening.
>
> There are a couple of possible solutions to this problem:
>
> 1. As a Pulp admin, you ignore Pulp's rollback functionality.  Instead of
>    using Pulp to roll back, you manually revert the commits using git, and
>    upload a new version of the repository to Pulp as "version 3".  You then
>    distribute version 3 instead of version 1.  You understand that if you
>    were to publish an old version using Pulp, it would misbehave for clients
>    that tried to pull / update instead of cloning.

In my opinion folks needing Pulp to track a git(-like) repo are
probably interested in more workflows than just the clone.

>
> 2. As a Pulp admin / plugin writer / user, you know that the client for the
>    content type will never try to pull or update, only clone.  Therefore it
>    is not a problem for you and can be ignored.

Cloning might then be equivalent to just snapshotting the tree at a
particular commit and publishing a plain tar.gz w/o the git
structures.
Limiting but clean?
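
As a rough illustration of that idea, the sketch below packs a working tree into a tar.gz while leaving out the .git directory. It assumes the tree has already been checked out at the commit you want to publish; the function name snapshot_tree is hypothetical and not part of Pulp or git. (git itself offers "git archive" for this kind of export.)

```python
import tarfile
from pathlib import Path


def snapshot_tree(worktree, archive_path):
    """Write a .tar.gz of `worktree`, skipping anything under a .git directory."""
    worktree = Path(worktree)

    def skip_git(info):
        # Returning None excludes the member (and, for a directory, its contents).
        return None if ".git" in Path(info.name).parts else info

    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(worktree), arcname=worktree.name, filter=skip_git)
    return archive_path
```

The result is a plain file that fits the normal publish/distribute model, at the cost of clients losing pull/update entirely.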

>
> 3. As a Plugin writer, whenever you publish a new version of the git
>    repository, you delete or invalidate every publication for previous
>    versions for the distribution base path.  If a Pulp admin wants to roll
>    back, they need to create a new Publication.  The Plugin knows to apply
>    revert commits on top of the repository to keep history linear.
>
>    But really we've just pushed the problem forwards.  What happens when you
>    want to upload future versions?  Now the history of the git repository in
>    Pulp is different from the Pulp admin's git repo history.
>    This is only acceptable for content types where the history is immaterial
>    to the content itself.  Probably viable for Cargo, but probably not for a
>    Git content type.
>

Does that mean the git tree in the publication directory is built anew
every time a rollback happens?
So Pulp's history and the original project's history are meant to be different?
Can there ever be conflicts?
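
Here is how I picture that option, as a small sketch. Everything in it is an assumption made for illustration: the class LinearDistribution and its methods are hypothetical, not Pulp APIs, and versions are modelled as opaque strings rather than real git trees. The point is only that a rollback appends to what is served instead of rewinding it, so the served history drifts away from the upstream one.

```python
class LinearDistribution:
    """Hypothetical model of the revert-on-top option; not a Pulp API."""

    def __init__(self):
        self.snapshots = {}   # version number -> content of that version
        self.served_log = []  # what clients have been offered, oldest first

    def publish(self, version, content):
        self.snapshots[version] = content
        self.served_log.append(("publish", version))

    def roll_back(self, version):
        # Never truncate the log: add a revert entry on top instead.
        if version not in self.snapshots:
            raise KeyError(f"unknown version {version}")
        self.served_log.append(("revert-to", version))

    def served_content(self):
        _, version = self.served_log[-1]
        return self.snapshots[version]


dist = LinearDistribution()
dist.publish(1, "v1 tree")
dist.publish(2, "v2 tree")
dist.roll_back(1)

# The upstream project never saw the revert, so the two histories differ.
upstream_log = [("publish", 1), ("publish", 2)]
diverged = dist.served_log != upstream_log
```

After the rollback the served content is version 1 again, but the log has three entries where upstream has two, which is exactly where I would expect trouble once version 3 arrives from upstream.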

> 4. As a Plugin writer, you ignore publications entirely.  You don't make it
>    possible to do the wrong thing. You have something along the lines of a
>    "PluginManagedDirectory" which core does not try to mess with.  If you
>    want to implement rollback functionality, you do it through your own API
>    where the side effects are more easily controlled and reasoned about.

+1 seems like the cleanest way to me
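
A minimal sketch of what such a thing could look like, purely as an assumption on my side: the fields base_path and directory and the method resolve are invented here, and nothing below exists in Pulp. Core would only map request paths to files inside a directory that the plugin owns and would never interpret or rewrite what is in it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PluginManagedDirectory:
    """Hypothetical: a directory owned by a plugin and only served by core."""

    base_path: str   # URL prefix under which the directory is exposed
    directory: Path  # on-disk location that the plugin manages itself

    def resolve(self, request_path: str) -> Optional[Path]:
        """Map a request path to a file inside the directory, or None."""
        prefix = self.base_path.strip("/") + "/"
        relative = request_path.lstrip("/")
        if not relative.startswith(prefix):
            return None
        root = self.directory.resolve()
        candidate = (root / relative[len(prefix):]).resolve()
        # Refuse anything that escapes the managed directory.
        if root not in candidate.parents:
            return None
        return candidate if candidate.is_file() else None
```

Rollback, garbage collection and the like would then live entirely in the plugin's own API, which changes the directory contents as it sees fit.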

>
> I have doubts about whether Option 3 is viable - it seems like making it
> work reliably would be difficult.

I'd say options #1 and #3 are the same, with #3 adding the complexity of
automating the rollback in Pulp.
Options #2 and #4 are the same in the sense that Pulp stays away from
the incompatible workflow a content type has while providing a limited
functionality subset to the consumer. In addition, #4 allows a Pulp
service/host to provide both the Pulp-specific, limited functionality
and the incompatible, content-type-specific workflows from a
"single" point. This might be a benefit to some folks.

Option #5: somehow make core Pulp (content versioning) compatible with
the Git model ;)

--
milan

>
> On Fri, May 25, 2018 at 5:05 PM, Brian Bouterse <bbouters at redhat.com> wrote:
>>
>> I think Pulp does have enough of a value proposition over a script-based
>> alternative to make it worthwhile for all of those types of plugins. Here are
>> a few points I think about:
>>
>> * scalability. A common story users tell is that scripts work well up
>> until a point. Doing it for an entire organization, or when content comes
>> from many places, or with more than a few people involved in maintaining the
>> content, it becomes unmaintainable.
>>
>> * Stacks of content. Often a group of content goes together, but each
>> piece of content is updated separately. For instance with Ansible roles, you
>> may use many of them together to deploy something, but each role may receive
>> changes separately. I think of all this content together as a "stack".
>> Keeping everything up to date can be challenging. Managing that change with
>> scripts can be hard and fragile. Also, the ability to roll back quickly and
>> confidently is something Pulp can offer.
>>
>> * Organizing content is easier. Having an API that you can use to organize
>> content is easier than doing lots and lots of git yourself or with scripts.
>>
>> * Tasking. Long-running tasks (and a lot of them) can be unwieldy, and
>> Pulp keeps them organized and running well.
>>
>> * Static and vulnerability analysis. We're seeing interest in using
>> analysis projects like Clair (https://github.com/arminc/clair-scanner) to
>> scan content in Pulp. By bringing all the content into one place, and that
>> place having a tasking system, plugin writers can control how their
>> content is analyzed continuously.
>>
>> Also +1 to jortel's idea. I think that's a great idea and exactly what we
>> need.
>>
>>
>> On Thu, May 24, 2018 at 1:33 PM, Jeff Ortel <jortel at redhat.com> wrote:
>>>
>>>
>>>
>>> On 05/17/2018 07:46 AM, Daniel Alley wrote:
>>>
>>> Some content types are not going to be compatible with the normal
>>> sync/publish/distribute Pulp workflows, and will need to be live API-only.
>>> To what degree should Pulp accommodate these use cases?
>>>
>>> Example:
>>>
>>> Pulp makes the assumptions that
>>>
>>> A) the metadata for a repository can be generated in its entirety by the
>>> known set of content in a RepositoryVersion, and
>>>
>>> B) the client wouldn't care if you point it at an older version of the
>>> same repository.
>>>
>>> Cargo, the package manager for the Rust programming language, expects the
>>> registry url to be a git repository.  When a user does a "cargo update",
>>> cargo essentially does a "git pull" to update a local copy of the registry.
>>>
>>> Both of those assumptions are false in this case. You cannot generate the
>>> git history just from the set of content, and you cannot "roll back" the
>>> state of the repository without either breaking it for clients, or adding
>>> new commits on top.
>>>
>>> A theoretical Pulp plugin that worked with Cargo would need to ignore
>>> almost all of the existing Pulp primitives and very little (if any) of the
>>> normal Pulp workflow could be used.
>>>
>>> Should Pulp attempt to cater to plugins like these?  What could Pulp do
>>> to provide a benefit for such plugins over writing something from
>>> scratch?  To what extent would such plugins be able to integrate
>>> with the rest of Pulp, if at all?
>>>
>>>
>>> I think OSTree and Ansible plugins will be in the same boat as Cargo.  In
>>> the case of OSTree, libostree does the heavy lifting for sync and publishing
>>> and I suspect the same is true for Git-based repositories.  We should
>>> consider how best to support distributing (serving) content in core for
>>> these content types.  I suspect this will mainly entail something in the
>>> content app and perhaps a new component of a Publication like
>>> PublishedDirectory that references an OSTree/Git repository created in
>>> /var/lib/pulp/published.  This may benefit Maven as well.
>>>
>>>
>>>
>>> We don't have to commit to anything pre-GA but it is a good thing to keep
>>> in mind.  I'm sure there are other content types out there (not just Cargo)
>>> which would face similar problems.  pulp_git was inquired about a few months
>>> ago; it seems like it would share a few of them.
>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev