[Pulp-dev] Content types which are not compatible with the normal pulp workflow

Sat May 26 00:23:33 UTC 2018

@Brian

I agree with a lot of those points, but I would say that we're not just
competing against hodgepodge collections of "scripts", but also against
writing small microservice-y Flask apps that only implement the API for one
content type.

Also, rollback is not something Pulp would necessarily be able to offer
with respect to history-sensitive content and metadata, like git
repositories, or the Cargo example I provided.  It's still something the
plugin writer would have to implement themselves in this case.

@Jeff

> ...perhaps a new component of a Publication like PublishedDirectory that
> references an OSTree/Git repository created in /var/lib/pulp/published.
>

I like the idea generally, but I don't think it could be a component of a
Publication.  I think it would need to be an alternative to a Publication
that fulfills a similar function.

The fundamental problem is this scenario:

   1. You upload a git repository with a git repository plugin
   2. You publish and distribute version 1 of the git repository
   3. You publish and distribute version 2 of the git repository
   4. A client downloads the git repository
   5. You notice a problem and decide to roll back to version 1.  A
   publication of version 1 already exists, which you distribute.
   6. Clients have a broken git history.  New clients can download the old
   version, but anyone who has already downloaded version 2 will not be able to
   roll back to version 1 by pulling from Pulp.

We need to prevent step 5 from happening.
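To make step 6 concrete, here is a toy Python model of a fast-forward-only
pull.  It is not Pulp or git code: commits are just names in a parent map, and
the `ancestors` and `pull` helpers are hypothetical.  A client whose head is
already ahead of what Pulp serves sees nothing new to fetch, so it stays on
version 2.

```python
from typing import Dict, Optional, Set

# Toy model only: each commit id maps to its parent commit (None for the root).
Parents = Dict[str, Optional[str]]


def ancestors(commit: Optional[str], parents: Parents) -> Set[str]:
    """Return every commit reachable from `commit`, including `commit` itself."""
    seen: Set[str] = set()
    while commit is not None:
        seen.add(commit)
        commit = parents[commit]
    return seen


def pull(local_head: str, remote_head: str, parents: Parents) -> str:
    """Approximate a fast-forward-only pull and return the client's new head."""
    if remote_head in ancestors(local_head, parents):
        # The server is at or behind the client: "already up to date".
        return local_head
    if local_head in ancestors(remote_head, parents):
        # The server's history extends the client's: fast-forward.
        return remote_head
    raise ValueError("histories diverged; a fast-forward pull is impossible")


history: Parents = {"v1": None, "v2": "v1"}

# Step 4: a client cloned while version 2 was distributed.
client_head = "v2"
# Step 5: Pulp rolls back and distributes version 1 again.
served_head = "v1"
# Step 6: pulling does not move the client back to version 1.
print(pull(client_head, served_head, history))  # v2
```

A fresh clone would get version 1, but the client that already has version 2
has no way back to it by pulling, which is exactly what step 5 sets up.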

There are a couple of possible solutions to this problem:

   - As a Pulp admin, you ignore Pulp's rollback functionality.  Instead of
   using Pulp to roll back, you manually revert the commits using git, and
   upload a new version of the repository to Pulp as "version 3".  You then
   distribute version 3 instead of version 1.  You understand that if you were
   to publish an old version using Pulp, it would misbehave for clients that
   tried to pull / update instead of cloning.

   - As a Pulp admin / plugin writer / user, you know that the client for
   the content type will never try to pull or update, only clone.  Therefore
   it is not a problem for you and can be ignored.

   - As a Plugin writer, whenever you publish a new version of the git
   repository, you delete or invalidate every publication of previous
   versions for the distribution base path.  If a Pulp admin wants to roll
   back, they need to create a new Publication.  The Plugin knows to apply
   revert commits on top of the repository to keep history linear.
      - But really we've just pushed the problem forwards.  What happens when
      you want to upload future versions?  Now the history of the git
      repository in Pulp differs from the history of the Pulp admin's own git
      repository.
      - This is only acceptable for content types where the history is
      immaterial to the content itself.  Probably viable for Cargo, but
      probably not for a Git content type.

   - As a Plugin writer, you ignore publications entirely.  You don't make
   it possible to do the wrong thing. You have something along the lines of a
   "PluginManagedDirectory" which core does not try to mess with.  If you want
   to implement rollback functionality, you do it through your own API where
   the side effects are more easily controlled and reasoned about.
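The first option works because the revert is a new commit on top of version 2,
so updating clients can simply fast-forward to it.  Another toy sketch, under
the same caveats as before: `Commit` is a hypothetical object holding a file
snapshot and a parent, and `revert_to` / `can_fast_forward` are illustrative
helpers, not git or Pulp APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Commit:
    """Toy commit: a snapshot of file contents plus a parent pointer."""
    name: str
    files: Dict[str, str]
    parent: Optional["Commit"] = None

    def ancestors(self) -> Iterator["Commit"]:
        """Yield this commit and every commit behind it."""
        commit: Optional[Commit] = self
        while commit is not None:
            yield commit
            commit = commit.parent


def revert_to(head: Commit, target: Commit, name: str) -> Commit:
    """Create a new commit on top of `head` whose contents match `target`."""
    return Commit(name=name, files=dict(target.files), parent=head)


def can_fast_forward(local: Commit, remote: Commit) -> bool:
    """An updating client can follow `remote` only if it descends from `local`."""
    return any(c is local for c in remote.ancestors())


v1 = Commit("v1", {"README": "version 1"})
v2 = Commit("v2", {"README": "version 2"}, parent=v1)
v3 = revert_to(v2, v1, "v3")  # "version 3": v1's content, v2's history

print(v3.files == v1.files)      # True: same content as version 1
print(can_fast_forward(v2, v3))  # True: clients on version 2 can update
print(can_fast_forward(v2, v1))  # False: re-serving version 1 strands them
```

In the first option the admin makes this revert in their own repository, so
Pulp's history and theirs stay in sync.  When the plugin synthesizes the revert
itself (the third option), Pulp's history drifts from the admin's, which is the
problem noted there.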

I have doubts about whether option 3 is viable; it seems like making it
work reliably would be difficult.

On Fri, May 25, 2018 at 5:05 PM, Brian Bouterse <bbouters at redhat.com> wrote:

> I think Pulp does have enough value proposition over a script-based
> alternative to make it worthwhile for all of those types of plugins. Here
> are a few points I think about:
>
> * scalability. A common story users tell is that scripts work well up
> until a point. Doing it for an entire organization, or when content comes
> from many places, or with more than a few people involved in maintaining
> the content, it becomes unmaintainable.
>
> * Stacks of content. Often a group of content goes together, but each
> piece of content is updated separately. For instance with Ansible roles,
> you may use many of them together to deploy something, but each role may
> receive changes separately. I think of all this content together as a
> "stack". Keeping everything up to date can be challenging. Managing that
> change with scripts can be hard and fragile. Also the ability to rollback
> quickly and confidently is something Pulp can offer.
>
> * Organizing content is easier. Having an API that you can use to organize
> content is easier than doing lots and lots of git yourself or with scripts.
>
> * Tasking. Long-running tasks (and a lot of them) can be unwieldy, and
> Pulp keeps them well organized and runs them reliably.
>
> * Static and vulnerability analysis. We're seeing interest in using
> analysis projects like Clair (https://github.com/arminc/clair-scanner) to
> scan content in Pulp. By bringing all the content into one place, and
> giving that place a tasking system, plugin writers can control how their
> content is analyzed continuously.
>
> Also +1 to jortel's idea. I think that's a great idea and exactly what we
> need.
>
>
> On Thu, May 24, 2018 at 1:33 PM, Jeff Ortel <jortel at redhat.com> wrote:
>
>>
>>
>> On 05/17/2018 07:46 AM, Daniel Alley wrote:
>>
>> Some content types are not going to be compatible with the normal
>> sync/publish/distribute Pulp workflows, and will need to be live API-only.
>> To what degree should Pulp accommodate these use cases?
>>
>> Example:
>>
>> Pulp makes the assumptions that
>>
>> A) the metadata for a repository can be generated in its entirety by the
>> known set of content in a RepositoryVersion, and
>>
>> B) the client wouldn't care if you point it at an older version of the
>> same repository.
>>
>> Cargo, the package manager for the Rust programming language, expects the
>> registry url to be a git repository.  When a user does a "cargo update",
>> cargo essentially does a "git pull" to update a local copy of the registry.
>>
>> Both of those assumptions are false in this case. You cannot generate the
>> git history just from the set of content, and you cannot "roll back" the
>> state of the repository without either breaking it for clients, or adding
>> new commits on top.
>>
>> A theoretical Pulp plugin that worked with Cargo would need to ignore
>> almost all of the existing Pulp primitives and very little (if any) of the
>> normal Pulp workflow could be used.
>>
>> Should Pulp attempt to cater to plugins like these?  What could Pulp do
>> to provide a benefit for such plugins over writing something from
>> scratch?  To what extent would such plugins be able to integrate
>> with the rest of Pulp, if at all?
>>
>>
>> I think OSTree and Ansible plugins will be in the same boat as Cargo.  In
>> the case of OSTree, libostree does the heavy lifting for sync and
>> publishing and I suspect the same is true for Git based repositories.  We
>> should consider how best to support distributing (serving) content in core
>> for these content types.  I suspect this will mainly entail something in
>> the content app and perhaps a new component of a Publication like
>> PublishedDirectory that references an OSTree/Git repository created in
>> /var/lib/pulp/published.  This may benefit Maven as well.
>>
>
>>
>> We don't have to commit to anything pre-GA but it is a good thing to keep
>> in mind.  I'm sure there are other content types out there (not just Cargo)
>> which would face similar problems.  pulp_git was inquired about a few
>> months ago, it seems like it would share a few of them.
>>
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev