[Pulp-dev] Content types which are not compatible with the normal pulp workflow

Milan Kovacik mkovacik at redhat.com
Mon May 28 14:52:54 UTC 2018


Thanks for the explanation!

On Mon, May 28, 2018 at 4:17 PM, Daniel Alley <dalley at redhat.com> wrote:
>> Is that because the rollback actually creates version #3 that's
>> "newer" but lacks the rolled-back commits?
>> So is there a "merge" conflict if folks who cloned #2 want to
>> pull from version #3 but their branch contains a commit the origin
>> lacks now?
>> Or rather that the published bits of version #2 don't exist
>> anymore at all?
>
>
> The first one.  It would be like if someone force-pushed to the git
> repository, removing the last couple of commits of history.  It's basically
> the same problem.
>
>> Does it mean a publication directory git tree is built anew every time
>> a rollback happens?
>
>
> What it would have to do is take the existing git tree and apply new commits
> on top to return the contents of the repository to the state you want to
> roll it back to.
>
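A toy sketch of the point above, to check my understanding. This is not Pulp or git code: a commit is reduced to a hash of (parent id, file contents), and a client can pull only when its own head is an ancestor of the published head. The names commit, is_ancestor and can_fast_forward are made up for illustration.

```python
import hashlib

# Toy commit store: id -> (parent id or None, tree). Not real git objects.
COMMITS = {}


def commit(parent, tree):
    """Record a commit whose id depends on its parent and its content."""
    payload = repr((parent, sorted(tree.items()))).encode()
    cid = hashlib.sha1(payload).hexdigest()[:8]
    COMMITS[cid] = (parent, dict(tree))
    return cid


def is_ancestor(old, new):
    """True if `old` is reachable from `new` by following parent links."""
    while new is not None:
        if new == old:
            return True
        new = COMMITS[new][0]
    return False


def can_fast_forward(client_head, published_head):
    """A plain pull works only if the client's head is in the published history."""
    return is_ancestor(client_head, published_head)


v1 = commit(None, {"README": "one"})
v2 = commit(v1, {"README": "two"})

# Rollback by re-publishing version 1: the published head moves backwards.
republished = v1
# Rollback by a new commit on top of version 2 that restores version 1's content.
reverted = commit(v2, COMMITS[v1][1])

print(can_fast_forward(v2, republished))       # False
print(can_fast_forward(v2, reverted))          # True
print(COMMITS[reverted][1] == COMMITS[v1][1])  # True
```

So a client that already has version 2 can follow the revert-on-top rollback, but not the re-published version 1, even though both serve the same files.
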
>> So Pulp history and the original project history are meant to be
>> different?
>> Can there be ever conflicts?
>
>
> It's not that they're meant to be different, but I think it is an
> unavoidable problem if you want to do rollbacks in Pulp.
>
> The source git repository for the project, whether it's on github or the
> admin's machine, is separate from Pulp's copy. The second you add a commit
> to one and not the other (by doing rollback w/ linear git history from the
> client's perspective), the histories will diverge.  It's unavoidable, that's
> just how git works.  You can keep the content of the files in the repo
> identical but the history will never be equivalent again.

...impairing the usability of Pulp as the "master" repository
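
To convince myself of the "identical content, different history" part, here is a minimal sketch under the same toy assumptions (a commit id is just a hash over the parent id and the files; nothing here is real git or Pulp code, and the file name is invented):

```python
import hashlib


def commit(parent, tree):
    """Toy commit: (id, parent commit, tree); the id covers parent id and content."""
    parent_id = parent[0] if parent else None
    payload = repr((parent_id, sorted(tree.items()))).encode()
    return (hashlib.sha1(payload).hexdigest()[:8], parent, tree)


def history(head):
    """Commit ids from the root up to `head`."""
    ids = []
    while head is not None:
        ids.append(head[0])
        head = head[1]
    return ids[::-1]


# Shared starting point: both copies hold versions 1 and 2.
c1 = commit(None, {"lib.rs": "v1"})
c2 = commit(c1, {"lib.rs": "v2"})

# Pulp rolls back with a revert commit on top; the source repository does not.
pulp_revert = commit(c2, {"lib.rs": "v1"})

# Later the admin commits new content upstream and syncs the same files into Pulp.
upstream_head = commit(c2, {"lib.rs": "v3"})
pulp_head = commit(pulp_revert, {"lib.rs": "v3"})

same_content = upstream_head[2] == pulp_head[2]
same_history = history(upstream_head) == history(pulp_head)
print(same_content, same_history)  # True False
```

The two heads carry the same files, yet their histories only share the first two commits.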

>
> Basically, it is mutually exclusive to have:
>
> * Pulp not be the "master" git repository e.g. the admin is syncing /
> uploading it from somewhere else
> * maintain linear git history
> * be able to do rollbacks in Pulp
> * keep identical git history between Pulp and the git repository being
> synced / uploaded into Pulp
>
> One of them has to give.

+1

I believe any content type/plug-in with its own idea of content
versioning will have the same conflict.
Wrapping/translating from content-specific versioning scheme to Pulp
versioning scheme sounds like a headache even if Pulp supports a
non-linear history one day.

Let's forget about it and give the plug-in the ability to opt out of
the core versioning scheme instead?
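
Roughly what I have in mind, as a purely hypothetical sketch. None of these names exist in Pulp today: the uses_core_versioning flag, the Plugin classes and core_rollback are all invented here to show the shape of an opt-out.

```python
class RollbackNotSupported(Exception):
    """Raised when core is asked to roll back content it does not version."""


class Plugin:
    # Hypothetical flag, not an existing Pulp attribute.
    uses_core_versioning = True

    def __init__(self, name):
        self.name = name
        self.versions = []  # content sets tracked by core, oldest first


class GitLikePlugin(Plugin):
    # The content type keeps its own history, so it opts out.
    uses_core_versioning = False


def core_rollback(plugin, version_number):
    """Core-side rollback that steps aside for opted-out plugins."""
    if not plugin.uses_core_versioning:
        raise RollbackNotSupported(f"{plugin.name} manages its own history")
    return plugin.versions[version_number - 1]


files = Plugin("file")
files.versions = [{"a.txt"}, {"a.txt", "b.txt"}]
print(core_rollback(files, 1))

git = GitLikePlugin("git")
try:
    core_rollback(git, 1)
except RollbackNotSupported as exc:
    print(exc)
```

The point is only that core refuses to do the wrong thing; how the plug-in then exposes its own rollback is left entirely to the plug-in.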

Cheers,
milan

>
>
> On Mon, May 28, 2018 at 8:01 AM, Milan Kovacik <mkovacik at redhat.com> wrote:
>>
>> On Sat, May 26, 2018 at 2:23 AM, Daniel Alley <dalley at redhat.com> wrote:
>> > @Brian
>> >
>> > I agree with a lot of those points, but I would say that we're not just
>> > competing against hodgepodge collections of "scripts", but also against
>> > writing small microservice-y Flask apps that only implement the API for
>> > one content type.
>> >
>> > Also, rollback is not something Pulp would necessarily be able to offer
>> > with respect to history-sensitive content and metadata, like git
>> > repositories, or the Cargo example I provided.  It's still something the
>> > plugin writer would have to implement themselves in this case.
>> >
>> > @Jeff
>> >
>> >> perhaps a new component of a Publication like PublishedDirectory that
>> >> references an OSTree/Git repository created in /var/lib/pulp/published.
>> >
>> >
>> > I like the idea generally, but I don't think it would be able to be a
>> > component of a Publication.  I think it would need to be an alternative
>> > to a Publication which fulfills a similar function.
>> >
>> > The fundamental problem is this scenario:
>> >
>> > 1. You upload a git repository with a git repository plugin.
>> > 2. You publish and distribute version 1 of the git repository.
>> > 3. You publish and distribute version 2 of the git repository.
>> > 4. A client downloads the git repository.
>> > 5. You notice a problem and decide to roll back to version 1.  A
>> >    publication of version 1 already exists, which you distribute.
>> > 6. Clients have a broken git history.  New clients can download the old
>> >    version, but anyone who has already downloaded version 2 will not be
>> >    able to roll back to version 1 by pulling from Pulp.
>>
>> Just trying to understand the situation:
>> Is that because the rollback actually creates version #3 that's
>> "newer" but lacks the rolled-back commits?
>> So is there a "merge" conflict if folks who cloned #2 want to
>> pull from version #3 but their branch contains a commit the origin
>> lacks now?
>> Or rather that the published bits of version #2 don't exist
>> anymore at all?
>>
>> >
>> > We need to prevent step 5 from happening.
>> >
>> > There are a couple of possible solutions to this problem:
>> >
>> > 1. As a Pulp admin, you ignore Pulp's rollback functionality.  Instead
>> >    of using Pulp to roll back, you manually revert the commits using
>> >    git, and upload a new version of the repository to Pulp as
>> >    "version 3".  You then distribute version 3 instead of version 1.
>> >    You understand that if you were to publish an old version using
>> >    Pulp, it would misbehave for clients that tried to pull / update
>> >    instead of cloning.
>>
>> In my opinion folks needing Pulp to track a git(-like) repo are
>> probably interested in more workflows than just the clone.
>>
>> >
>> > 2. As a Pulp admin / plugin writer / user, you know that the client for
>> >    the content type will never try to pull or update, only clone.
>> >    Therefore it is not a problem for you and can be ignored.
>>
>> The cloning might be the equivalent of just snapshotting the tree at a
>> particular commit and just publishing a plain tar.gz w/o the git
>> structures.
>> Limiting but clean?
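
For the record, the snapshot idea could look something like the sketch below. It assumes the wanted commit is already checked out in a working tree; the snapshot helper and the paths are made up for illustration and are not part of Pulp.

```python
import tarfile
import tempfile
from pathlib import Path


def snapshot(worktree, archive_path):
    """Write a .tar.gz of `worktree`, leaving out the .git directory."""
    worktree = Path(worktree)

    def skip_git(info):
        # Returning None drops the entry (and, for a directory, its subtree).
        return None if ".git" in Path(info.name).parts else info

    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(worktree), arcname=worktree.name, filter=skip_git)
    return archive_path


# Demo on a throwaway directory standing in for a checked-out commit.
with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "repo"
    (tree / ".git").mkdir(parents=True)
    (tree / ".git" / "HEAD").write_text("ref: refs/heads/master\n")
    (tree / "README").write_text("hello\n")
    archive = snapshot(tree, Path(tmp) / "repo.tar.gz")
    with tarfile.open(str(archive)) as tar:
        names = sorted(tar.getnames())

print(names)
```

The result is an ordinary archive that the existing publish/distribute workflow could serve like any other file, at the cost of clients never being able to pull.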
>>
>> >
>> > 3. As a Plugin writer, whenever you publish a new version of the git
>> >    repository, you delete or invalidate every publication for previous
>> >    versions for the distribution base path.  If a Pulp admin wants to
>> >    roll back, they need to create a new Publication.  The Plugin knows
>> >    to apply revert commits on top of the repository to keep history
>> >    linear.
>> >
>> >    But really we've just pushed the problem forwards.  What happens
>> >    when you want to upload future versions?  Now the history of the git
>> >    repository in Pulp is different from the Pulp admin's git repo
>> >    history.  This is only acceptable for content types where the
>> >    history is immaterial to the content itself.  Probably viable for
>> >    Cargo, but probably not for a Git content type.
>> >
>>
>> Does it mean a publication directory git tree is built anew every time
>> a rollback happens?
>> So Pulp history and the original project history are meant to be
>> different?
>> Can there be ever conflicts?
>>
>>
>> > 4. As a Plugin writer, you ignore publications entirely.  You don't make
>> >    it possible to do the wrong thing.  You have something along the
>> >    lines of a "PluginManagedDirectory" which core does not try to mess
>> >    with.  If you want to implement rollback functionality, you do it
>> >    through your own API where the side effects are more easily
>> >    controlled and reasoned about.
>>
>> +1 seems like the cleanest way to me
>>
>> >
>> > I have doubts about whether Option 3 is viable - it seems like making it
>> > work reliably would be difficult.
>>
>> I'd say option #1 and #3 are the same, #3 adding the complexity of
>> automating the rollback in Pulp,
>> option #2 and #4 are the same in the sense of Pulp staying away from
>> the incompatible workflow a content type has while providing a limited
>> functionality subset to the consumer. In addition, #4 allows for Pulp
>> service/host to provide both the Pulp-specific, limited functionality
>> as well as the incompatible, content-type specific workflows from a
>> "single" point. This might be a benefit to some folks.
>>
>>
>> Option #5: somehow make core Pulp (content versioning) compatible with
>> the Git model ;)
>>
>> --
>> milan
>>
>> >
>> > On Fri, May 25, 2018 at 5:05 PM, Brian Bouterse <bbouters at redhat.com>
>> > wrote:
>> >>
>> >> I think Pulp does have enough of a value proposition over a script-based
>> >> alternative to make it worthwhile for all of those types of plugins.
>> >> Here are a few points I think about:
>> >>
>> >> * Scalability. A common story users tell is that scripts work well up
>> >> until a point. For an entire organization, or when content comes from
>> >> many places, or with more than a few people involved in maintaining
>> >> the content, they become unmaintainable.
>> >>
>> >> * Stacks of content. Often a group of content goes together, but each
>> >> piece of content is updated separately. For instance with Ansible
>> >> roles, you
>> >> may use many of them together to deploy something, but each role may
>> >> receive
>> >> changes separately. I think of all this content together as a "stack".
>> >> Keeping everything up to date can be challenging. Managing that change
>> >> with scripts can be hard and fragile. Also, the ability to roll back
>> >> quickly and confidently is something Pulp can offer.
>> >>
>> >> * Organizing content is easier. Having an API that you can use to
>> >> organize
>> >> content is easier than doing lots and lots of git yourself or with
>> >> scripts.
>> >>
>> >> * Tasking. Long-running tasks (and a lot of them) can be unwieldy, and
>> >> Pulp keeps them organized and running well.
>> >>
>> >> * Static and vulnerability analysis. We're seeing interest in using
>> >> analysis projects like Clair (https://github.com/arminc/clair-scanner)
>> >> to scan content in Pulp. With all the content in one place, and that
>> >> place having a tasking system, plugin writers can control how their
>> >> content is analyzed continuously.
>> >>
>> >> Also +1 to jortel's idea. I think that's a great idea and exactly what
>> >> we
>> >> need.
>> >>
>> >>
>> >> On Thu, May 24, 2018 at 1:33 PM, Jeff Ortel <jortel at redhat.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On 05/17/2018 07:46 AM, Daniel Alley wrote:
>> >>>
>> >>> Some content types are not going to be compatible with the normal
>> >>> sync/publish/distribute Pulp workflows, and will need to be live
>> >>> API-only.
>> >>> To what degree should Pulp accommodate these use cases?
>> >>>
>> >>> Example:
>> >>>
>> >>> Pulp makes the assumptions that
>> >>>
>> >>> A) the metadata for a repository can be generated in its entirety by
>> >>> the
>> >>> known set of content in a RepositoryVersion, and
>> >>>
>> >>> B) the client wouldn't care if you point it at an older version of the
>> >>> same repository.
>> >>>
>> >>> Cargo, the package manager for the Rust programming language, expects
>> >>> the
>> >>> registry url to be a git repository.  When a user does a "cargo
>> >>> update",
>> >>> cargo essentially does a "git pull" to update a local copy of the
>> >>> registry.
>> >>>
>> >>> Both of those assumptions are false in this case. You cannot generate
>> >>> the
>> >>> git history just from the set of content, and you cannot "roll back"
>> >>> the
>> >>> state of the repository without either breaking it for clients, or
>> >>> adding
>> >>> new commits on top.
>> >>>
>> >>> A theoretical Pulp plugin that worked with Cargo would need to ignore
>> >>> almost all of the existing Pulp primitives and very little (if any) of
>> >>> the
>> >>> normal Pulp workflow could be used.
>> >>>
>> >>> Should Pulp attempt to cater to plugins like these?  What could Pulp
>> >>> do to provide a benefit for such plugins over writing something from
>> >>> scratch?  To what extent would such plugins be able to integrate with
>> >>> the rest of Pulp, if at all?
>> >>>
>> >>>
>> >>> I think OSTree and Ansible plugins will be in the same boat as Cargo.
>> >>> In the case of OSTree, libostree does the heavy lifting for sync and
>> >>> publishing, and I suspect the same is true for Git-based repositories.
>> >>> We should consider how best to support distributing (serving) content
>> >>> in core for these content types.  I suspect this will mainly entail
>> >>> something in the content app and perhaps a new component of a
>> >>> Publication like PublishedDirectory that references an OSTree/Git
>> >>> repository created in /var/lib/pulp/published.  This may benefit Maven
>> >>> as well.
>> >>>
>> >>>
>> >>>
>> >>> We don't have to commit to anything pre-GA but it is a good thing to
>> >>> keep
>> >>> in mind.  I'm sure there are other content types out there (not just
>> >>> Cargo)
>> >>> which would face similar problems.  pulp_git was inquired about a few
>> >>> months ago; it seems like it would share a few of them.
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Pulp-dev mailing list
>> >>> Pulp-dev at redhat.com
>> >>> https://www.redhat.com/mailman/listinfo/pulp-dev
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>
>



