[Pulp-dev] Content types which are not compatible with the normal pulp workflow
Daniel Alley
dalley at redhat.com
Mon May 28 14:17:08 UTC 2018
>
> Is that because the rollback actually creates version #3 that's
> "newer" but lacks the rolled-back commits?
> So there is some "merge" conflict if folks who cloned #2 want to
> pull from version #3 but their branch contains a commit the origin
> lacks now?
> Or rather that the published bits of version #2 don't exist
> anymore at all?
The first one. It would be as if someone force-pushed to the git
repository, removing the last couple of commits of history. It's basically
the same problem.
> Does it mean a publication directory git tree is built anew every time
> a rollback happens?
>
What it would have to do is take the existing git tree and apply new
commits on top to return the contents of the repository to the state you
want to roll it back to.
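To make the force-push analogy concrete, here is a toy model of git history. It is a minimal sketch, not Pulp or git code: `Commit`, `can_fast_forward`, and `rollback_with_revert` are hypothetical names, a commit is reduced to a set of file names plus a parent pointer, and the "crate-a"/"crate-b" entries are made-up content. It contrasts re-distributing the old version 1 publication with adding a new commit that restores version 1's contents on top of version 2.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit (hypothetical): a snapshot of file names plus a parent pointer."""
    sha: str
    files: FrozenSet[str]
    parent: Optional["Commit"] = None


def is_ancestor(ancestor: Commit, descendant: Optional[Commit]) -> bool:
    """Walk parent pointers from `descendant` looking for `ancestor`."""
    while descendant is not None:
        if descendant.sha == ancestor.sha:
            return True
        descendant = descendant.parent
    return False


def can_fast_forward(client_head: Commit, server_head: Commit) -> bool:
    """The client's branch can be moved forward to the server's head only if
    the client's head is an ancestor of (or equal to) the server's head."""
    return is_ancestor(client_head, server_head)


def rollback_with_revert(head: Commit, target: Commit, sha: str) -> Commit:
    """Restore `target`'s contents with a new commit on top of `head`,
    keeping history linear from the client's point of view."""
    return Commit(sha=sha, files=target.files, parent=head)


v1 = Commit("v1", frozenset({"crate-a"}))
v2 = Commit("v2", frozenset({"crate-a", "crate-b"}), parent=v1)

# Re-distributing the version 1 publication: the server's head is now behind
# the client's v2, so pulling cannot move the client back to v1.
print(can_fast_forward(v2, v1))   # False

# Rolling back by committing v1's contents on top of v2 instead.
v3 = rollback_with_revert(v2, v1, "v3")
print(can_fast_forward(v2, v3))   # True
print(v3.files == v1.files)       # True
```

The revert-style rollback keeps every client's existing commit in the server's history, which is what lets a normal pull succeed; the cost is that the server's history now contains a commit the original project never had.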
> So Pulp history and the original project history are meant to be different?
> Can there ever be conflicts?
>
It's not that they're *meant* to be different, but I think it is an
unavoidable problem if you want to do rollbacks in Pulp.
The source git repository for the project, whether it's on github or the
admin's machine, is separate from Pulp's copy. The second you add a commit
to one and not the other (by doing rollback w/ linear git history from the
client's perspective), the histories will diverge. It's unavoidable;
that's just how git works. You can keep the content of the files in the
repo identical, but the history will never be equivalent again.
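The divergence can be shown with the same kind of toy model. This is again a hypothetical sketch, with made-up commit IDs and file names rather than real git objects: Pulp and the admin's upstream repository each restore version 1's contents with their own new commit on top of version 2. In practice those two commits differ in author, timestamp, or message, so git gives them different IDs, modeled here as different `sha` strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit (hypothetical): a snapshot of file names plus a parent pointer."""
    sha: str
    files: FrozenSet[str]
    parent: Optional["Commit"] = None


def history(head: Optional[Commit]) -> List[str]:
    """Return the commit IDs from `head` back to the root commit."""
    shas: List[str] = []
    while head is not None:
        shas.append(head.sha)
        head = head.parent
    return shas


def diverged(a: Commit, b: Commit) -> bool:
    """Two heads have diverged if neither appears in the other's history."""
    return a.sha not in history(b) and b.sha not in history(a)


v1 = Commit("v1", frozenset({"crate-a"}))
v2 = Commit("v2", frozenset({"crate-a", "crate-b"}), parent=v1)

# Pulp rolls back by committing v1's contents on top of v2...
pulp_head = Commit("pulp-revert", v1.files, parent=v2)
# ...while the admin's upstream repository reverts independently.
upstream_head = Commit("upstream-revert", v1.files, parent=v2)

print(pulp_head.files == upstream_head.files)  # True: identical file contents
print(diverged(pulp_head, upstream_head))      # True: different histories
```

The file contents match, but neither head appears in the other's history, which is exactly the "identical content, non-equivalent history" situation.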
Basically, you cannot have all of the following at once:
* Pulp not being the "master" git repository, e.g. the admin is syncing /
uploading it from somewhere else
* linear git history
* rollbacks in Pulp
* identical git history between Pulp and the git repository being
synced / uploaded into Pulp
One of them has to give.
On Mon, May 28, 2018 at 8:01 AM, Milan Kovacik <mkovacik at redhat.com> wrote:
> On Sat, May 26, 2018 at 2:23 AM, Daniel Alley <dalley at redhat.com> wrote:
> > @Brian
> >
> > I agree with a lot of those points, but I would say that we're not just
> > competing against hodgepodge collections of "scripts", but also against
> > writing small microservice-y Flask apps that only implement the API for
> > one content type.
> >
> > Also, rollback is not something Pulp would necessarily be able to offer
> > with respect to history-sensitive content and metadata, like git
> > repositories, or the Cargo example I provided. It's still something the
> > plugin writer would have to implement themselves in this case.
> >
> > @Jeff
> >
> >> perhaps a new component of a Publication like PublishedDirectory that
> >> references an OSTree/Git repository created in /var/lib/pulp/published.
> >
> >
> > I like the idea generally, but I don't think it would be able to be a
> > component of a Publication. I think it would need to be an alternative
> > to a Publication which fulfills a similar function.
> >
> > The fundamental problem is this scenario:
> >
> > 1. You upload a git repository with a git repository plugin
> > 2. You publish and distribute version 1 of the git repository
> > 3. You publish and distribute version 2 of the git repository
> > 4. A client downloads the git repository
> > 5. You notice a problem and decide to roll back to version 1. A
> >    publication of version 1 already exists, which you distribute.
> > 6. Clients have a broken git history. New clients can download the old
> >    version, but anyone who has already downloaded version 2 will not be
> >    able to roll back to version 1 by pulling from Pulp
>
> Just trying to understand the situation:
> Is that because the rollback actually creates version #3 that's
> "newer" but lacks the rolled-back commits?
> So there is some "merge" conflict if folks who cloned #2 want to
> pull from version #3 but their branch contains a commit the origin
> lacks now?
> Or rather that the published bits of version #2 don't exist
> anymore at all?
>
> >
> > We need to prevent step 5 from happening.
> >
> > There are a couple of possible solutions to this problem:
> >
> > 1. As a Pulp admin, you ignore Pulp's rollback functionality. Instead of
> >    using Pulp to roll back, you manually revert the commits using git, and
> >    upload a new version of the repository to Pulp as "version 3". You then
> >    distribute version 3 instead of version 1. You understand that if you
> >    were to publish an old version using Pulp, it would misbehave for
> >    clients that tried to pull / update instead of cloning.
>
> In my opinion folks needing Pulp to track a git(-like) repo are
> probably interested in more workflows than just the clone.
>
> >
> > 2. As a Pulp admin / plugin writer / user, you know that the client for
> >    the content type will never try to pull or update, only clone.
> >    Therefore it is not a problem for you and can be ignored.
>
> The cloning might be equivalent to just snapshotting the tree at a
> particular commit and just publishing a plain tar.gz w/o the git
> structures.
> Limiting but clean?
>
> >
> > 3. As a Plugin writer, whenever you publish a new version of the git
> >    repository, you delete or invalidate every publication for previous
> >    versions for the distribution base path. If a Pulp admin wants to roll
> >    back, they need to create a new Publication. The Plugin knows to apply
> >    revert commits on top of the repository to keep history linear.
> >
> >    But really we've just pushed the problem forwards. What happens when
> >    you want to upload future versions? Now the history of the git
> >    repository in Pulp is different from the Pulp admin's git repo history.
> >    This is only acceptable for content types where the history is
> >    immaterial to the content itself. Probably viable for Cargo, but
> >    probably not for a Git content type.
> >
>
> Does it mean a publication directory git tree is built anew every time
> a rollback happens?
> So Pulp history and the original project history are meant to be different?
> Can there ever be conflicts?
>
>
> > 4. As a Plugin writer, you ignore publications entirely. You don't make
> >    it possible to do the wrong thing. You have something along the lines
> >    of a "PluginManagedDirectory" which core does not try to mess with. If
> >    you want to implement rollback functionality, you do it through your
> >    own API where the side effects are more easily controlled and reasoned
> >    about.
>
> +1 seems like the cleanest way to me
>
> >
> > I have doubts about whether Option 3 is viable - it seems like making it
> > work reliably would be difficult.
>
> I'd say options #1 and #3 are the same, with #3 adding the complexity of
> automating the rollback in Pulp. Options #2 and #4 are the same in the
> sense of Pulp staying away from the incompatible workflow a content type
> has while providing a limited functionality subset to the consumer. In
> addition, #4 allows the Pulp service/host to provide both the
> Pulp-specific, limited functionality and the incompatible,
> content-type-specific workflows from a "single" point. This might be a
> benefit to some folks.
>
>
> Option #5: somehow make core Pulp (content versioning) compatible with
> the Git model ;)
>
> --
> milan
>
> >
> > On Fri, May 25, 2018 at 5:05 PM, Brian Bouterse <bbouters at redhat.com> wrote:
> >>
> >> I think Pulp does have enough value proposition over a script-based
> >> alternative to make it worthwhile for all of those types of plugins.
> >> Here are a few points I think about:
> >>
> >> * Scalability. A common story users tell is that scripts work well up
> >> until a point. When scripts are used for an entire organization, when
> >> content comes from many places, or when more than a few people maintain
> >> the content, they become unmaintainable.
> >>
> >> * Stacks of content. Often a group of content goes together, but each
> >> piece of content is updated separately. For instance, with Ansible roles
> >> you may use many of them together to deploy something, but each role
> >> may receive changes separately. I think of all this content together as
> >> a "stack". Keeping everything up to date can be challenging, and
> >> managing that change with scripts can be hard and fragile. Also, the
> >> ability to roll back quickly and confidently is something Pulp can offer.
> >>
> >> * Organizing content is easier. Having an API that you can use to
> >> organize content is easier than doing lots and lots of git yourself or
> >> with scripts.
> >>
> >> * Tasking. Long-running tasks (and a lot of them) can be unwieldy, and
> >> Pulp keeps them well organized and running reliably.
> >>
> >> * Static and vulnerability analysis. We're seeing interest in using
> >> analysis projects like Clair (https://github.com/arminc/clair-scanner)
> >> to scan content in Pulp. Bringing all the content into one place, and
> >> having that place provide a tasking system, lets plugin writers control
> >> how their content is analyzed continuously.
> >>
> >> Also +1 to jortel's idea. I think that's a great idea and exactly what
> >> we need.
> >>
> >>
> >> On Thu, May 24, 2018 at 1:33 PM, Jeff Ortel <jortel at redhat.com> wrote:
> >>>
> >>>
> >>>
> >>> On 05/17/2018 07:46 AM, Daniel Alley wrote:
> >>>
> >>> Some content types are not going to be compatible with the normal
> >>> sync/publish/distribute Pulp workflows, and will need to be live
> >>> API-only. To what degree should Pulp accommodate these use cases?
> >>>
> >>> Example:
> >>>
> >>> Pulp makes the assumptions that
> >>>
> >>> A) the metadata for a repository can be generated in its entirety from
> >>> the known set of content in a RepositoryVersion, and
> >>> known set of content in a RepositoryVersion, and
> >>>
> >>> B) the client wouldn't care if you point it at an older version of the
> >>> same repository.
> >>>
> >>> Cargo, the package manager for the Rust programming language, expects
> >>> the registry URL to be a git repository. When a user does a "cargo
> >>> update", cargo essentially does a "git pull" to update a local copy of
> >>> the registry.
> >>>
> >>> Both of those assumptions are false in this case. You cannot generate
> >>> the git history just from the set of content, and you cannot "roll back"
> >>> the state of the repository without either breaking it for clients or
> >>> adding new commits on top.
> >>>
> >>> A theoretical Pulp plugin that worked with Cargo would need to ignore
> >>> almost all of the existing Pulp primitives, and very little (if any) of
> >>> the normal Pulp workflow could be used.
> >>>
> >>> Should Pulp attempt to cater to plugins like these? What could Pulp do
> >>> to provide a benefit for such plugins over writing something from
> >>> scratch? To what extent would such plugins be able to integrate with
> >>> the rest of Pulp, if at all?
> >>>
> >>>
> >>> I think OSTree and Ansible plugins will be in the same boat as Cargo.
> >>> In the case of OSTree, libostree does the heavy lifting for sync and
> >>> publishing, and I suspect the same is true for Git-based repositories.
> >>> We should consider ways to best support distributing (serving) content
> >>> in core for these content types. I suspect this will mainly entail
> >>> something in the content app and perhaps a new component of a
> >>> Publication like PublishedDirectory that references an OSTree/Git
> >>> repository created in /var/lib/pulp/published. This may benefit Maven
> >>> as well.
> >>>
> >>>
> >>>
> >>> We don't have to commit to anything pre-GA, but it is a good thing to
> >>> keep in mind. I'm sure there are other content types out there (not
> >>> just Cargo) which would face similar problems. pulp_git was inquired
> >>> about a few months ago; it seems like it would share a few of them.
> >>>
> >>>
> >>> _______________________________________________
> >>> Pulp-dev mailing list
> >>> Pulp-dev at redhat.com
> >>> https://www.redhat.com/mailman/listinfo/pulp-dev
>