[Pulp-dev] Pulp 3 plugin for Chef cookbooks
David Davis
daviddavis at redhat.com
Thu May 17 23:22:16 UTC 2018
On Thu, May 17, 2018 at 3:01 PM, Simon Baatz <gmbnomis at gmail.com> wrote:
> On Wed, May 16, 2018 at 04:14:33PM -0400, David Davis wrote:
> > This is great. I had a chance to look at your plugin and am really
> > excited about having Chef support in Pulp.
> > Some responses below.
> > On Tue, May 15, 2018 at 2:31 PM, Simon Baatz <gmbnomis at gmail.com>
> > wrote:
> >
> > I created the beginnings of a Pulp 3 plugin to manage Chef cookbooks
> > [1]. Currently, it supports creating repos, creating cookbook content
> > units, and publishing repos. A published & distributed repo will
> > offer a "universe" API endpoint for tools like Berkshelf.
> > I did not implement sync yet. I am waiting for "PendingVersion" to be
> > available first.
> > I ran into a couple of problems/uncertainties described below (sorry
> > for the lengthy mail). I am new to Django, DRF, and, obviously,
> > Pulp 3, so any remarks or suggestions are welcome:
> > - Create Content: The plugin reads as much metadata as possible from
> > the actual cookbook Artifact when creating a content unit. The
> > motivation for this is:
> > - One doesn't need a special tool to upload content, which makes
> > uploading by e.g. a CI job easier.
> > - It ensures consistency between the metadata stored in Pulp and the
> > actual metadata in the cookbook.
> > However, this requires extracting a metadata file from a gzipped tar
> > archive. Content unit creation is synchronous, and doing this work in
> > a synchronous call might not be optimal (we already had a discussion
> > on this topic on the pulp-dev mailing list).
> >
> > I agree. Can you make this an async call in the plugin or is there a
> > need to also make this an async call in core/other plugins?
>
> Do you mean making the POST to the content/cookbook/ endpoint
> asynchronous? I have no idea how easy or hard this is, but wouldn't
> that look inconsistent (one plugin returns a direct 201 response,
> another a 202)?
>
It’s quite easy. You’ve got code in your plugin that handles async
publishes. I hear you about the inconsistency, and I don’t think it’s a
problem, but maybe we should make the other plugins return 202s too?
Interested to hear others' thoughts on this.
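
For context, the synchronous work in question amounts to something like
the sketch below (my sketch, assuming the metadata lives in a
metadata.json near the top of the gzipped tarball; the function name is
illustrative, not the plugin's actual code). It's cheap per call, but it
still blocks the request, which is why a 202-style task would fit:

    import json
    import tarfile

    def extract_cookbook_metadata(path):
        # Cookbook archives are gzipped tarballs; assume the metadata
        # sits at "<cookbook_name>/metadata.json" inside the archive.
        with tarfile.open(path, mode="r:gz") as archive:
            for member in archive.getmembers():
                if member.name.endswith("metadata.json"):
                    return json.load(archive.extractfile(member))
        raise ValueError("no metadata.json found in %s" % path)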
>
> > - Publication/Distribution: The metadata file ("universe") for a
> > published cookbook repository contains absolute URLs for download
> > (i.e. these point to published artifacts in a distribution).
> > The current publication/distribution concept seems to have the
> > underlying assumption that a Publication is fully relocatable:
> > PublishedMetadata artifacts are created by the publishing task, and
> > creating a Distribution is a synchronous call that determines the
> > base path of the published artifacts.
> > This causes a problem with said "universe" file. Ideally, it could be
> > pre-computed (it lists all artifacts in the repo). However, this
> > can't be done AFAIK, since the base path is unknown at publication
> > time and one can't generate additional metadata artifacts for a
> > specific distribution later.
> > The best solution I came up with was to implement a dynamic API. To
> > reduce the amount of work to be done, the API does a simple string
> > replacement: during publication, the universe file is pre-computed
> > using placeholders. In the dynamic API, these placeholders are
> > replaced with the actual base URL of the distribution.
> > However, I would prefer not to be forced to implement a dynamic API
> > for static information. Is there a way to solve this differently?
> >
> > Are relative paths an option? If not, I don't think there's currently
> > an alternative to a live API. I probably wouldn't even use a
> > published metadata file, TBH. I wonder, though, if there's maybe
> > functionality we could add to do this.
>
> Unfortunately, relative paths are not an option. Berkshelf just takes
> the URI as is and requests the content.
>
> Regarding the published metadata file: I am not sure what you are
> suggesting. Should the dynamic API be truly dynamic? If someone is
> going to mirror the entire Chef Supermarket, this will result in a
> "universe" that contains around 22,000 cookbooks. Every time Berkshelf
> is called with "install" or "update", this resource will be requested.
> I don't think it makes sense to generate this data dynamically (it
> basically corresponds to the "primary.xml" file in the rpm repo case).
>
> Or are you suggesting storing the pre-computed result differently? (I
> don't like the metadata file solution either. OTOH, this object has
> exactly the required life cycle, and the publication logic takes care
> of it. And string.replace() should be pretty fast.)
>
> For this particular plugin, it would be nice to be able to associate
> metadata files with the distribution, not the publication. But that's
> probably a little bit too much to hope for...
>
I was suggesting a completely live API, but I see your point about having
a lot of cookbooks and calling this endpoint repeatedly. I think the
solution you have works. I can’t think of a way to better support your
use case ATM.
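
To make the trade-off concrete, the pre-compute-then-substitute approach
you describe boils down to something like this (just a sketch; the
placeholder token and function name are illustrative, not your plugin's
actual identifiers):

    # Chosen at publication time; must never occur in real cookbook data.
    BASE_URL_PLACEHOLDER = "@@BASE_URL@@"

    def render_universe(precomputed_universe, distribution_base_url):
        # One linear string replacement per request instead of
        # regenerating metadata for ~22,000 cookbooks on every
        # Berkshelf "install"/"update".
        return precomputed_universe.replace(BASE_URL_PLACEHOLDER,
                                            distribution_base_url)

A single linear pass over the pre-computed file per request seems hard to
beat without being able to attach metadata to the distribution itself.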
>
> > - Content-Type Header: The "universe" file is JSON and must be
> > served with a corresponding "Content-Type" HTTP header.
> > However, the content type returned by the development server seems
> > to be "text/html" by default for all artifacts. Apparently, I can't
> > set the content type of a (metadata) artifact(?)
> >
> > I think this goes back to not using a published metadata file to
> > serve up your API. However, I could see how it might be useful.
>
> Sure, in my case it is no problem, since I set the content type in the
> dynamic API. The question is more generic, as content types should be
> correct for both artifacts and metadata files in general.
>
> In Pulp 2, it is determined based on the MIME type associated with the
> file path (in the ContentView). How is this going to work in Pulp 3?
>
Maybe we can read the MIME type of the artifact before we serve it?
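
The standard library already covers the path-based lookup Pulp 2 does; a
minimal sketch (the function name and the octet-stream fallback are my
choices, not existing Pulp 3 code):

    import mimetypes

    def guess_content_type(relative_path):
        # Map the published file path to a MIME type; fall back to a
        # generic binary type when the extension is unknown.
        content_type, _encoding = mimetypes.guess_type(relative_path)
        return content_type or "application/octet-stream"

    # An extensionless path like "universe" would need special-casing,
    # but guess_content_type("universe.json") -> "application/json"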
>
> > - Getting the base URL of a distribution in the dynamic API is
> > surprisingly complicated and depends on the inner structure of pulp
> > core (I took the implementation from 'pulp_ansible'). IMHO, a
> > well-defined way to obtain it should be part of the plugin API.
> >
> > I agree. Opened: https://pulp.plan.io/issues/3677
> >
> > - "Content" class: The way to use only a single artifact in Content
> > (like done
> > in pulp_file) seems to require in-depth knowledge of the
> > Content/ContentSerializer class and its inner workings.
> > The downside of this can already be experienced in the "pulp_file"
> > plugin: The
> > fields "id" and "created" are missing, since the implementation
> > there just
> > overrides the 'fields' in the serializer).
> > I think two Content types should be part of the plugin API: one
> > with
> > multiple artifacts, and a simpler one with a single artifact
> >
> > I began working on this:
> > [3]https://github.com/pulp/pulp/pull/3476
> > But there was opposition around having the Content model be
> responsible
> > for generating the relative path on the Content Artifact. I’ve opened
> > an issue to see if there’s another way to do this (e.g. using
> > serializers):
> > [4]https://pulp.plan.io/issues/3678
>
> Makes sense. I will follow this.
>
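For anyone following along, the serializer pitfall here is the usual
inner-class behavior; stripped of the DRF specifics, it looks like this
(hypothetical class names, plain Python for illustration):

    class ContentSerializer:
        class Meta:
            fields = ("id", "created")

    class FileContentSerializer(ContentSerializer):
        class Meta:
            # A subclass Meta replaces the parent's Meta wholesale, so
            # the inherited "id" and "created" silently disappear.
            fields = ("relative_path",)

    class FixedFileContentSerializer(ContentSerializer):
        class Meta:
            # Safer idiom: extend the parent's tuple explicitly.
            fields = ContentSerializer.Meta.fields + ("relative_path",)

    print(FileContentSerializer.Meta.fields)       # ('relative_path',)
    print(FixedFileContentSerializer.Meta.fields)  # ('id', 'created', 'relative_path')
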
> > - Uploading an Artifact that already exists returns an error, which
> > is annoying if you use http/curl to import artifacts. Suppose some
> > other user uploaded an artifact in the past. You won't get useful
> > information from the POST request uploading the same artifact:
> >
> >     HTTP/1.1 400 Bad Request
> >     Allow: GET, POST, HEAD, OPTIONS
> >     Content-Type: application/json
> >     Date: Sat, 12 May 2018 17:50:54 GMT
> >     Server: WSGIServer/0.2 CPython/3.6.2
> >     Vary: Accept
> >
> >     {
> >         "non_field_errors": [
> >             "sha512 checksum must be unique."
> >         ]
> >     }
> >
> > This forced me to do something like:
> >
> >     ...
> >     sha256=$(sha256sum "$targz" | awk '{print $1}')
> >     ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     if [[ $ARTIFACT_HREF == "null" ]]; then
> >         echo uploading artifact $cookbook_name sha256: $sha256
> >         http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz
> >         ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     ...
> >
> > Perhaps a "303 See Other" to the existing artifact would help here.
> >
> > Why not just do something like:
> >
> >     http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz || true
> >     ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     if [[ $ARTIFACT_HREF == "null" ]]; then exit 1; fi
> >
> > The error message could be more helpful though. It should probably
> > contain the existing artifact's href. I looked at 303 and am a bit
> > ambivalent toward using it.
>
> Sure, 303 is not really clear-cut. Including the href of the existing
> artifact should already help.
>
>
Opened https://pulp.plan.io/issues/3681
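
In the meantime, the lookup-after-POST workaround is easy to wrap on the
client side; here is a rough Python equivalent of the shell above (a
sketch using requests, with the endpoint and field names taken from the
examples in this thread):

    import hashlib

    import requests

    ARTIFACTS_URL = "http://localhost:8000/pulp/api/v3/artifacts/"

    def ensure_artifact(path):
        # Try the upload, tolerating the 400 that means "already there",
        # then look the artifact up by checksum to get its href either way.
        with open(path, "rb") as f:
            sha256 = hashlib.sha256(f.read()).hexdigest()
        with open(path, "rb") as f:
            response = requests.post(ARTIFACTS_URL, files={"file": f})
        if response.status_code not in (201, 400):
            response.raise_for_status()
        results = requests.get(ARTIFACTS_URL, params={"sha256": sha256}).json()["results"]
        if not results:
            raise RuntimeError("artifact %s neither uploaded nor found" % path)
        return results[0]["_href"]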