[Pulp-dev] Pulp 3 plugin for Chef cookbooks
David Davis
daviddavis at redhat.com
Thu May 17 23:22:16 UTC 2018
On Thu, May 17, 2018 at 3:01 PM, Simon Baatz <gmbnomis at gmail.com> wrote:
> On Wed, May 16, 2018 at 04:14:33PM -0400, David Davis wrote:
> > This is great. I had a chance to look at your plugin and am really
> > excited about having Chef support in Pulp.
> > Some responses below.
> > On Tue, May 15, 2018 at 2:31 PM, Simon Baatz <gmbnomis at gmail.com>
> > wrote:
> >
> > I created the beginnings of a Pulp 3 plugin to manage Chef cookbooks
> > [1]. Currently, it supports creating repos, creating cookbook content
> > units, and publishing repos. A published & distributed repo will
> > offer a "universe" API endpoint for tools like Berkshelf.
> > I did not implement sync yet. I am waiting for "PendingVersion" to be
> > available first.
> > I ran into a couple of problems/uncertainties described below (sorry
> > for the lengthy mail). I am new to Django, DRF, and, obviously,
> > Pulp 3, so any remarks or suggestions are welcome:
> > - Create Content: The plugin reads as much metadata as possible from
> > the actual cookbook Artifact when creating a content unit. The
> > motivation for this is:
> > - One doesn't need a special tool to upload content, which makes
> > uploading by e.g. a CI job easier.
> > - It ensures consistency between the metadata stored in Pulp and the
> > actual metadata in the cookbook.
> > However, this requires extracting a metadata file from a gzipped tar
> > archive. Content unit creation is synchronous, and doing this work in
> > a synchronous call might not be optimal (we already had a discussion
> > on this topic on the pulp-dev mailing list).
> >
> > I agree. Can you make this an async call in the plugin or is there a
> > need to also make this an async call in core/other plugins?
>
> Do you mean making the POST to the content/cookbook/ endpoint
> asynchronous? I have no idea how easy or hard this is, but wouldn't
> that look inconsistent (one plugin returns a direct 201 response,
> another a 202)?
>
It’s quite easy. You’ve got code in your plugin that handles async
publishes. I hear you about the inconsistency, and I don’t think it’s a
problem, but maybe we should make the other plugins return 202s too?
Interested to hear others' thoughts on this.
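
For context, the synchronous work in question amounts to something like
the sketch below (my sketch, assuming the metadata lives in a
metadata.json near the top of the gzipped tarball; the function name is
illustrative, not the plugin's actual code). It's cheap per call, but it
still blocks the request, which is why a 202-style task would fit:

    import json
    import tarfile

    def extract_cookbook_metadata(path):
        # Cookbook archives are gzipped tarballs; assume the metadata
        # sits at "<cookbook_name>/metadata.json" inside the archive.
        with tarfile.open(path, mode="r:gz") as archive:
            for member in archive.getmembers():
                if member.name.endswith("metadata.json"):
                    return json.load(archive.extractfile(member))
        raise ValueError("no metadata.json found in %s" % path)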
>
> > - Publication/Distribution: The metadata file ("universe") for a
> > published cookbook repository contains absolute URLs for download
> > (i.e. these point to published artifacts in a distribution).
> > The current publication/distribution concept seems to have the
> > underlying assumption that a Publication is fully relocatable:
> > PublishedMetadata artifacts are created by the publishing task, and
> > creating a Distribution is a synchronous call that determines the
> > base path of the published artifacts.
> > This causes a problem with said "universe" file. Ideally, it could be
> > pre-computed (it lists all artifacts in the repo). However, this
> > can't be done AFAIK, since the base path is unknown at publication
> > time and one can't generate additional metadata artifacts for a
> > specific distribution later.
> > The best solution I came up with was to implement a dynamic API. To
> > reduce the amount of work to be done, the API does a simple string
> > replacement: during publication, the universe file is pre-computed
> > using placeholders. In the dynamic API, these placeholders are
> > replaced with the actual base URL of the distribution.
> > However, I would prefer not to be forced to implement a dynamic API
> > for static information. Is there a way to solve this differently?
> >
> > Are relative paths an option? If not, I don't think there's currently
> > an alternative to a live API. I probably wouldn't even use a
> > published metadata file, TBH. I wonder, though, if there's maybe
> > functionality we could add to do this.
>
> Unfortunately, relative paths are not an option. Berkshelf just takes
> the URI as is and requests the content.
>
> Regarding the published metadata file: I am not sure what you are
> suggesting. Should the dynamic API be truly dynamic? If someone is
> going to mirror the entire Chef Supermarket, this will result in a
> "universe" that contains around 22,000 cookbooks. Every time Berkshelf
> is called with "install" or "update", this resource will be requested.
> I don't think it makes sense to generate this data dynamically (it
> basically corresponds to the "primary.xml" file in the rpm repo case).
>
> Or are you suggesting storing the pre-computed result differently? (I
> don't like the metadata file solution either. OTOH, this object has
> exactly the required life cycle, and the publication logic takes care
> of it. And string.replace() should be pretty fast.)
>
> For this particular plugin, it would be nice to be able to associate
> metadata files with the distribution, not the publication. But that's
> probably a little bit too much to hope for...
>
I was suggesting a completely live API, but I see your point about having
a lot of cookbooks and calling this endpoint repeatedly. I think the
solution you have works. I can’t think of a way to better support your
use case ATM.
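
To make the trade-off concrete, the pre-compute-then-substitute approach
you describe boils down to something like this (just a sketch; the
placeholder token and function name are illustrative, not your plugin's
actual identifiers):

    # Chosen at publication time; must never occur in real cookbook data.
    BASE_URL_PLACEHOLDER = "@@BASE_URL@@"

    def render_universe(precomputed_universe, distribution_base_url):
        # One linear string replacement per request instead of
        # regenerating metadata for ~22,000 cookbooks on every
        # Berkshelf "install"/"update".
        return precomputed_universe.replace(BASE_URL_PLACEHOLDER,
                                            distribution_base_url)

A single linear pass over the pre-computed file per request seems hard to
beat without being able to attach metadata to the distribution itself.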
>
> > - Content-Type Header: The "universe" file is JSON and must be
> > served with a corresponding "Content-Type" HTTP header.
> > However, the content type returned by the development server seems
> > to be "text/html" by default for all artifacts. Apparently, I can't
> > set the content type of a (metadata) artifact(?)
> >
> > I think this goes back to not using a published metadata file to
> > serve up your API. However, I could see how it might be useful.
>
> Sure, in my case it is no problem, since I set the content type in the
> dynamic API. The question is more generic, as content types should be
> correct for both artifacts and metadata files in general.
>
> In Pulp 2, it is determined based on the MIME type associated with the
> file path (in the ContentView). How is this going to work in Pulp 3?
>
Maybe we can read the MIME type of the artifact before we serve it?
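
The standard library already covers the path-based lookup Pulp 2 does; a
minimal sketch (the function name and the octet-stream fallback are my
choices, not existing Pulp 3 code):

    import mimetypes

    def guess_content_type(relative_path):
        # Map the published file path to a MIME type; fall back to a
        # generic binary type when the extension is unknown.
        content_type, _encoding = mimetypes.guess_type(relative_path)
        return content_type or "application/octet-stream"

    # An extensionless path like "universe" would need special-casing,
    # but guess_content_type("universe.json") -> "application/json"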
>
> > - Getting the base URL of a distribution in the dynamic API is
> > surprisingly complicated and depends on the inner structure of pulp
> > core (I took the implementation from 'pulp_ansible'). IMHO, a
> > well-defined way to obtain it should be part of the plugin API.
> >
> > I agree. Opened: https://pulp.plan.io/issues/3677
> >
> > - "Content" class: The way to use only a single artifact in Content
> > (like done
> > in pulp_file) seems to require in-depth knowledge of the
> > Content/ContentSerializer class and its inner workings.
> > The downside of this can already be experienced in the "pulp_file"
> > plugin: The
> > fields "id" and "created" are missing, since the implementation
> > there just
> > overrides the 'fields' in the serializer).
> > I think two Content types should be part of the plugin API: one
> > with
> > multiple artifacts, and a simpler one with a single artifact
> >
> > I began working on this:
> > [3]https://github.com/pulp/pulp/pull/3476
> > But there was opposition around having the Content model be
> responsible
> > for generating the relative path on the Content Artifact. I’ve opened
> > an issue to see if there’s another way to do this (e.g. using
> > serializers):
> > [4]https://pulp.plan.io/issues/3678
>
> Makes sense. I will follow this.
>
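For anyone following along, the serializer pitfall here is the usual
inner-class behavior; stripped of the DRF specifics, it looks like this
(hypothetical class names, plain Python for illustration):

    class ContentSerializer:
        class Meta:
            fields = ("id", "created")

    class FileContentSerializer(ContentSerializer):
        class Meta:
            # A subclass Meta replaces the parent's Meta wholesale, so
            # the inherited "id" and "created" silently disappear.
            fields = ("relative_path",)

    class FixedFileContentSerializer(ContentSerializer):
        class Meta:
            # Safer idiom: extend the parent's tuple explicitly.
            fields = ContentSerializer.Meta.fields + ("relative_path",)

    print(FileContentSerializer.Meta.fields)       # ('relative_path',)
    print(FixedFileContentSerializer.Meta.fields)  # ('id', 'created', 'relative_path')
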
> > - Uploading an Artifact that already exists returns an error, which
> > is annoying if you use http/curl to import artifacts. Suppose some
> > other user uploaded an artifact in the past. You won't get useful
> > information from the POST request uploading the same artifact:
> >
> >     HTTP/1.1 400 Bad Request
> >     Allow: GET, POST, HEAD, OPTIONS
> >     Content-Type: application/json
> >     Date: Sat, 12 May 2018 17:50:54 GMT
> >     Server: WSGIServer/0.2 CPython/3.6.2
> >     Vary: Accept
> >
> >     {
> >         "non_field_errors": [
> >             "sha512 checksum must be unique."
> >         ]
> >     }
> >
> > This forced me to do something like:
> >
> >     ...
> >     sha256=$(sha256sum "$targz" | awk '{print $1}')
> >     ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     if [[ $ARTIFACT_HREF == "null" ]]; then
> >         echo uploading artifact $cookbook_name sha256: $sha256
> >         http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz
> >         ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     ...
> >
> > Perhaps a "303 See Other" to the existing artifact would help here.
> >
> > Why not just do something like:
> >
> >     http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz || true
> >     ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     if [[ $ARTIFACT_HREF == "null" ]]; then exit 1; fi
> >
> > The error message could be more helpful though. It should probably
> > contain the existing artifact's href. I looked at 303 and am a bit
> > ambivalent toward using it.
>
> Sure, 303 is not really clear-cut. Including the href of the existing
> artifact should already help.
>
>
Opened https://pulp.plan.io/issues/3681
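
In the meantime, the lookup-after-POST workaround is easy to wrap on the
client side; here is a rough Python equivalent of the shell above (a
sketch using requests, with the endpoint and field names taken from the
examples in this thread):

    import hashlib

    import requests

    ARTIFACTS_URL = "http://localhost:8000/pulp/api/v3/artifacts/"

    def ensure_artifact(path):
        # Try the upload, tolerating the 400 that means "already there",
        # then look the artifact up by checksum to get its href either way.
        with open(path, "rb") as f:
            sha256 = hashlib.sha256(f.read()).hexdigest()
        with open(path, "rb") as f:
            response = requests.post(ARTIFACTS_URL, files={"file": f})
        if response.status_code not in (201, 400):
            response.raise_for_status()
        results = requests.get(ARTIFACTS_URL, params={"sha256": sha256}).json()["results"]
        if not results:
            raise RuntimeError("artifact %s neither uploaded nor found" % path)
        return results[0]["_href"]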