[Pulp-dev] Pulp 3 plugin for Chef cookbooks

Thu May 17 19:01:33 UTC 2018

On Wed, May 16, 2018 at 04:14:33PM -0400, David Davis wrote:
>    This is great. I had a chance to look at your plugin and am really
>    excited by having chef support in Pulp.
>    Some responses below.
>    On Tue, May 15, 2018 at 2:31 PM, Simon Baatz <gmbnomis at gmail.com>
>    wrote:
> 
>      I created the beginnings of a Pulp 3 plugin to manage Chef cookbooks
>      [1]. Currently, it supports creating repos, creating cookbook content
>      units, and publishing repos. A published & distributed repo will offer
>      a "universe" API endpoint for tools like Berkshelf.
>      I did not implement sync yet. I am waiting for "PendingVersion" to
>      be available first.
>      I ran into a couple of problems/uncertainties described below (sorry
>      for the lengthy mail). I am new to Django, DRF, and, obviously, Pulp 3,
>      so any remarks or suggestions are welcome:
>      - Create Content: The plugin reads as much metadata as possible from
>        the actual cookbook Artifact when creating a content unit. The
>        motivation for this is:
>        - One doesn't need a special tool to upload content, which makes
>          uploading by e.g. a CI job easier.
>        - It ensures consistency between metadata stored in Pulp and the
>          actual metadata in the cookbook.
>        However, this requires extracting a metadata file from a gzipped
>        tar archive. Content unit creation is synchronous and doing this
>        work in a synchronous call might not be optimal (we had a discussion
>        on this topic on the pulp-dev mailing list already).
> 
>    I agree. Can you make this an async call in the plugin or is there a
>    need to also make this an async call in core/other plugins?

Do you mean making the POST to the content/cookbook/ endpoint
asynchronous?  I have no idea how easy or hard this is, but wouldn't
that look inconsistent (one plugin returns a direct 201 response,
another a 202)?
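
To make the cost of the synchronous step concrete, the extraction boils
down to something like the following. This is only a minimal sketch and
not the plugin's actual code: the function name is made up, and it
assumes the archive holds a single top-level directory containing a
"metadata.json" file.

```python
import json
import tarfile


def read_cookbook_metadata(fileobj):
    """Return the parsed metadata.json from a gzipped cookbook tarball.

    Assumption: the archive has one top-level directory that contains
    metadata.json, e.g. "mycookbook/metadata.json".
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            parts = member.name.split("/")
            # Only accept <cookbook_dir>/metadata.json, not nested copies.
            if member.isfile() and len(parts) == 2 and parts[1] == "metadata.json":
                with archive.extractfile(member) as handle:
                    return json.load(handle)
    raise ValueError("no metadata.json found in cookbook archive")
```

The whole archive may have to be decompressed before the metadata file
is found, which is why doing this inside the request is a concern.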

>      - Publication/Distribution: The metadata file ("universe") for a
>        published cookbook repository contains absolute URLs for download
>        (i.e. these point to published artifacts in a distribution).
>        The current publication/distribution concept seems to have the
>        underlying assumption that a Publication is fully relocatable:
>        PublishedMetadata artifacts are created by the publishing task and
>        creating a Distribution is a synchronous call that determines the
>        base path of the published artifacts.
>        This causes a problem with said "universe" file. Ideally, it could
>        be pre-computed (it lists all artifacts in the repo). However, this
>        can't be done AFAIK since the base path is unknown at publication
>        time and one can't generate additional metadata artifacts for a
>        specific distribution later.
>        The best solution I came up with was to implement a dynamic API.
>        To reduce the amount of work to be done, the API does a simple
>        string replacement: During publication, the universe file is
>        pre-computed using placeholders. In the dynamic API these
>        placeholders are replaced with the actual base URL of the
>        distribution.
>        However, I would prefer not to be forced to implement a dynamic
>        API for static information. Is there a way to solve this differently?
> 
>    Are relative paths an option? If not, I don't think there's currently
>    an alternative to a live api. I probably wouldn't even use a published
>    metadata file TBH. I wonder though if there's maybe functionality we
>    could add to do this.

Unfortunately, relative paths are not an option. Berkshelf just takes
the URI as is and requests the content.

Regarding the published metadata file: I am not sure what you are
suggesting.  Should the dynamic API be really dynamic?  If someone is
going to mirror the entire Chef Supermarket, this will result in a
"universe" that contains around 22,000 cookbooks.  Every time
Berkshelf is called with "install" or "update" this resource will be
requested.  I don't think that it makes sense to generate this data
dynamically (it basically corresponds to the "primary.xml" file in
the rpm repo case).

Or are you suggesting that the pre-computed result be stored differently?
(I don't like the metadata file solution either. OTOH, this object
has exactly the required life cycle and the publication logic
takes care of it. And string.replace() should be pretty fast.)
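
To illustrate the placeholder approach, here is a minimal sketch. It is
not the plugin's code: the placeholder string and function names are
made up, and the universe field names are written from memory of the
format Berkshelf reads, so treat them as illustrative.

```python
import json

# Hypothetical marker; any string that cannot occur in a real URL would do.
BASE_URL_PLACEHOLDER = "{{ base_url }}"


def precompute_universe(cookbooks):
    """Publish time: serialize the universe once, with placeholder URLs.

    `cookbooks` is an iterable of
    (name, version, relative_path, dependencies) tuples.
    """
    universe = {}
    for name, version, relative_path, dependencies in cookbooks:
        url = BASE_URL_PLACEHOLDER + "/" + relative_path
        universe.setdefault(name, {})[version] = {
            "location_type": "uri",
            "location_path": url,
            "download_url": url,
            "dependencies": dependencies,
        }
    return json.dumps(universe)


def render_universe(precomputed, base_url):
    """Request time: one string replacement, no per-cookbook work."""
    return precomputed.replace(BASE_URL_PLACEHOLDER, base_url.rstrip("/"))
```

The per-request cost is then a single pass over the stored string,
independent of how the universe was assembled.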

For this particular plugin, it would be nice to be able to associate
metadata files with the distribution, not the publication. But that's
probably a little bit too much to hope for...

>      - Content-Type Header: The "universe" file is JSON and must have a
>        corresponding "Content-Type" HTTP header.
>        However, the content type of the development server seems to be
>        "text/html" by default for all artifacts. Apparently, I can't set
>        the content type of a (metadata) artifact(?)
> 
>    I think this goes back to not using a published metadata file to serve
>    up your api. However, I could see how it might be useful.

Sure, in my case it is no problem, since I set the content type
in the dynamic API. The question is more generic, as content types
should be correct for both artifacts and metadata files in general.

In Pulp 2 it is determined based on the mime type associated with the
file path (in the ContentView).  How is this going to work in Pulp 3?
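
A path-based lookup of that kind can be sketched with Python's standard
"mimetypes" module. This is only an illustration of the idea, not a
description of what Pulp 2 or Pulp 3 actually does; the function name
and the fallback type are my assumptions.

```python
import mimetypes


def content_type_for(path, default="application/octet-stream"):
    """Guess a Content-Type from the file name, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default
```

Note that a file named just "universe" has no extension to go by, so a
purely path-based scheme would still need a way to set the type
explicitly for such metadata files.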

>      - Getting the base URL of a distribution in the dynamic API is
>        surprisingly complicated and depends on the inner structure of pulp
>        core (I took the implementation from 'pulp_ansible'). IMHO, a
>        well-defined way to obtain it should be part of the plugin API.
> 
>    I agree. Opened: https://pulp.plan.io/issues/3677
> 
>      - "Content" class: The way to use only a single artifact in Content
>        (like done in pulp_file) seems to require in-depth knowledge of the
>        Content/ContentSerializer class and its inner workings.
>        The downside of this can already be experienced in the "pulp_file"
>        plugin: The fields "id" and "created" are missing, since the
>        implementation there just overrides the 'fields' in the serializer.
>        I think two Content types should be part of the plugin API: one
>        with multiple artifacts, and a simpler one with a single artifact.
> 
>    I began working on this:
>    https://github.com/pulp/pulp/pull/3476
>    But there was opposition around having the Content model be responsible
>    for generating the relative path on the Content Artifact. I've opened
>    an issue to see if there's another way to do this (e.g. using
>    serializers):
>    https://pulp.plan.io/issues/3678

Makes sense. I will follow this.

>      - Uploading an Artifact that already exists returns an error, which
>        is annoying if you use http/curl to import artifacts. Suppose some
>        other user uploaded an artifact in the past. You won't get useful
>        information from the POST request uploading the same artifact:
>        HTTP/1.1 400 Bad Request
>        Allow: GET, POST, HEAD, OPTIONS
>        Content-Type: application/json
>        Date: Sat, 12 May 2018 17:50:54 GMT
>        Server: WSGIServer/0.2 CPython/3.6.2
>        Vary: Accept
>        {
>            "non_field_errors": [
>                "sha512 checksum must be unique."
>            ]
>        }
>        This forced me to do something like:
>          ...
>          sha256=$(sha256sum "$targz" | awk '{print $1}')
>          ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
>          if [[ $ARTIFACT_HREF == "null" ]]; then
>              echo uploading artifact $cookbook_name sha256: $sha256
>              http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz
>              ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
>              ...
>        Perhaps a "303 See Other" to the existing artifact would help here.
> 
>    Why not just do something like:
>    http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz || true
>    ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
>    if [[ $ARTIFACT_HREF == "null" ]]; then exit 1; fi
>    The error message could be more helpful though. It should probably
>    contain the existing artifact's href. I looked at 303 and am a bit
>    ambivalent toward using it.

Sure, 303 is not really clear-cut. Including the href of the existing
artifact should already help.
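
For clients that are not shell scripts, the lookup-before-upload logic
from my snippet can be sketched as below. The "lookup" and "upload"
callables are hypothetical stand-ins for the two HTTP requests (GET
with the sha256 filter, POST of the file); nothing here is part of a
Pulp API.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large tarballs are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_or_upload(path, lookup, upload):
    """Return the href of the artifact for `path`, uploading only if needed.

    `lookup(sha256)` returns an href or None; `upload(path)` returns the
    href of the newly created artifact. Both are supplied by the caller.
    """
    href = lookup(sha256_of(path))
    if href is None:
        href = upload(path)
    return href
```

This still has a window between lookup and upload in which someone else
may create the artifact, so the upload can fail with the 400 above. An
href in the error response would let the client recover without a
second query.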