<div dir="ltr">On Thu, May 17, 2018 at 3:01 PM, Simon Baatz <span dir="ltr"><<a href="mailto:gmbnomis@gmail.com" target="_blank">gmbnomis@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span>On Wed, May 16, 2018 at 04:14:33PM -0400, David Davis wrote:<br>
> This is great. I had a chance to look at your plugin and am really<br>
> excited by having chef support in Pulp.<br>
> Some responses below.<br>
</span>> On Tue, May 15, 2018 at 2:31 PM, Simon Baatz <[1]<a href="mailto:gmbnomis@gmail.com" target="_blank">gmbnomis@gmail.com</a>><br>
<div><div class="m_-7462805737116494279gmail-m_6888732290980131716h5">> wrote:<br>
> <br>
> I created the beginnings of a Pulp 3 plugin to manage Chef<br>
> cookbooks<br>
> [1]. Currently, it supports creating repos, creating cookbook<br>
> content units, and publishing repos. A published & distributed repo<br>
> will offer<br>
> a "universe" API endpoint for tools like Berkshelf.<br>
> I did not implement sync yet. I am waiting for "PendingVersion" to<br>
> be available<br>
> first.<br>
> I ran into a couple of problems/uncertainties described below (sorry<br>
> for the<br>
> lengthy mail). I am new to Django, DRF, and, obviously, Pulp 3 so<br>
> any remarks or<br>
> suggestions are welcome:<br>
> - Create Content: The plugin reads as much metadata as possible<br>
> from the actual<br>
> cookbook Artifact when creating a content unit. The motivation for<br>
> this is:<br>
> - One doesn't need a special tool to upload content, which makes<br>
> uploading by e.g.<br>
> a CI job easier.<br>
> - It ensures consistency between metadata stored in Pulp and the<br>
> actual<br>
> metadata in the cookbook.<br>
> However, this requires extracting a metadata file from a gzipped<br>
> tar archive.<br>
> Content unit creation is synchronous and doing this work in a<br>
> synchronous call<br>
> might not be optimal (we had a discussion about this topic on the<br>
> pulp-dev<br>
> mailing list already).<br>
> <br>
> I agree. Can you make this an async call in the plugin or is there a<br>
> need to also make this an async call in core/other plugins?<br>
<br>
</div></div>Do you mean making the POST to the content/cookbook/ endpoint<br>
asynchronous? I have no idea how easy or hard this is, but wouldn't<br>
that look inconsistent (one plugin returns a direct 201 response,<br>
another a 202)?<br></blockquote><div><br></div><div>It’s quite easy. You’ve got code in your plugin that handles async publishes. I hear you about inconsistency and I don’t think it’s a problem but maybe we should make the other plugins return 202s too? Interested to hear others' thoughts on this.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
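</blockquote><div><br></div><div>For illustration, here is a rough sketch of what a client would have to do to cope with mixed response styles. The body shapes ("_href", "task") and the poll callback are assumptions for this example, not Pulp 3's documented payload:</div>

```python
# Hypothetical client helper: a synchronous create returns 201 with the
# resource, an asynchronous one returns 202 with a task href to poll.
# The payload keys and the poll callback are illustrative assumptions.
def created_resource(status_code, body, poll_task=None):
    if status_code == 201:
        # Synchronous create: the new content unit's href is in the body.
        return body["_href"]
    if status_code == 202:
        # Asynchronous create: wait for the task to finish and return the
        # resource it reports as created.
        return poll_task(body["task"])
    raise RuntimeError("unexpected status %d" % status_code)

# Stubbed usage: a 202 response hands off to the poller.
href = created_resource(
    202,
    {"task": "/pulp/api/v3/tasks/123/"},
    poll_task=lambda task_href: "/pulp/api/v3/content/cookbook/1/",
)
```

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">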
<div><div class="m_-7462805737116494279gmail-m_6888732290980131716h5"> <br>
> - Publication/Distribution: The metadata file ("universe") for a<br>
> published<br>
> cookbook repository contains absolute URLs for download (i.e.<br>
> these point<br>
> to published artifacts in a distribution).<br>
> The current publication/distribution concept seems to have the<br>
> underlying<br>
> assumption that a Publication is fully relocatable:<br>
> PublishedMetadata<br>
> artifacts are created by the publishing task and creating a<br>
> Distribution is a<br>
> synchronous call that determines the base path of the published<br>
> artifacts.<br>
> This causes a problem with said "universe" file. Ideally, it could<br>
> be<br>
> pre-computed (it lists all artifacts in the repo). However, this<br>
> can't be<br>
> done AFAIK since the base path is unknown at publication time and<br>
> one can't<br>
> generate additional metadata artifacts for a specific distribution<br>
> later.<br>
> The best solution I came up with was to implement a dynamic API.<br>
> To reduce the<br>
> amount of work to be done, the API does a simple string<br>
> replacement: During<br>
> publication, the universe file is pre-computed using placeholders.<br>
> In the<br>
> dynamic API these placeholders are replaced with the actual base<br>
> URL of the<br>
> distribution.<br>
> However, I would prefer not to be forced to implement a dynamic<br>
> API for static<br>
> information. Is there a way to solve this differently?<br>
> <br>
> Are relative paths an option? If not, I don’t think there’s currently<br>
> an alternative to a live api. I probably wouldn’t even use a published<br>
> metadata file TBH. I wonder though if there’s maybe functionality we<br>
> could add to do this.<br>
<br>
</div></div>Unfortunately, relative paths are not an option. Berkshelf just takes<br>
the URI as is and requests the content.<br>
<br>
Regarding the published metadata file: I am not sure what you are<br>
suggesting. Should the dynamic API be truly dynamic? If someone is<br>
going to mirror the entire Chef Supermarket, this will result in a<br>
"universe" that contains around 22,000 cookbooks. Every time<br>
Berkshelf is called with "install" or "update" this resource will be<br>
requested. I don't think that it makes sense to generate this data<br>
dynamically (it basically corresponds to the "primary.xml" file in<br>
the rpm repo case).<br>
<br>
Or are you suggesting storing the pre-computed result differently?<br>
(I don't like the metadata file solution either. OTOH, this object<br>
has exactly the required life cycle and the publication logic<br>
takes care of it. And string.replace() should be pretty fast.)<br>
<br>
For this particular plugin, it would be nice to be able to associate<br>
metadata files with the distribution, not the publication. But that's<br>
probably a little bit too much to hope for...<br></blockquote><div><br></div><div>I was suggesting a completely live api but I see your point about having a lot of cookbooks and calling this endpoint repeatedly. I think the solution you have works. I can’t think of how to better support your use case ATM.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
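</blockquote><div><br></div><div>To make the placeholder scheme concrete, here is a minimal sketch. The placeholder token and the universe entry shape are illustrative, not necessarily what the plugin actually uses:</div>

```python
import json

# Illustrative placeholder token; the plugin's actual marker may differ.
PLACEHOLDER = "@@DISTRIBUTION_BASE_URL@@"

# Pre-computed at publication time, when the distribution's base URL is
# not known yet. One entry shown; a Supermarket mirror would hold ~22,000.
universe_template = json.dumps({
    "example": {
        "1.0.0": {
            "location_type": "uri",
            "download_url": PLACEHOLDER + "/cookbook_files/example-1.0.0.tar.gz",
            "dependencies": {},
        }
    }
})

def render_universe(template, base_url):
    """Fill in the distribution's actual base URL at request time."""
    return template.replace(PLACEHOLDER, base_url)

rendered = render_universe(universe_template,
                           "http://pulp.example.com/pulp/content/chef")
```

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">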
<span> <br>
> - Content-Type Header: The "universe" file is JSON and must have a<br>
> corresponding<br>
> "Content-Type" HTTP header.<br>
> However, the content type returned by the development server seems to be<br>
> "text/html" by<br>
> default for all artifacts. Apparently, I can't set the<br>
> content-type of a<br>
> (metadata) artifact(?)<br>
> <br>
> I think this goes back to not using a published metadata file to serve<br>
> up your api. However, I could see how it might be useful.<br>
<br>
</span>Sure, in my case it is no problem, since I set the content type<br>
in the dynamic API. The question is more generic, as content types<br>
should be correct for both artifacts and metadata files in general.<br>
<br>
In Pulp 2 it is determined based on the mime type associated with the<br>
file path (in the ContentView). How is this going to work in Pulp 3?<br></blockquote><div><br></div><div>Maybe we can read the mime type of the artifact before we serve it?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
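</blockquote><div><br></div><div>Something along those lines can be done with the standard library by guessing from the file path before serving. This is a sketch of the idea, not Pulp's actual serving code:</div>

```python
import mimetypes

def guess_content_type(path):
    """Guess a Content-Type for a published file from its path, with a
    generic binary fallback for unknown extensions."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"

print(guess_content_type("universe.json"))  # application/json
print(guess_content_type("example-1.0.0.unknownext"))  # application/octet-stream
```

<div>One caveat: for compressed tarballs, mimetypes reports the compression separately as an encoding (e.g. "gzip" alongside "application/x-tar"), so a real implementation would need to fold that back into the response headers.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">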
<span> <br>
> - Getting the base url of a distribution in the dynamic API is<br>
> surprisingly<br>
> complicated and depends on the inner structure of pulp core (I<br>
> took the<br>
> implementation from 'pulp_ansible'). IMHO, a well defined way to<br>
> obtain it<br>
> should be part of the plugin API.<br>
> <br>
</span>> I agree. Opened: [2]<a href="https://pulp.plan.io/issues/3677" rel="noreferrer" target="_blank">https://pulp.plan.io/issues/3677</a><br>
<span>> <br>
> - "Content" class: The way to use only a single artifact in Content<br>
> (as done<br>
> in pulp_file) seems to require in-depth knowledge of the<br>
> Content/ContentSerializer class and its inner workings.<br>
> The downside of this can already be experienced in the "pulp_file"<br>
> plugin: The<br>
> fields "id" and "created" are missing, since the implementation<br>
> there just<br>
> overrides the 'fields' in the serializer.<br>
> I think two Content types should be part of the plugin API: one<br>
> with<br>
> multiple artifacts, and a simpler one with a single artifact<br>
> <br>
> I began working on this:<br>
</span>> [3]<a href="https://github.com/pulp/pulp/pull/3476" rel="noreferrer" target="_blank">https://github.com/pulp/pulp/pull/3476</a><br>
<span>> But there was opposition to having the Content model be responsible<br>
> for generating the relative path on the Content Artifact. I’ve opened<br>
> an issue to see if there’s another way to do this (e.g. using<br>
> serializers):<br>
</span>> [4]<a href="https://pulp.plan.io/issues/3678" rel="noreferrer" target="_blank">https://pulp.plan.io/issues/3678</a><br>
<br>
Makes sense. I will follow this.<br>
<span><br>
> - Uploading an Artifact that already exists returns an error, which<br>
> is<br>
> annoying if you use http/curl to import artifacts. Suppose some<br>
> other user<br>
> uploaded an artifact in the past. You won't get useful<br>
> information from the POST request uploading the same artifact:<br>
> HTTP/1.1 400 Bad Request<br>
> Allow: GET, POST, HEAD, OPTIONS<br>
> Content-Type: application/json<br>
> Date: Sat, 12 May 2018 17:50:54 GMT<br>
> Server: WSGIServer/0.2 CPython/3.6.2<br>
> Vary: Accept<br>
> {<br>
> "non_field_errors": [<br>
> "sha512 checksum must be unique."<br>
> ]<br>
> }<br>
> This forced me to do something like:<br>
> ...<br>
> sha256=$(sha256sum "$targz" | awk '{print $1}')<br>
> ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256<br>
> | jq -r '.results[0]._href')<br>
> if [[ $ARTIFACT_HREF == "null" ]]; then<br>
> echo uploading artifact $cookbook_name sha256: $sha256<br>
</span>> http --form POST [5]<a href="http://localhost:8000/pulp/api" rel="noreferrer" target="_blank">http://localhost:8000/pulp/api</a><br>
> /v3/artifacts/ file@$targz<br>
> ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256<br>
<span>> | jq -r '.results[0]._href')<br>
> ...<br>
> Perhaps a "303 See Other" to the existing artifact would help<br>
> here.<br>
> <br>
> Why not just do something like:<br>
</span>> http --form POST [6]<a href="http://localhost:8000/pulp/api/v3/artifacts/" rel="noreferrer" target="_blank">http://localhost:8000/pulp/api/v3/artifacts/</a> file@$<br>
<span>> targz || true<br>
> ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq<br>
> -r '.results[0]._href')<br>
> if [[ $ARTIFACT_HREF == "null" ]]; then exit 1; fi<br>
> The error message could be more helpful though. It should probably<br>
> contain the existing artifact’s href. I looked at 303 and am a bit<br>
> ambivalent toward using it.<br>
<br>
</span>Sure, 303 is not really clear-cut. Including the href of the existing<br>
artifact should already help.<br>
<br>
</blockquote></div><div class="gmail_extra"><br></div>Opened <a href="https://pulp.plan.io/issues/3681" target="_blank">https://pulp.plan.io/issues/3681</a></div></div>
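<div>For what it's worth, the check-before-upload flow from the shell snippet above translates to a few lines of Python. Here is a sketch of the checksum step; the HTTP lookup and upload calls are omitted, with endpoints assumed to be the same as in the shell example:</div>

```python
import hashlib

def sha256_of(path):
    """Compute the sha256 hex digest a client would use to query
    /pulp/api/v3/artifacts/?sha256=... before attempting an upload."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large cookbook tarballs don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
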