<div dir="ltr">On Thu, May 17, 2018 at 3:01 PM, Simon Baatz <span dir="ltr"><<a href="mailto:gmbnomis@gmail.com" target="_blank">gmbnomis@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span>On Wed, May 16, 2018 at 04:14:33PM -0400, David Davis wrote:<br>
> This is great. I had a chance to look at your plugin and am really<br>
> excited by having chef support in Pulp.<br>
> Some responses below.<br>
</span>> On Tue, May 15, 2018 at 2:31 PM, Simon Baatz <[1]<a href="mailto:gmbnomis@gmail.com" target="_blank">gmbnomis@gmail.com</a>><br>
<div><div class="m_-7462805737116494279gmail-m_6888732290980131716h5">> wrote:<br>
> <br>
> I created the beginnings of a Pulp 3 plugin to manage Chef<br>
> cookbooks<br>
> [1]. Currently, it supports creating repos, creating cookbook<br>
> content units, and publishing repos. A published & distributed repo<br>
> will offer<br>
> a "universe" API endpoint for tools like Berkshelf.<br>
> I did not implement sync yet. I am waiting for "PendingVersion" to<br>
> be available<br>
> first.<br>
> I ran into a couple of problems/uncertainties described below (sorry<br>
> for the<br>
> lengthy mail). I am new to Django, DRF, and, obviously, Pulp 3 so<br>
> any remarks or<br>
> suggestions are welcome:<br>
> - Create Content: The plugin reads as much metadata as possible<br>
> from the actual<br>
> cookbook Artifact when creating a content unit. The motivation for<br>
> this is:<br>
> - One doesn't need a special tool to upload content, which makes<br>
> uploading by e.g.<br>
> a CI job easier.<br>
> - It ensures consistency between metadata stored in Pulp and the<br>
> actual<br>
> metadata in the cookbook.<br>
> However, this requires extracting a metadata file from a gzipped<br>
> tar archive.<br>
> Content unit creation is synchronous and doing this work in a<br>
> synchronous call<br>
> might not be optimal (we had a discussion about this topic on the<br>
> pulp-dev<br>
> mailing list already).<br>
> <br>
> I agree. Can you make this an async call in the plugin or is there a<br>
> need to also make this an async call in core/other plugins?<br>
<br>
</div></div>Do you mean making the POST to the content/cookbook/ endpoint<br>
asynchronous? I have no idea how easy or hard this is, but wouldn't<br>
that look inconsistent (one plugin returns a direct 201 response,<br>
another a 202)?<br></blockquote><div><br></div><div>It’s quite easy. You’ve got code in your plugin that handles async publishes. I hear you about inconsistency and I don’t think it’s a problem but maybe we should make the other plugins return 202s too? Interested to hear others' thoughts on this.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
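</blockquote><div><br></div><div>For illustration, here is a rough sketch of what a client would have to do to cope with mixed response styles. The body shapes ("_href", "task") and the poll callback are assumptions for this example, not Pulp 3's documented payload:</div>

```python
# Hypothetical client helper: a synchronous create returns 201 with the
# resource, an asynchronous one returns 202 with a task href to poll.
# The payload keys and the poll callback are illustrative assumptions.
def created_resource(status_code, body, poll_task=None):
    if status_code == 201:
        # Synchronous create: the new content unit's href is in the body.
        return body["_href"]
    if status_code == 202:
        # Asynchronous create: wait for the task to finish and return the
        # resource it reports as created.
        return poll_task(body["task"])
    raise RuntimeError("unexpected status %d" % status_code)

# Stubbed usage: a 202 response hands off to the poller.
href = created_resource(
    202,
    {"task": "/pulp/api/v3/tasks/123/"},
    poll_task=lambda task_href: "/pulp/api/v3/content/cookbook/1/",
)
```

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">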
<div><div class="m_-7462805737116494279gmail-m_6888732290980131716h5"> <br>
> - Publication/Distribution: The metadata file ("universe") for a<br>
> published<br>
> cookbook repository contains absolute URLs for download (i.e.<br>
> these point<br>
> to published artifacts in a distribution).<br>
> The current publication/distribution concept seems to have the<br>
> underlying<br>
> assumption that a Publication is fully relocatable:<br>
> PublishedMetadata<br>
> artifacts are created by the publishing task and creating a<br>
> Distribution is a<br>
> synchronous call that determines the base path of the published<br>
> artifacts.<br>
> This causes a problem with said "universe" file. Ideally, it could<br>
> be<br>
> pre-computed (it lists all artifacts in the repo). However, this<br>
> can't be<br>
> done AFAIK since the base path is unknown at publication time and<br>
> one can't<br>
> generate additional metadata artifacts for a specific distribution<br>
> later.<br>
> The best solution I came up with was to implement a dynamic API.<br>
> To reduce the<br>
> amount of work to be done, the API does a simple string<br>
> replacement: During<br>
> publication, the universe file is pre-computed using placeholders.<br>
> In the<br>
> dynamic API these placeholders are replaced with the actual base<br>
> URL of the<br>
> distribution.<br>
> However, I would prefer not to be forced to implement a dynamic<br>
> API for static<br>
> information. Is there a way to solve this differently?<br>
> <br>
> Are relative paths an option? If not, I don’t think there’s currently<br>
> an alternative to a live api. I probably wouldn’t even use a published<br>
> metadata file TBH. I wonder though if there’s maybe functionality we<br>
> could add to do this.<br>
<br>
</div></div>Unfortunately, relative paths are not an option. Berkshelf just takes<br>
the URI as is and requests the content.<br>
<br>
Regarding the published metadata file: I am not sure what you are<br>
suggesting. Should the dynamic API be truly dynamic? If someone is<br>
going to mirror the entire Chef Supermarket, this will result in a<br>
"universe" that contains around 22,000 cookbooks. Every time<br>
Berkshelf is called with "install" or "update" this resource will be<br>
requested. I don't think that it makes sense to generate this data<br>
dynamically (it basically corresponds to the "primary.xml" file in<br>
the rpm repo case).<br>
<br>
Or are you suggesting storing the pre-computed result differently?<br>
(I don't like the metadata file solution either. OTOH, this object<br>
has exactly the required life cycle and the publication logic<br>
takes care of it. And string.replace() should be pretty fast.)<br>
<br>
For this particular plugin, it would be nice to be able to associate<br>
metadata files with the distribution, not the publication. But that's<br>
probably a little bit too much to hope for...<br></blockquote><div><br></div><div>I was suggesting a completely live api but I see your point about having a lot of cookbooks and calling this endpoint repeatedly. I think the solution you have works. I can’t think of how to better support your use case ATM.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
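</blockquote><div><br></div><div>To make the placeholder scheme concrete, here is a minimal sketch. The placeholder token and the universe entry shape are illustrative, not necessarily what the plugin actually uses:</div>

```python
import json

# Illustrative placeholder token; the plugin's actual marker may differ.
PLACEHOLDER = "@@DISTRIBUTION_BASE_URL@@"

# Pre-computed at publication time, when the distribution's base URL is
# not known yet. One entry shown; a Supermarket mirror would hold ~22,000.
universe_template = json.dumps({
    "example": {
        "1.0.0": {
            "location_type": "uri",
            "download_url": PLACEHOLDER + "/cookbook_files/example-1.0.0.tar.gz",
            "dependencies": {},
        }
    }
})

def render_universe(template, base_url):
    """Fill in the distribution's actual base URL at request time."""
    return template.replace(PLACEHOLDER, base_url)

rendered = render_universe(universe_template,
                           "http://pulp.example.com/pulp/content/chef")
```

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">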
<span> <br>
> - Content-Type Header: The "universe" file is JSON and must have a<br>
> corresponding<br>
> "Content-Type" HTTP header.<br>
> However, the content type returned by the development server seems to be<br>
> "text/html" by<br>
> default for all artifacts. Apparently, I can't set the<br>
> content-type of a<br>
> (metadata) artifact(?)<br>
> <br>
> I think this goes back to not using a published metadata file to serve<br>
> up your api. However, I could see how it might be useful.<br>
<br>
</span>Sure, in my case it is no problem, since I set the content type<br>
in the dynamic API. The question is more generic, as content types<br>
should be correct for both artifacts and metadata files in general.<br>
<br>
In Pulp 2 it is determined based on the mime type associated with the<br>
file path (in the ContentView). How is this going to work in Pulp 3?<br></blockquote><div><br></div><div>Maybe we can read the mime type of the artifact before we serve it?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
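</blockquote><div><br></div><div>Something along those lines can be done with the standard library by guessing from the file path before serving. This is a sketch of the idea, not Pulp's actual serving code:</div>

```python
import mimetypes

def guess_content_type(path):
    """Guess a Content-Type for a published file from its path, with a
    generic binary fallback for unknown extensions."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"

print(guess_content_type("universe.json"))  # application/json
print(guess_content_type("example-1.0.0.unknownext"))  # application/octet-stream
```

<div>One caveat: for compressed tarballs, mimetypes reports the compression separately as an encoding (e.g. "gzip" alongside "application/x-tar"), so a real implementation would need to fold that back into the response headers.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">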
<span> <br>
> - Getting the base url of a distribution in the dynamic API is<br>
> surprisingly<br>
> complicated and depends on the inner structure of pulp core (I<br>
> took the<br>
> implementation from 'pulp_ansible'). IMHO, a well defined way to<br>
> obtain it<br>
> should be part of the plugin API.<br>
> <br>
</span>> I agree. Opened: [2]<a href="https://pulp.plan.io/issues/3677" rel="noreferrer" target="_blank">https://pulp.plan.io/issues/3677</a><br>
<span>> <br>
> - "Content" class: The way to use only a single artifact in Content<br>
> (as done<br>
> in pulp_file) seems to require in-depth knowledge of the<br>
> Content/ContentSerializer class and its inner workings.<br>
> The downside of this can already be experienced in the "pulp_file"<br>
> plugin: The<br>
> fields "id" and "created" are missing, since the implementation<br>
> there just<br>
> overrides the 'fields' in the serializer.<br>
> I think two Content types should be part of the plugin API: one<br>
> with<br>
> multiple artifacts, and a simpler one with a single artifact<br>
> <br>
> I began working on this:<br>
</span>> [3]<a href="https://github.com/pulp/pulp/pull/3476" rel="noreferrer" target="_blank">https://github.com/pulp/pulp/pull/3476</a><br>
<span>> But there was opposition to having the Content model be responsible<br>
> for generating the relative path on the Content Artifact. I’ve opened<br>
> an issue to see if there’s another way to do this (e.g. using<br>
> serializers):<br>
</span>> [4]<a href="https://pulp.plan.io/issues/3678" rel="noreferrer" target="_blank">https://pulp.plan.io/issues/3678</a><br>
<br>
Makes sense. I will follow this.<br>
<span><br>
> - Uploading an Artifact that already exists returns an error, which<br>
> is<br>
> annoying if you use http/curl to import artifacts. Suppose some<br>
> other user<br>
> uploaded an artifact in the past. You won't get useful<br>
> information from the POST request uploading the same artifact:<br>
> HTTP/1.1 400 Bad Request<br>
> Allow: GET, POST, HEAD, OPTIONS<br>
> Content-Type: application/json<br>
> Date: Sat, 12 May 2018 17:50:54 GMT<br>
> Server: WSGIServer/0.2 CPython/3.6.2<br>
> Vary: Accept<br>
> {<br>
> "non_field_errors": [<br>
> "sha512 checksum must be unique."<br>
> ]<br>
> }<br>
> This forced me to do something like:<br>
> ...<br>
> sha256=$(sha256sum "$targz" | awk '{print $1}')<br>
> ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256<br>
> | jq -r '.results[0]._href')<br>
> if [[ $ARTIFACT_HREF == "null" ]]; then<br>
> echo uploading artifact $cookbook_name sha256: $sha256<br>
</span>> http --form POST [5]<a href="http://localhost:8000/pulp/api" rel="noreferrer" target="_blank">http://localhost:8000/pulp/api</a><br>
> /v3/artifacts/ file@$targz<br>
> ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256<br>
<span>> | jq -r '.results[0]._href')<br>
> ...<br>
> Perhaps a "303 See Other" to the existing artifact would help<br>
> here.<br>
> <br>
> Why not just do something like:<br>
</span>> http --form POST [6]<a href="http://localhost:8000/pulp/api/v3/artifacts/" rel="noreferrer" target="_blank">http://localhost:8000/pulp/api/v3/artifacts/</a> file@$<br>
<span>> targz || true<br>
> ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq<br>
> -r '.results[0]._href')<br>
> if [[ $ARTIFACT_HREF == "null" ]]; then exit 1; fi<br>
> The error message could be more helpful though. It should probably<br>
> contain the existing artifact’s href. I looked at 303 and am a bit<br>
> ambivalent toward using it.<br>
<br>
</span>Sure, 303 is not really clear-cut. Including the href of the existing<br>
artifact should already help.<br>
<br>
</blockquote></div><div class="gmail_extra"><br></div>Opened <a href="https://pulp.plan.io/issues/3681" target="_blank">https://pulp.plan.io/issues/3681</a></div></div>
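<div>For what it's worth, the check-before-upload flow from the shell snippet above translates to a few lines of Python. Here is a sketch of the checksum step; the HTTP lookup and upload calls are omitted, with endpoints assumed to be the same as in the shell example:</div>

```python
import hashlib

def sha256_of(path):
    """Compute the sha256 hex digest a client would use to query
    /pulp/api/v3/artifacts/?sha256=... before attempting an upload."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large cookbook tarballs don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
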