[Pulp-dev] pulp3: Publishing Proposal

Bryan Kearney bkearney at redhat.com
Wed Jun 28 20:05:26 UTC 2017


On 06/28/2017 02:27 PM, Jeff Ortel wrote:
> I have been doing some thinking about pulp3 publishing with the following goals in mind:
> 
> - Eliminate symlinks.
> - Eliminate need for each plugin to have its own Apache conf.
> - Prevent orphaned content that is still published from being deleted.
> 
> The main concept is to store the relationship between an artifact and a URL in the DB instead of using the
> filesystem.  A `Publication` is created (and owned) by a publisher.  Each `Publication` is composed of (linked
> to) many `artifacts`.  The linkage contains the path component of the URL, which is used to locate the
> artifact that a request references.
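> 
> Roughly, the models might look like this (a sketch in the Django ORM; field
> names here are illustrative, not settled):
> 
>     from django.db import models
> 
>     class Publication(models.Model):
>         # Created and owned by a publisher; internal to core, at least
>         # for the MVP.
>         publisher = models.ForeignKey('Publisher', on_delete=models.CASCADE)
>         created = models.DateTimeField(auto_now_add=True)
> 
>     class LinkedArtifact(models.Model):
>         # The DB row replaces the symlink: it maps a URL path component
>         # to an artifact.
>         publication = models.ForeignKey(Publication, on_delete=models.CASCADE)
>         artifact = models.ForeignKey('Artifact', on_delete=models.PROTECT)
>         relative_path = models.TextField(db_index=True)  # URL path component
> 
> PROTECT on the artifact FK is one way to meet the third goal: an artifact
> that is still published cannot be deleted.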
> 
> This covers artifacts as we know them today.  But what about files generated during publishing, a.k.a.
> metadata?  I propose that these files be stored as artifacts as well.  This requires an `Artifact` to be
> redefined slightly.  The definition would read more like:
> 
>    "A file associated with either stored or published content".
> 
> Or, it could be even more generic, like:
> 
>    "A file contained within the pulp inventory that may be associated with a content (unit) or publication."
> 
> In any case, the relationship to a content (unit) becomes optional.
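> 
> In model terms, that just means the content FK becomes nullable; again, a
> sketch:
> 
>     class Artifact(models.Model):
>         # A file in the pulp inventory; may back a content unit, a
>         # publication's generated metadata, or both.
>         file = models.FileField(max_length=255)
>         content = models.ForeignKey('Content', null=True, blank=True,
>                                     on_delete=models.CASCADE)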
> 
> Publications are not user-facing.  I think we can keep this as an internal core concept.  At least for the MVP.
> 
> The /var/lib/pulp/published directory goes away.
> 
> General Flows:
> 
> Publishing: "The publisher will compose a publication"
> 
> 1. Publisher creates a publication using the plugin API.
> 2. Publisher adds content artifacts to the publication.
> 3. Publisher generates some metadata files in the working dir.
> 4. Publisher adds the metadata files to the publication using the plugin API.  The artifacts can likely be
> created behind the scenes by the plugin API.
> 5. Publisher commits (publishes) the publication.  The plugin API ensures this is atomic.
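> 
> Sketched in code (the plugin API calls here are hypothetical placeholders,
> not a committed interface):
> 
>     from django.db import transaction
> 
>     def publish(publisher, repo_version, metadata_paths):
>         with transaction.atomic():
>             # 1. create the publication
>             publication = Publication.objects.create(publisher=publisher)
>             # 2. link the content artifacts
>             for artifact, rel_path in repo_version.artifacts():
>                 LinkedArtifact.objects.create(
>                     publication=publication,
>                     artifact=artifact,
>                     relative_path=rel_path)
>             # 3-4. metadata files generated in the working dir get added
>             # too; the plugin API creates the backing Artifact rows.
>             for path in metadata_paths:
>                 publication.add_metadata(path)  # hypothetical helper
>             # 5. the surrounding transaction makes the commit atomic
>             publication.commit()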
> 
> Client makes a GET request for content (or metadata):
> 
> 1. Request is routed to the content (WSGI) application (just like in pulp2 for RPM).
> 2. Query the `LinkedArtifact` table by URL path component to get the artifact.
> 3. Forward the artifact storage path to:
>     <not stored locally>
>         streamer
>     <stored locally>
>         x-send
> 4. Done.
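> 
> A rough sketch of that WSGI application (routing and auth omitted; the
> X-Sendfile header assumes Apache with mod_xsendfile, and `streamer_url`
> and the `downloaded` flag are hypothetical stand-ins):
> 
>     def application(environ, start_response):
>         path = environ['PATH_INFO'].lstrip('/')
>         try:
>             linked = LinkedArtifact.objects.get(relative_path=path)
>         except LinkedArtifact.DoesNotExist:
>             start_response('404 Not Found', [])
>             return [b'']
>         artifact = linked.artifact
>         if artifact.downloaded:  # stored locally
>             start_response('200 OK',
>                            [('X-Sendfile', artifact.storage_path)])
>             return [b'']
>         # not stored locally: hand the request off to the streamer
>         start_response('302 Found', [('Location', streamer_url(path))])
>         return [b'']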

How would this scale? Assume 10k machines are doing a yum update. How 
would you handle the thundering herd issue?

Have you checked out how koji handles packages? They use files on disk, 
but all the package metadata are URLs back to a single location on 
disk. This may be too RPM-specific, however. See 
http://koji.katello.org/kojifiles/repos/foreman-nightly-fedora24-build/latest/x86_64/.

-- bk



