[Pulp-dev] pulp3: Publishing Proposal

Wed Jun 28 18:27:33 UTC 2017

I have been doing some thinking about pulp3 publishing with the following goals in mind:

- Eliminate symlinks.
- Eliminate need for each plugin to have its own Apache conf.
- Prevent orphaned content that is still published from being deleted.

The main concept is to store the relationship between an artifact and a URL in the DB instead of using the
filesystem.  A `Publication` is created (and owned) by a publisher.  Each `Publication` is composed of (linked
to) many artifacts.  Each link stores the path component of the URL, which is used to locate the artifact
that the URL refers to.

This covers artifacts as we know them today.  But what about files generated during publishing, a.k.a.
metadata?  I propose that these files be stored as artifacts as well.  This requires `Artifact` to be
redefined slightly.  The definition would read more like:

  "A file associated with either stored or published content".

Or it could be even more generic, like:

  "A file contained within the pulp inventory that may be associated with a content (unit) or publication."

In any case, the relationship to a content (unit) becomes optional.

Publications are not user facing.  I think we can keep this as an internal core concept, at least for the MVP.

The /var/lib/pulp/published directory goes away.

General Flows:

Publishing: "The publisher will compose a publication"

1. Publisher creates a publication using the plugin API.
2. Publisher adds content artifacts to the publication.
3. Publisher generates some metadata files in the working dir.
4. Publisher adds the metadata files to the publication using the plugin API.  The artifacts can likely be
created behind the scenes by the plugin API.
5. Publisher commits (publishes) the publication.  The plugin API ensures this is atomic.
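
A rough sketch of how steps 1-5 might look, assuming a relational DB behind the plugin API.  The class name,
method names and the two minimal tables are hypothetical (nothing here is an existing pulp API), and sqlite3
is used only so the example runs on its own.  Links are staged in memory and written in one transaction, so
a failure part way through leaves nothing published.

```python
import sqlite3


def open_db():
    """Create a throwaway DB with minimal, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE publication (
            id INTEGER PRIMARY KEY,
            publisher_id INTEGER NOT NULL
        );
        CREATE TABLE linked_artifact (
            id INTEGER PRIMARY KEY,
            publication_id INTEGER NOT NULL REFERENCES publication(id),
            artifact_id INTEGER NOT NULL,
            url TEXT NOT NULL,
            UNIQUE (publication_id, url)  -- assumption: one artifact per URL
        );
    """)
    return conn


class Publication:
    """Hypothetical stand-in for a publication composed by a publisher."""

    def __init__(self, conn, publisher_id):
        self.conn = conn
        self.publisher_id = publisher_id
        self.links = []  # staged (artifact_id, url) pairs, not visible yet

    def add(self, artifact_id, url):
        """Stage a link between an artifact and a URL path (steps 2 and 4)."""
        self.links.append((artifact_id, url))

    def commit(self):
        """Write the publication and all of its links atomically (step 5)."""
        with self.conn:  # commits on success, rolls back on any exception
            cur = self.conn.execute(
                "INSERT INTO publication (publisher_id) VALUES (?)",
                (self.publisher_id,),
            )
            publication_id = cur.lastrowid
            self.conn.executemany(
                "INSERT INTO linked_artifact (publication_id, artifact_id, url)"
                " VALUES (?, ?, ?)",
                [(publication_id, a, u) for a, u in self.links],
            )
        return publication_id
```

Metadata files would go through the same `add()` call once the plugin API has created artifacts for them.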

Client makes a GET request for content (or metadata):

1. Request is routed to the content (WSGI) application (just like in pulp2 for RPM).
2. Query the `LinkedArtifact` table by URL path component to get the artifact.
3. Forward the artifact storage path to:
   <not stored locally>
       streamer
   <stored locally>
       x-send
4. Done.
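
The request flow above could be sketched roughly as follows.  This is only an illustration: the function
name is made up, plain dicts stand in for the `LinkedArtifact` and artifact tables, and the returned
strings "streamer" and "x-send" are just labels for the two handlers named in step 3.

```python
import os


def route(url_path, linked_artifacts, artifact_paths, exists=os.path.exists):
    """Decide how a GET for url_path would be served.

    linked_artifacts: dict of URL path -> artifact id (stand-in for the table).
    artifact_paths:   dict of artifact id -> storage path.
    Returns (handler, storage_path), or None when the URL is not published.
    """
    artifact_id = linked_artifacts.get(url_path)
    if artifact_id is None:
        return None  # nothing published at this URL; the app would answer 404
    storage_path = artifact_paths[artifact_id]
    if exists(storage_path):
        return ("x-send", storage_path)  # stored locally
    return ("streamer", storage_path)  # not stored locally
```

The `exists` parameter is there only so the local/remote decision can be exercised without touching the
filesystem.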

Tables:
=============================

Publication
  id [PK]
  publisher_id [FK]
  created
  schemes

LinkedArtifact
  id [PK]
  publication_id [FK]
  artifact_id [FK]
  URL

Example Data:
==============================

Publisher:
----------------
publisher-1, ...

Artifact:
----------------
artifact-1, /var/lib/pulp/artifact/ff/9f373839d0/manifest
artifact-2, /var/lib/pulp/artifact/b1/37b64a8c83/tiger.img

Publication:
----------------
publication-1, publisher-1, 6-1-2017,..

LinkedArtifact:
----------------
<id>, publication-1, artifact-1, /pulp/published/http/zoo/md/manifest
<id>, publication-1, artifact-2, /pulp/published/http/zoo/images/tiger.img
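
To make the example data concrete, here is a small sketch that loads the rows above into sqlite3 and runs
the two lookups the proposal relies on: URL to storage path, and "is this artifact still published?" (one
way the third goal might be checked before deleting orphans).  Column types, the two-column artifact table
and the function names are assumptions; the `schemes` column is left out because its values are elided
above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE artifact (id TEXT PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE publication (
    id TEXT PRIMARY KEY,
    publisher_id TEXT NOT NULL,
    created TEXT NOT NULL
);
CREATE TABLE linked_artifact (
    id INTEGER PRIMARY KEY,
    publication_id TEXT NOT NULL REFERENCES publication(id),
    artifact_id TEXT NOT NULL REFERENCES artifact(id),
    url TEXT NOT NULL
);
"""


def load_example():
    """Return an in-memory DB holding the example rows from this mail."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO artifact VALUES (?, ?)", [
        ("artifact-1", "/var/lib/pulp/artifact/ff/9f373839d0/manifest"),
        ("artifact-2", "/var/lib/pulp/artifact/b1/37b64a8c83/tiger.img"),
    ])
    conn.execute(
        "INSERT INTO publication VALUES ('publication-1', 'publisher-1', '6-1-2017')"
    )
    conn.executemany(
        "INSERT INTO linked_artifact (publication_id, artifact_id, url)"
        " VALUES (?, ?, ?)",
        [
            ("publication-1", "artifact-1", "/pulp/published/http/zoo/md/manifest"),
            ("publication-1", "artifact-2", "/pulp/published/http/zoo/images/tiger.img"),
        ],
    )
    conn.commit()
    return conn


def storage_path_for(conn, url):
    """Look up the artifact storage path for a published URL, or None."""
    row = conn.execute(
        "SELECT a.path FROM linked_artifact la"
        " JOIN artifact a ON a.id = la.artifact_id WHERE la.url = ?",
        (url,),
    ).fetchone()
    return row[0] if row else None


def is_published(conn, artifact_id):
    """True if any publication still links to the artifact."""
    row = conn.execute(
        "SELECT 1 FROM linked_artifact WHERE artifact_id = ? LIMIT 1",
        (artifact_id,),
    ).fetchone()
    return row is not None
```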

URLs would be: /pulp/published/(http|https)/<path>
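
For illustration, the URL layout above could be matched with something like the following.  The pattern and
function name are my own; whether the content app would key lookups on the full URL (as in the example
rows) or only on the trailing path is left open here.

```python
import re

# /pulp/published/(http|https)/<path>
PUBLISHED_URL = re.compile(r"^/pulp/published/(?P<scheme>http|https)/(?P<path>.+)$")


def split_published_url(url):
    """Return (scheme, path) for a published URL, or None if it does not match."""
    match = PUBLISHED_URL.match(url)
    if match is None:
        return None
    return match.group("scheme"), match.group("path")
```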

I think the core can have a single Apache configuration that defines 2 directories: one served over HTTPS
and protected by SSL/entitlement, the other served over plain HTTP.

Thoughts/Comments?

-jeff
