[Pulp-dev] Publish API for Plugin Writers (Pulp3)

Mon Apr 24 14:51:40 UTC 2017

On 04/24/2017 06:31 AM, Mihai Ibanescu wrote:
> Jeff,
> 
> A few comments to your strawman:
> 
> * What is an artifact? If it is a database model, then why not call it a unit if that's what it's called
> everywhere else in the code?

In pulp3, content units and their associated files are separate.  Each content unit has one or more artifacts.  An RPM
content unit has exactly 1 artifact, whereas a docker content unit has 3 artifacts.
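
To make the split concrete, here is a rough sketch of the relationship.  The class and field names (ContentUnit,
Artifact, relative_path) are invented for illustration and are not the actual pulp3 models; the file names are
placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """A file belonging to a content unit (hypothetical model, not pulp3's)."""
    relative_path: str


@dataclass
class ContentUnit:
    """A unit of content that owns its artifacts (hypothetical model, not pulp3's)."""
    type_id: str
    name: str
    artifacts: List[Artifact] = field(default_factory=list)


# An RPM content unit has exactly 1 artifact: the package file itself.
rpm = ContentUnit(
    type_id="rpm",
    name="example",
    artifacts=[Artifact("example-1.0-1.noarch.rpm")],
)

# A docker content unit has 3 artifacts; the names here are placeholders.
docker = ContentUnit(
    type_id="docker",
    name="example-image",
    artifacts=[Artifact("artifact-{}".format(n)) for n in range(1, 4)],
)
```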

> * How would you deal with metadata-only units that don't have a file representation, but show up in some kind
> of metadata (e.g. package groups / errata)? associate() doesn't seem to give me that.

The straw-man splits published items into 2 categories:

1. Files associated with content which are artifacts.  Handled by associate().
   - RPMs
   - ISOs
   - Docker images
   - etc ...

2. Files generated by the publisher.  Handled by add().
   - Metadata (like everything in repodata/)
   - Metadata-only units like errata and package groups.

For category #2 files, the publisher uses the information stored in the content unit to generate a file in the
staging directory, then uses add() to include it in the publication.
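
A minimal sketch of both categories in one publish, assuming the straw-man API quoted below.  The Publication class
here is only a stand-in that records what it was asked to do, the publish() helper is hypothetical, and the content
written to updateinfo.xml is a placeholder rather than the real format.

```python
import os
import tempfile


class Publication:
    """Stand-in for the straw-man Publication; it only records requests."""

    def __init__(self, staging_dir):
        self.staging_dir = staging_dir
        self.associated = {}  # relative path -> artifact
        self.added = set()    # relative paths of publisher-generated files

    def associate(self, artifact, path):
        # Category 1: the platform decides how the artifact is exposed.
        self.associated[path] = artifact

    def add(self, path):
        # Category 2: a file (or directory of files) already in staging.
        full = os.path.join(self.staging_dir, path)
        if os.path.isdir(full):
            for root, _dirs, files in os.walk(full):
                for name in files:
                    found = os.path.join(root, name)
                    self.added.add(os.path.relpath(found, self.staging_dir))
        elif os.path.isfile(full):
            self.added.add(path)
        else:
            raise FileNotFoundError(full)


def publish(staging_dir, artifacts, errata):
    """Hypothetical publisher body covering both categories."""
    publication = Publication(staging_dir)

    # Category 1: files that are artifacts of content units.
    for path, artifact in artifacts.items():
        publication.associate(artifact, path)

    # Category 2: generate a file from metadata-only units, then add() it.
    repodata = os.path.join(staging_dir, "repodata")
    os.makedirs(repodata, exist_ok=True)
    with open(os.path.join(repodata, "updateinfo.xml"), "w") as fp:
        for erratum_id in errata:
            fp.write("<update>{}</update>\n".format(erratum_id))  # placeholder
    publication.add("repodata/")
    return publication


with tempfile.TemporaryDirectory() as staging_dir:
    publication = publish(
        staging_dir,
        artifacts={"example-1.0-1.noarch.rpm": object()},
        errata=["EXAMPLE-0001"],
    )
```

The errata never pass through associate(); they only reach the publication as part of a generated file.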

> * For that matter, how would you deal with files that are not representations of units, but new artifacts?
> (e.g. repomd.xml and the like). It feels like it can be possible by extending my commit() with writing the
> metadata and then calling the parent class' commit() (which does the atomic publish), but I think that's not
> pretty.

See above ^^.

> 
> 
> On Fri, Apr 21, 2017 at 5:09 PM, Jeff Ortel <jortel at redhat.com> wrote:
> 
>     I like this Brian and want to take it one step further.  I think there is value in abstracting how a
>     publication is composed.  Files like metadata need to be composed by the publisher (as needed) in the
>     working_dir then "added" to the publication.  Artifacts could be "associated" to the publication and the
>     platform determines how this happens (symlinks/in the DB).
> 
>     Assuming the Publisher is instantiated with a 'working_dir' attribute.
> 
>     ---------------------------------------
> 
>     Something like this to kick around:
> 
> 
>     class Publication:
>         """
>         The Publication provided by the plugin API.
> 
>         Examples:
> 
>         A crude example with lots of hand waving.
> 
>         In Publisher.publish()
> 
>         >>>
>         >>> publication = Publication(self.working_dir)
>         >>>
>         >>> # Artifacts
>         >>> for artifact in []:  # artifacts
>         ...     path = '<determine relative path>'
>         ...     publication.associate(artifact, path)
>         >>>
>         >>> # Metadata created in self.working_dir <here>.
>         >>>
>         >>> publication.add('repodata/primary.xml')
>         >>> publication.add('repodata/others.xml')
>         >>> publication.add('repodata/repomd.xml')
>         >>>
>         >>> # - OR -
>         >>>
>         >>> publication.add('repodata/')
>         >>>
>         >>> publication.commit()
>         """
> 
>         def __init__(self, staging_dir):
>             """
>             Args:
>                 staging_dir: Absolute path to where publication is staged.
>             """
>             self.staging_dir = staging_dir
> 
>         def associate(self, artifact, path):
>             """
>             Associate an artifact to the publication.
>             This could result in creating a symlink in the staging directory
>             or (later) creating a record in the db.
> 
>             Args:
>                 artifact: A content artifact
>                 path: Relative path within the staging directory AND eventually
>                       within the published URL.
>             """
> 
>         def add(self, path):
>             """
>             Add a file within the staging directory to the publication by relative path.
> 
>             Args:
>                 path: Relative path within the staging directory AND eventually within
>                       the published URL.  When *path* is a directory, all files
>                       within the directory are added.
>             """
> 
>         def commit(self):
>             """
>             When committed, the publication is atomically published.
>             """
>             # atomic magic
> 
> 
> 
> 
> 
>     On 04/19/2017 10:16 AM, Brian Bouterse wrote:
>     > I was thinking about the design here and I wanted to share some thoughts.
>     >
>     > For the MVP, I think a publisher implemented by a plugin developer would write all files into the working
>     > directory and the platform will "atomically publish" that data into the location configured by the repository.
>     > The "atomic publish" aspect would copy/stage the files in a permanent location but would use a single symlink
>     > to the top level folder to go live with the data. This would make atomic publication the default behavior.
>     > This runs after the publish() implemented by the plugin developer returns, after it has written all of its
>     > data to the working dir.
>     >
>     > Note that ^ allows for the plugin writer to write the actual contents of files in the working directory
>     > instead of symlinks, causing Pulp to duplicate all content on disk with every publish. That would be an
>     > incredibly inefficient way to write a plugin but it's something the platform would not prevent in any explicit
>     > way. I'm not sure if this is something we should improve on or not.
>     >
>     > At a later point, we could add in the incremental publish maybe as a method on a Publisher called
>     > incremental_publish() which would only be called if the previous publish only had units added.
>     >
>     >
>     >
>     > On Mon, Apr 17, 2017 at 4:22 PM, Brian Bouterse <bbouters at redhat.com> wrote:
>     >
>     >     For plugin writers who are writing a publisher for Pulp3, what do they need to handle during publishing
>     >     versus platform? To make a comparison against sync, the "Download API" and "Changesets" [0] allow the
>     >     plugin writer to tell platform about a remote piece of content. Then platform handles creating the unit,
>     >     fetching it, and saving it. Will there be a similar API to support publishing to ease the burden of a
>     >     plugin writer? Also will this allow platform to have a structured knowledge of a publication with Pulp3?
>     >
>     >     I wanted to try to characterize the problem statement as two separate questions:
>     >
>     >     1) How will units be recorded to allow platform to know which units comprise a specific publish?
>     >     2) What are plugin writer's needs at publish time, and what repetitive tasks could be moved to platform?
>     >
>     >     As a quick recap of how Pulp2 works: each publisher would write files into the working directory and
>     >     then they would get moved into their permanent home. There was also the incrementalPublisher base machinery,
>     >     which allowed for an additive publication that reused the previous publish and was "faster". Finally,
>     >     in Pulp2 the only record of a publication is the set of symlinks on the filesystem.
>     >
>     >     I have some of my own ideas on these things, but I'll start the conversation.
>     >
>     >     [0]: https://github.com/pulp/pulp/pull/2876
>     >
>     >     -Brian
>     >
>     >
>     >
>     >
>     > _______________________________________________
>     > Pulp-dev mailing list
>     > Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>     > https://www.redhat.com/mailman/listinfo/pulp-dev <https://www.redhat.com/mailman/listinfo/pulp-dev>
>     >
> 
> 
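
For reference, the "atomic publish" Brian describes in the quoted thread (stage the files in a permanent location,
then go live through a single top-level symlink) could look roughly like the sketch below.  The function name, its
arguments and the directory layout are assumptions made for illustration, not platform code, and it relies on
POSIX rename semantics.

```python
import os
import shutil
import tempfile


def atomic_publish(working_dir, storage_dir, live_link):
    """Copy working_dir into storage_dir, then repoint live_link in one rename.

    Hypothetical helper; assumes a POSIX filesystem and Python 3.8+.
    """
    # Permanent home for this publish; mkdtemp gives each publish its own directory.
    target = tempfile.mkdtemp(prefix="publish-", dir=storage_dir)
    shutil.copytree(working_dir, target, dirs_exist_ok=True)

    # Create the new symlink under a temporary name, then rename it over the
    # live one. The rename replaces the old link in a single step, so readers
    # see either the previous publish or the new one, never a mix.
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, live_link)
    return target


with tempfile.TemporaryDirectory() as root:
    working_dir = os.path.join(root, "working")
    storage_dir = os.path.join(root, "storage")
    live_link = os.path.join(root, "live")
    os.makedirs(working_dir)
    os.makedirs(storage_dir)

    results = []
    for content in ("first publish", "second publish"):
        with open(os.path.join(working_dir, "repomd.xml"), "w") as fp:
            fp.write(content)
        atomic_publish(working_dir, storage_dir, live_link)
        # Read back through the live symlink, as a client would.
        with open(os.path.join(live_link, "repomd.xml")) as fp:
            results.append(fp.read())
    live_is_symlink = os.path.islink(live_link)
```

Note that this copies every file on each publish, which is exactly the duplication Brian warns about when a plugin
writes real file contents instead of symlinks into the working directory.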
