<div dir="ltr"><div>Setting the MVP proposal aside, I want to address one of Michael's comments: incremental repo creation is not only about performance. Because of how yum clients work (I can only attest to this empirically, since I have not read the code), yum repositories need to preserve a few generations of the (potentially compressed) XML files referenced by previous repomd.xml files. Otherwise, a yum client with a cached copy of repomd.xml may ask for a primary.xml.gz that was removed by a newer publish, and things don't look pretty after the resulting 404. I think leaving this out of the MVP would cause a *functional* regression, not just a performance one.</div><div><br></div><div>But I absolutely love the idea of versioned repositories; see my attempt to address that with the <a href="https://github.com/sassoftware/pulp-snapshot">https://github.com/sassoftware/pulp-snapshot</a> distributor.</div><div><br></div><div>Michael, on your point number 4: in Pulp 2 I was under the impression that the publisher is only responsible for creating a directory representation of a Pulp repository (for the yum distributor, a directory laid out as a yum repository). Apache is responsible for serving it from there, with or without additional authentication. 
Are you suggesting more than this behavior for Pulp 3?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 24, 2017 at 9:30 AM, Michael Hrivnak <span dir="ltr"><<a href="mailto:mhrivnak@redhat.com" target="_blank">mhrivnak@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">For publish, a plugin writer needs the ability to:<div><br></div><div>- iterate through the units being published</div><div>- create new artifacts based on that iteration, or by any other method it sees fit</div><div>- make each unit's files available at a specific path, either via HTTP or on a file store (for example, docker manifest files need to be served directly by crane)</div><div>- make each newly-created artifact available at a specific path, either via HTTP or on a file store (for example, metadata files for crane don't get served via HTTP)</div><div><br></div><div>Optimizations in Pulp 2 further allow a plugin writer to read artifacts created by a previous publication. For example, the rpm plugin uses this to quickly add a few entries to an XML file instead of completely re-creating it. This may not be strictly required for the MVP, but its absence would likely cause a substantial performance regression. This optimization also requires the ability to determine which units have been added and removed since the last publish. See versioned repos below...</div><div><br></div><div>As for making copies of unit files, I think if Pulp did that for each publish, it would become effectively unusable for a lot of users. At best, it would double the required storage, and for many users it would be much worse. It would also greatly increase the time required to perform a publish. As such, I think the MVP should continue to store just one copy of each unit, including its files, similar to Pulp 2. How those files are referenced is an area we could definitely improve, though. 
From a plugin writer's perspective, it should be enough to tell the platform "make file X available at location Y" without worrying about whether copies, symlinks, or some other referencing method is being used.</div><div><br></div><div>As for recording which units are included in a publication... If we implement versioned repositories, then each repo version would be an addressable, immutable object with references to units. A publication would then naturally reference a repo version. There are several ways we could model repo versions, but as I envision it, they all involve a single addressable object. I promise I'll cook up a specific proposal in the near future. ;)</div><div><br></div><div><br></div></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">On Mon, Apr 24, 2017 at 7:31 AM, Mihai Ibanescu <span dir="ltr"><<a href="mailto:mihai.ibanescu@gmail.com" target="_blank">mihai.ibanescu@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Jeff,<div><br></div><div>A few comments on your strawman:</div><div><br></div><div>* What is an artifact? If it is a database model, then why not call it a unit, since that's what it's called everywhere else in the code?</div><div>* How would you deal with metadata-only units that don't have a file representation but show up in some kind of metadata (e.g. package groups / errata)? associate() doesn't seem to cover that.</div><div>* For that matter, how would you deal with files that are not representations of units but new artifacts (e.g. repomd.xml and the like)? 
It seems possible by extending my commit() to write the metadata and then call the parent class's commit() (which does the atomic publish), but I don't think that's pretty.</div><div><br></div></div><div class="m_8593386670142619094HOEnZb"><div class="m_8593386670142619094h5"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 21, 2017 at 5:09 PM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I like this, Brian, and want to take it one step further. I think there is value in abstracting how a<br> publication is composed. Files like metadata need to be composed by the publisher (as needed) in the<br> working_dir and then "added" to the publication. Artifacts could be "associated" with the publication, and the<br> platform determines how this happens (symlinks or records in the DB).<br> <br> This assumes the Publisher is instantiated with a 'working_dir' attribute.<br> <br> ---------------------------------------<br> <br> Something like this to kick around:<br><pre>class Publication:
    """
    The Publication provided by the plugin API.

    Examples:

        A crude example with lots of hand waving.

        In Publisher.publish()

        >>> publication = Publication(self.working_dir)
        >>>
        >>> # Artifacts
        >>> for artifact in []:  # artifacts
        ...     path = '<determine relative path>'
        ...     publication.associate(artifact, path)
        >>>
        >>> # Metadata created in self.working_dir <here>.
        >>>
        >>> publication.add('repodata/primary.xml')
        >>> publication.add('repodata/other.xml')
        >>> publication.add('repodata/repomd.xml')
        >>>
        >>> # - OR -
        >>>
        >>> publication.add('repodata/')
        >>>
        >>> publication.commit()
    """

    def __init__(self, staging_dir):
        """
        Args:
            staging_dir: Absolute path to where the publication is staged.
        """
        self.staging_dir = staging_dir

    def associate(self, artifact, path):
        """
        Associate an artifact with the publication.
        This could result in creating a symlink in the staging directory
        or (later) creating a record in the DB.

        Args:
            artifact: A content artifact.
            path: Relative path within the staging directory AND eventually
                within the published URL.
        """

    def add(self, path):
        """
        Add a file within the staging directory to the publication by relative path.

        Args:
            path: Relative path within the staging directory AND eventually within
                the published URL. When *path* is a directory, all files
                within the directory are added.
        """

    def commit(self):
        """
        When committed, the publication is atomically published.
        """
        # atomic magic
</pre><span><br> <br> On 04/19/2017 10:16 AM, Brian Bouterse wrote:<br> > I was thinking about the design here and I wanted to share some thoughts.<br> ><br> > For the MVP, I think a publisher implemented by a plugin developer would write all files into the working<br> > directory, and the platform would then "atomically publish" that data into the location configured by the repository.<br> > The "atomic publish" step would copy/stage the files in a permanent location but would use a single symlink<br> > to the top-level folder to go live with the data. This would make atomic publication the default behavior.<br> > This step runs after the publish() implemented by the plugin developer returns, i.e. after it has written all of its<br> > data to the working dir.<br> ><br> > Note that ^ allows the plugin writer to write the actual contents of files into the working directory<br> > instead of symlinks, which would cause Pulp to duplicate all content on disk with every publish. That would be an<br>
That would be a<br> > incredibly inefficient way to write a plugin but it's something the platform would not prevent in any explicit<br> > way. I'm not sure if this is something we should improve on or not.<br> ><br> > At a later point, we could add in the incremental publish maybe as a method on a Publisher called<br> > incremental_publish() which would only be called if the previous publish only had units added.<br> ><br> ><br> ><br> </span><span>> On Mon, Apr 17, 2017 at 4:22 PM, Brian Bouterse <<a href="mailto:bbouters@redhat.com" target="_blank">bbouters@redhat.com</a> <mailto:<a href="mailto:bbouters@redhat.com" target="_blank">bbouters@redhat.com</a>>> wrote:<br> ><br> > For plugin writers who are writing a publisher for Pulp3, what do they need to handle during publishing<br> > versus platform? To make a comparison against sync, the "Download API" and "Changesets" [0] allows the<br> > plugin writer to tell platform about a remote piece of content. Then platform handles creating the unit,<br> > fetching it, and saving it. Will there be a similar API to support publishing to ease the burden of a<br> > plugin writer? Also will this allow platform to have a structured knowledge of a publication with Pulp3?<br> ><br> > I wanted to try to characterize the problem statement as two separate questions:<br> ><br> > 1) How will units be recorded to allow platform to know which units comprise a specific publish?<br> > 2) What are plugin writer's needs at publish time, and what repetitive tasks could be moved to platform?<br> ><br> > As a quick recalling of how Pulp2 works. Each publisher would write files into the working directory and<br> > then they would get moved into their permanent home. Also there is the incrementalPublisher base machinery<br> > which allowed for an additive publication which would use the previous publish and was "faster". 
Finally,<br> > in Pulp 2, the only record of a publication is the set of symlinks on the filesystem.<br> ><br> > I have some ideas of my own on these things, but I'll just start the conversation for now.<br> ><br> </span>> [0]: <a href="https://github.com/pulp/pulp/pull/2876" rel="noreferrer" target="_blank">https://github.com/pulp/pulp/pull/2876</a><br> ><br> > -Brian<br> ><br> > ______________________________<wbr>_________________<br> > Pulp-dev mailing list<br> > <a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br> > <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a><br> ><br> <br></blockquote></div><br></div> </div></div><br></blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><div class="m_8593386670142619094gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><p style="color:rgb(0,0,0);font-family:overpass-mono,monospace;font-size:10px;margin:0px!important;padding:0px!important"><span style="margin:0px!important;padding:0px!important">Michael</span> <span
style="margin:0px!important;padding:0px!important">Hrivnak</span></p><p style="color:rgb(0,0,0);font-family:overpass-mono,monospace;font-size:10px;margin:0px!important;padding:0px!important"></p><span style="color:rgb(0,0,0);font-family:overpass-mono,monospace;font-size:10px;margin:0px!important;padding:0px!important"><span style="margin:0px!important;padding:0px!important">Principal Software Engineer</span><span style="margin:0px!important;padding:0px!important">, <span style="margin:0px!important;padding:0px!important">RHCE</span></span> </span><span style="color:rgb(0,0,0);font-family:overpass-mono,monospace;font-size:10px"></span><br style="color:rgb(0,0,0);font-family:overpass-mono,monospace;font-size:10px;margin:0px!important;padding:0px!important"><p style="color:rgb(0,0,0);font-family:overpass-mono,monospace;font-size:10px;margin:0px!important;padding:0px!important">Red Hat</p></div></div> </font></span></div> <br></blockquote></div><br></div>
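<div><br></div><div>A minimal sketch of how Jeff's Publication strawman could behave on a plain filesystem, combined with the single-symlink "atomic publish" Brian describes. It assumes POSIX symlinks, artifacts stored as regular files in a content store, and staging and publish directories on the same filesystem (so the move is a rename). The publish_root argument, the uuid-named version directories, and the "current" symlink are hypothetical names chosen for illustration, not Pulp API.</div>
<pre>
```python
import os
import shutil
import uuid
from pathlib import Path


class Publication:
    """Hypothetical filesystem-backed take on the strawman Publication."""

    def __init__(self, staging_dir, publish_root):
        self.staging_dir = Path(staging_dir)    # publisher composes files here
        self.publish_root = Path(publish_root)  # hypothetical home of versions
        self.paths = set()                      # relative paths in this publication

    def associate(self, artifact_file, path):
        """Expose an existing artifact at `path` through a symlink (no copy)."""
        link = self.staging_dir / path
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(Path(artifact_file).resolve())
        self.paths.add(path)

    def add(self, path):
        """Register a staged file, or every file under a staged directory."""
        target = self.staging_dir / path
        if target.is_dir():
            for f in target.rglob("*"):
                if f.is_file():
                    self.paths.add(f.relative_to(self.staging_dir).as_posix())
        elif target.is_file():
            self.paths.add(path)
        else:
            raise FileNotFoundError(str(target))

    def commit(self):
        """Move the staged tree into place, then swap a single symlink."""
        self.publish_root.mkdir(parents=True, exist_ok=True)
        version_dir = (self.publish_root / uuid.uuid4().hex).resolve()
        # Same filesystem: a rename; otherwise shutil.move copies, keeping symlinks.
        shutil.move(str(self.staging_dir), str(version_dir))
        tmp_link = self.publish_root / ("current." + uuid.uuid4().hex)
        tmp_link.symlink_to(version_dir)
        # rename(2) atomically replaces the old "current" link, so readers see
        # either the previous publication or this one, never a partial tree.
        os.replace(tmp_link, self.publish_root / "current")
        return version_dir
```
</pre>
<div>With this shape, a web server configured to serve publish_root/current switches to a new tree in one rename, and artifacts are only ever symlinked, never copied. Old version directories are left in place here; pruning them, or keeping a few around for stale yum clients, would be a separate policy decision.</div>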
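<div><br></div><div>A minimal sketch of the metadata retention Mihai argues the MVP needs, so that a yum client holding a cached repomd.xml can still fetch the primary.xml.gz (and similar files) that it references. It assumes metadata file names are unique per generation, as with createrepo's --unique-md-filenames option, and it records which files each generation wrote in a small JSON manifest. The manifest name .generations.json, the function carry_forward_repodata(), and the default of three generations are hypothetical choices, not yum or Pulp conventions.</div>
<pre>
```python
import json
import shutil
from pathlib import Path

MANIFEST = ".generations.json"  # hypothetical bookkeeping file, not part of yum


def carry_forward_repodata(old_repodata, new_repodata, keep=3):
    """Copy files referenced by the previous `keep - 1` generations into new_repodata.

    A generation is the list of metadata file names one publish wrote. The
    newest generation is whatever the publisher just wrote into new_repodata.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    old_repodata, new_repodata = Path(old_repodata), Path(new_repodata)
    current = sorted(
        p.name for p in new_repodata.iterdir()
        if p.is_file() and p.name != MANIFEST
    )

    history = []
    manifest = old_repodata / MANIFEST
    if manifest.exists():
        history = json.loads(manifest.read_text())

    # Retain the newest `keep` generations, counting the one just written.
    generations = (history + [current])[-keep:]

    for generation in generations[:-1]:
        for name in generation:
            src, dst = old_repodata / name, new_repodata / name
            # Never overwrite a freshly written file such as repomd.xml.
            if src.exists() and not dst.exists():
                shutil.copy2(src, dst)

    (new_repodata / MANIFEST).write_text(json.dumps(generations))
    return generations
```
</pre>
<div>repomd.xml itself is never carried forward, because each publish writes a fresh one; only the files that older repomd.xml copies point to are kept. Hardlinks or symlinks into the previous publication would avoid the copies, at the cost of coupling publications together.</div>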
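<div><br></div><div>Michael's versioned repositories and Brian's incremental_publish() idea could fit together: if every repo version is an immutable set of unit references, the platform can diff the version being published against the last published one and only take the incremental path when units were purely added. A toy sketch under that assumption; RepoVersion, diff() and choose_publish() are hypothetical names, and plain strings stand in for unit references.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoVersion:
    """Hypothetical immutable repo version: a number plus a frozen set of unit IDs."""

    number: int
    units: frozenset

    def add(self, *unit_ids):
        """Return the next version with units added; self is left untouched."""
        return RepoVersion(self.number + 1, self.units | frozenset(unit_ids))

    def remove(self, *unit_ids):
        """Return the next version with units removed."""
        return RepoVersion(self.number + 1, self.units - frozenset(unit_ids))


def diff(old, new):
    """Return (added, removed) unit IDs between two versions."""
    return new.units - old.units, old.units - new.units


def choose_publish(last_published, target):
    """Hypothetical policy: incremental only if units were purely added."""
    if last_published is None:
        return "full"  # nothing to build on
    added, removed = diff(last_published, target)
    return "full" if removed else "incremental"
```
</pre>
<div>Because versions are never mutated, a publication could simply record the version it was built from, which is one way to give the platform a structured record of which units make up a given publish.</div>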