[Pulp-dev] Importers/Exporters

Justin Sherrill jsherril at redhat.com
Thu Feb 20 14:53:00 UTC 2020

There are two different forms of export in Katello today:

Legacy version:

    * Uses Pulp 2's export functionality

    * Takes the tarball as is

"New" version:

    * Just copies the published repository as is (following symlinks)

    * Adds its own 'katello' metadata to the existing tarball

I would imagine that with Pulp 3 we would somewhat combine these two
approaches: take the Pulp 3 generated export file and add in a
metadata file of some sort.
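
As a rough illustration of the "add in a metadata file" step, here is a
minimal Python sketch. It assumes the export is an uncompressed tar file;
the function names, the metadata.json member name, and the metadata fields
are all hypothetical and not part of Pulp or Katello.

```python
import io
import json
import tarfile
import time


def add_metadata(tar_path, metadata, name="metadata.json"):
    """Append a JSON metadata member to an existing uncompressed tarball.

    Hypothetical helper: the member name and layout are assumptions.
    """
    payload = json.dumps(metadata, indent=2, sort_keys=True).encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    info.mtime = int(time.time())
    # Mode "a" appends in place; tarfile only supports it for
    # uncompressed archives, so a .tar.gz would need to be rewritten.
    with tarfile.open(tar_path, "a") as tar:
        tar.addfile(info, io.BytesIO(payload))


def read_metadata(tar_path, name="metadata.json"):
    """Read the JSON metadata member back out of the tarball."""
    with tarfile.open(tar_path, "r") as tar:
        member = tar.extractfile(name)
        return json.loads(member.read().decode("utf-8"))
```

On the import side, something like read_metadata() could be called before
handing the rest of the archive to Pulp, so the extra information never has
to be understood by Pulp itself.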


On 2/19/20 2:28 PM, Dennis Kliban wrote:
> Thank you for the details. More questions inline.
> On Wed, Feb 19, 2020 at 2:04 PM Justin Sherrill <jsherril at redhat.com> wrote:
>     The goal from our side is to give the user a very similar
>     experience.  Today the user would:
>     * run a command (for example, something similar to hammer
>     content-view version export --content-view-name=foobar --version=1.0)
>     * this creates a tarball on disk
> What all is in the tarball? Is this just a repository export created 
> by Pulp or is there extra information from the Katello db?
>     * they copy the tarball to external media
>     * they move the external media to the disconnected katello
>     * they run 'hammer content-view version import
>     --export-tar=/path/to/tarball'
> Does Katello untar this archive, create a repository in Pulp, sync 
> from the directory containing the unpacked archive, and then publish?
>     I don't see this changing much for the user; anything additional
>     that needs to be done in Pulp can be done behind the CLI/API in
>     Katello.  Thanks!
>     Justin
>     On 2/19/20 12:52 PM, Dennis Kliban wrote:
>>     In Katello that uses Pulp 2, what steps does the user need to
>>     take when importing an export into an air gapped environment? I
>>     am concerned about making the process more complicated than what
>>     the user is already used to.
>>     On Wed, Feb 19, 2020 at 11:20 AM David Davis
>>     <daviddavis at redhat.com> wrote:
>>         Thanks for the responses so far. I think we could export
>>         publications along with the repo version by exporting any
>>         publication that points to a repo version.
>>         My concern with exporting repositories is that users will
>>         probably get a bunch of content they don't care about if they
>>         want to export a single repo version. That said, if users do
>>         want to export entire repos, we could add this feature later
>>         I think?
>>         David
>>         On Wed, Feb 19, 2020 at 10:30 AM Justin Sherrill
>>         <jsherril at redhat.com> wrote:
>>             On 2/14/20 1:09 PM, David Davis wrote:
>>>             Grant and I met today to discuss importers and
>>>             exporters[0] and we'd like some feedback before we
>>>             proceed with the design. To sum up this feature briefly:
>>>             users can export a repository version from one Pulp
>>>             instance and import it to another.
>>>             # Master/Detail vs Core
>>>             So one fundamental question is whether we should use a
>>>             Master/Detail approach or just have core control the
>>>             flow but call out to plugins to get export formats.
>>>             To give some background: we currently define Exporters
>>>             (i.e. FileSystemExporter) in core as Master models.
>>>             Plugins extend this model, which allows them to configure
>>>             or customize the Exporter. This was necessary because
>>>             some plugins need to export Publications (along with
>>>             repository metadata) while other plugins that don't have
>>>             Publications or metadata export RepositoryVersions.
>>>             The other option is to have core handle the workflow.
>>>             The user would call a core endpoint and provide a
>>>             RepositoryVersion. This would work because for
>>>             importing/exporting, you wouldn't ever use Publications
>>>             because metadata won't be used for importing back into
>>>             Pulp. If needed, core could provide a way for plugin
>>>             writers to write custom handlers/exporters for content
>>>             types.
>>>             If we go with the second option, the question then
>>>             becomes whether we should divorce the concept of
>>>             Exporters and import/export. Or do we also switch
>>>             Exporters from Master/Detail to core only?
>>>             # Foreign Keys
>>>             Content can be distributed across multiple tables (e.g.
>>>             UpdateRecord has UpdateCollection, etc). In our export,
>>>             we could either use primary keys (UUIDs) or natural keys
>>>             to relate records. The former assumes that UUIDs are
>>>             unique across Pulp instances. The safer but more complex
>>>             alternative is to use natural keys. This would involve
>>>             storing a set of fields on a record that would be used
>>>             to identify a related record.
>>>             # Incremental Exports
>>>             There are two big pieces of data contained in an export:
>>>             the dataset of Content from the database and the
>>>             artifact files. An incremental export cuts down on the
>>>             size of an export by only exporting the differences.
>>>             However, when performing an incremental export, we could
>>>             still export the complete dataset instead of just a set
>>>             of differences (additions/removals/updates). This
>>>             approach would be simpler and it would allow us to
>>>             ensure that the new repo version matches the exported
>>>             repo version exactly. It would however increase the
>>>             export size but not by much I think--probably some
>>>             number of megabytes at most.
>>             If it's simpler, I would go with that. Saving even ~100-200
>>             MB isn't that big of a deal IMO.  The biggest savings is
>>             in the RPM content.
>>>             [0] https://pulp.plan.io/issues/6134
>>>             David
>>>             _______________________________________________
>>>             Pulp-dev mailing list
>>>             Pulp-dev at redhat.com
>>>             https://www.redhat.com/mailman/listinfo/pulp-dev