[Pulp-dev] Importers/Exporters

Dennis Kliban dkliban at redhat.com
Fri Feb 21 13:30:33 UTC 2020


We can't provide any data from the Katello database, but we can provide
enough data for the archive to contain all the published metadata and
distributions, so the user needs no extra steps to make the content
available after import.

We could definitely limit which resources are allowed to be specified for
this API. The user would never have to specify pulp_href for an artifact.
Content would only be exported using repositories, repository versions, or
publications. If a user chooses to export a repository, all the repository
versions for that repository would be exported along with the content and
artifacts that belong to those repo versions. When individual repository
versions are specified, only those repository versions are exported.
Publications would work the same way.
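To make the selection rules above concrete, here is a minimal sketch. The
`Repository`, `RepositoryVersion`, and `expand_export_scope` names are
illustrative only, not actual Pulp models or API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RepositoryVersion:
    number: int
    content: List[str]       # content unit identifiers
    artifacts: List[str]     # digests of the backing artifact files

@dataclass
class Repository:
    name: str
    versions: List[RepositoryVersion] = field(default_factory=list)

def expand_export_scope(repositories=(), repository_versions=()):
    """Resolve an export request into the set of repo versions to export.

    Exporting a repository pulls in every version it owns; naming
    individual versions exports only those. Publications would be
    expanded the same way.
    """
    selected = list(repository_versions)
    for repo in repositories:
        selected.extend(repo.versions)
    # Deduplicate while keeping the selection order stable.
    seen, result = set(), []
    for version in selected:
        if id(version) not in seen:
            seen.add(id(version))
            result.append(version)
    return result
```

The content and artifacts of each selected version would then be written
into the archive.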

My main goal is to make the import process as simple as possible for the
user.

On Fri, Feb 21, 2020 at 7:53 AM David Davis <daviddavis at redhat.com> wrote:

> A couple comments. I'm not sure how pulp will be able to export all the
> extra metadata that comes from Katello as some of it relates to content
> views. Also, I'm hesitant to have the user export a generic list of pulp
> hrefs. I think this could be confusing (do users have to supply artifact
> hrefs to get artifacts?). I'd rather have a list of params users can
> specifically export (eg repository_versions, publications, etc). I think
> Pulp will have to decide what you get when you, for example, export a
> repository version (likely Content, Artifacts, ContentArtifacts).
>
> David
>
>
> On Thu, Feb 20, 2020 at 7:13 PM Dennis Kliban <dkliban at redhat.com> wrote:
>
>> Thanks for all the details. I would like to provide Pulp 3 users with a
>> similar feature. In order to do that, the archive produced by Pulp will
>> need to include all that extra metadata that comes from Katello right now.
>> Pulp should support 2 use cases:
>>
>>   - As a user, I can generate an archive by specifying a list of
>> pulp_hrefs.
>>   - As a user, I can import an archive that was generated on another Pulp
>> instance.
>>
>> The archive would contain database migrations needed to restore all the
>> resources. It would also have all the files needed to back the artifacts.
>>
>> Users could then provide a list of repository versions, publications, and
>> distributions when creating an archive. Once the archive is imported, Pulp
>> serves the content without having to republish.
>>
>> On Thu, Feb 20, 2020 at 9:53 AM Justin Sherrill <jsherril at redhat.com>
>> wrote:
>>
>>> There are two different forms of export today in katello:
>>>
>>> Legacy version:
>>>
>>>   * Uses pulp2's export functionality
>>>
>>>   * Takes the tarball as is
>>>
>>> "New" Version
>>>
>>>    * Just copies published repository as is (following symlinks)
>>>
>>>    * Adds own 'katello' metadata to existing tarball
>>>
>>>
>>> I would imagine that with pulp3 we would somewhat combine these two
>>> approaches and take the pulp3 generated export file and add in a metadata
>>> file of some sort.
>>>
>>> Justin
>>> On 2/19/20 2:28 PM, Dennis Kliban wrote:
>>>
>>> Thank you for the details. More questions inline.
>>>
>>> On Wed, Feb 19, 2020 at 2:04 PM Justin Sherrill <jsherril at redhat.com>
>>> wrote:
>>>
>>>> the goal from our side is to have a very similar experience to the
>>>> user.  Today the user would:
>>>>
>>>> * run a command (for example, something similar to hammer content-view
>>>> version export --content-view-name=foobar --version=1.0)
>>>>
>>>> * this creates a tarball on disk
>>>>
>>> What all is in the tarball? Is this just a repository export created by
>>> Pulp or is there extra information from the Katello db?
>>>
>>>> * they copy the tarball to external media
>>>>
>>>> * they move the external media to the disconnected katello
>>>>
>>>> * they run 'hammer content-view version import
>>>> --export-tar=/path/to/tarball
>>>>
>>> Does katello untar this archive, create a repository in pulp, sync from
>>> the directory containing the unarchived content, and then publish?
>>>
>>>> I don't see this changing much for the user, anything additional that
>>>> needs to be done in pulp can be done behind the cli/api in katello.  Thanks!
>>>>
>>>
>>>
>>>
>>>> Justin
>>>> On 2/19/20 12:52 PM, Dennis Kliban wrote:
>>>>
>>>> In Katello deployments that use Pulp 2, what steps does the user need to
>>>> take when importing an export into an air-gapped environment? I am
>>>> concerned about making the process more complicated than what the user is
>>>> already used to.
>>>>
>>>> On Wed, Feb 19, 2020 at 11:20 AM David Davis <daviddavis at redhat.com>
>>>> wrote:
>>>>
>>>>> Thanks for the responses so far. I think we could export publications
>>>>> along with the repo version by exporting any publication that points to a
>>>>> repo version.
>>>>>
>>>>> My concern with exporting repositories is that users will probably get
>>>>> a bunch of content they don't care about if they want to export a single
>>>>> repo version. That said, if users do want to export entire repos, we could
>>>>> add this feature later I think?
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Wed, Feb 19, 2020 at 10:30 AM Justin Sherrill <jsherril at redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On 2/14/20 1:09 PM, David Davis wrote:
>>>>>>
>>>>>> Grant and I met today to discuss importers and exporters[0] and we'd
>>>>>> like some feedback before we proceed with the design. To sum up this
>>>>>> feature briefly: users can export a repository version from one Pulp
>>>>>> instance and import it to another.
>>>>>>
>>>>>> # Master/Detail vs Core
>>>>>>
>>>>>> So one fundamental question is whether we should use a Master/Detail
>>>>>> approach or just have core control the flow but call out to plugins to get
>>>>>> export formats.
>>>>>>
>>>>>> To give some background: we currently define Exporters (ie
>>>>>> FileSystemExporter) in core as Master models. Plugins extend this model
>>>>>> which allows them to configure or customize the Exporter. This was
>>>>>> necessary because some plugins need to export Publications (along with
>>>>>> repository metadata) while other plugins who don't have Publications or
>>>>>> metadata export RepositoryVersions.
>>>>>>
>>>>>> The other option is to have core handle the workflow. The user would
>>>>>> call a core endpoint and provide a RepositoryVersion. This would work
>>>>>> because for importing/exporting, you wouldn't ever use Publications because
>>>>>> metadata won't be used for importing back into Pulp. If needed, core could
>>>>>> provide a way for plugin writers to write custom handlers/exporters for
>>>>>> content types.
>>>>>>
>>>>>> If we go with the second option, the question then becomes whether we
>>>>>> should divorce the concept of Exporters and import/export. Or do we also
>>>>>> switch Exporters from Master/Detail to core only?
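The two options can be contrasted with a rough sketch. The class and
function names here are made up and greatly simplified relative to Pulp's
real Exporter models:

```python
# Option 1: Master/Detail -- core defines a base Exporter and each
# plugin subclasses it to customize the export.
class FileSystemExporter:
    def export(self, repository_version):
        raise NotImplementedError("plugins customize the export")

class FilePluginExporter(FileSystemExporter):
    def export(self, repository_version):
        # A plugin without publication metadata just exports the
        # version's content units.
        return {"content": list(repository_version)}

# Option 2: core owns the workflow and optionally calls out to a
# per-content-type handler registered by a plugin.
HANDLERS = {}

def register_handler(content_type, handler):
    HANDLERS[content_type] = handler

def core_export(repository_version, content_type="file"):
    # Fall back to a generic export when no plugin handler exists.
    handler = HANDLERS.get(content_type, lambda rv: {"content": list(rv)})
    return handler(repository_version)
```

In the second option the user only ever talks to the core endpoint;
plugin involvement, if any, stays behind the handler registry.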
>>>>>>
>>>>>> # Foreign Keys
>>>>>>
>>>>>> Content can be distributed across multiple tables (eg UpdateRecord
>>>>>> has UpdateCollection, etc). In our export, we could either use primary keys
>>>>>> (UUIDs) or natural keys to relate records. The former assumes that UUIDs
>>>>>> are unique across Pulp instances. The safer but more complex alternative is
>>>>>> to use natural keys. This would involve storing a set of fields on a record
>>>>>> that would be used to identify a related record.
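A minimal sketch of the natural-key alternative, using the UpdateRecord
example above. The chosen key field (`errata_id`) and the payload shape
are assumptions for illustration, not Pulp's actual natural keys:

```python
import json
import uuid

def export_update_record(record):
    """Serialize a record using a natural key rather than its UUID."""
    return json.dumps({
        "natural_key": {"errata_id": record["errata_id"]},
        "collections": record["collections"],
    })

def import_update_record(payload, existing_by_key):
    """On import, match by natural key; mint a fresh UUID if unseen.

    This avoids assuming that UUIDs are unique across Pulp instances:
    the importing side always assigns its own primary keys.
    """
    data = json.loads(payload)
    key = data["natural_key"]["errata_id"]
    if key in existing_by_key:
        return existing_by_key[key]      # reuse the local record
    data["pk"] = str(uuid.uuid4())       # new primary key on this instance
    existing_by_key[key] = data
    return data
```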
>>>>>>
>>>>>> # Incremental Exports
>>>>>>
>>>>>> There are two big pieces of data contained in an export: the dataset
>>>>>> of Content from the database and the artifact files. An incremental export
>>>>>> cuts down on the size of an export by only exporting the differences.
>>>>>> However, when performing an incremental export, we could still export the
>>>>>> complete dataset instead of just a set of differences
>>>>>> (additions/removals/updates). This approach would be simpler and it would
>>>>>> allow us to ensure that the new repo version matches the exported repo
>>>>>> version exactly. It would however increase the export size but not by much
>>>>>> I think--probably some number of megabytes at most.
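The trade-off can be sketched as follows, assuming content units are plain
identifier strings; `incremental_payload` is a hypothetical helper, not
Pulp code:

```python
def incremental_payload(previous, current, full_dataset=True):
    """Return what an incremental export would ship for the db dataset.

    Artifact files are always shipped as a diff; the question is only
    about the (much smaller) database records.
    """
    additions = sorted(set(current) - set(previous))
    removals = sorted(set(previous) - set(current))
    if full_dataset:
        # Simpler: ship every record, so the imported repo version is
        # guaranteed to match the exported one exactly.
        return {"dataset": sorted(current), "artifacts": additions}
    # Smaller: ship only the differences, at the cost of the importer
    # having to replay them correctly against its local state.
    return {"additions": additions, "removals": removals, "artifacts": additions}
```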
>>>>>>
>>>>>> If it's simpler, I would go with that. Saving even ~100-200 MB isn't
>>>>>> that big of a deal IMO. The biggest savings is in the RPM content.
>>>>>>
>>>>>>
>>>>>>
>>>>>> [0] https://pulp.plan.io/issues/6134
>>>>>>
>>>>>> David
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> Pulp-dev at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>

