[Pulp-dev] Importers/Exporters

David Davis daviddavis at redhat.com
Fri Feb 21 13:42:20 UTC 2020


I also want to make it easy for the user. I don't think we should support
repositories now though. We're strapped for time, Katello doesn't need it
at the moment, and I think we can add it later.

Another argument for breaking up parameters is that we need to support
incremental exports. I think the repository_versions parameter will either
need to be a mapping of repo versions to base repo versions or we'll need
to have a separate base repo versions parameter that Pulp can check when
exporting a repo version.
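
A minimal sketch of the two parameter shapes described above, for illustration only. The hrefs, the `base_repository_versions` name, and the `normalize_export_params` helper are all hypothetical; none of them is an existing Pulp API.

```python
from typing import Dict, List, Optional

# Hypothetical hrefs, used only for illustration.
V1 = "/repositories/example/1/versions/1/"
V2 = "/repositories/example/1/versions/2/"


def normalize_export_params(
    repository_versions,
    base_repository_versions: Optional[List[str]] = None,
) -> Dict[str, Optional[str]]:
    """Return {repo_version_href: base_href or None} for either shape."""
    if isinstance(repository_versions, dict):
        # Option A: a mapping of repo version -> base repo version.
        return dict(repository_versions)
    # Option B: a list of repo versions plus a separate list of bases.
    # Assumption: a base applies to a version of the same repository,
    # identified here by the href prefix before "versions/".
    bases = {}
    for base in base_repository_versions or []:
        bases[base.split("versions/")[0]] = base
    return {v: bases.get(v.split("versions/")[0]) for v in repository_versions}


option_a = normalize_export_params({V2: V1})
option_b = normalize_export_params([V2], base_repository_versions=[V1])
full_export = normalize_export_params([V2])
```

Both shapes reduce to the same internal mapping; a version with no base (as in `full_export`) would simply be exported in full.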

David


On Fri, Feb 21, 2020 at 8:30 AM Dennis Kliban <dkliban at redhat.com> wrote:

> We can't provide any data from the Katello database, but we can provide
> enough data for the archive to contain all the published metadata and
> distributions needed so that the user does not have to take any extra steps
> to make the content available after import.
>
> We could definitely limit which resources are allowed to be specified for
> this API. The user would never have to specify pulp_href for an artifact.
> Content would only be exported using repositories, repository versions, or
> publications. If a user chooses to export a repository, all the repository
> versions for that repository would be exported along with the content and
> artifacts that belong to those repo versions. When individual repository
> versions are specified, only those repository versions are exported.
> Publications would work the same way.
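
The scoping rule described above (a repository expands to all of its versions, while an explicit repository version selects only itself) can be sketched as follows. The `ExportRequest` class, its field names, and the identifiers are hypothetical and only illustrate the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ExportRequest:
    # Hypothetical parameter names; each holds resource identifiers.
    repositories: List[str] = field(default_factory=list)
    repository_versions: List[str] = field(default_factory=list)


def versions_to_export(
    request: ExportRequest, versions_by_repo: Dict[str, List[str]]
) -> Set[str]:
    """Expand a request into the set of repository versions to export."""
    selected = set(request.repository_versions)
    for repo in request.repositories:
        # Exporting a repository pulls in all of its versions.
        selected.update(versions_by_repo.get(repo, []))
    return selected


catalog = {"repo-a": ["repo-a/v1", "repo-a/v2"], "repo-b": ["repo-b/v1"]}
whole_repo = versions_to_export(ExportRequest(repositories=["repo-a"]), catalog)
single_version = versions_to_export(
    ExportRequest(repository_versions=["repo-b/v1"]), catalog
)
```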
>
> My main goal is to make the import process as simple as possible for the
> user.
>
> On Fri, Feb 21, 2020 at 7:53 AM David Davis <daviddavis at redhat.com> wrote:
>
>> A couple comments. I'm not sure how pulp will be able to export all the
>> extra metadata that comes from Katello as some of it relates to content
>> views. Also, I'm hesitant to have the user export a generic list of pulp
>> hrefs. I think this could be confusing (do users have to supply artifact
>> hrefs to get artifacts?). I'd rather have a list of params users can
>> specifically export (eg repository_versions, publications, etc). I think
>> Pulp will have to decide what you get when you, for example, export a repository
>> version (likely Content, Artifacts, ContentArtifacts).
>>
>> David
>>
>>
>> On Thu, Feb 20, 2020 at 7:13 PM Dennis Kliban <dkliban at redhat.com> wrote:
>>
>>> Thanks for all the details. I would like to provide Pulp 3 users with a
>>> similar feature. In order to do that, the archive produced by Pulp will
>>> need to include all that extra metadata that comes from Katello right now.
>>> Pulp should support 2 use cases:
>>>
>>>   - As a user, I can generate an archive by specifying a list of
>>> pulp_hrefs.
>>>   - As a user, I can import an archive that was generated on another
>>> pulp.
>>>
>>> The archive would contain database migrations needed to restore all the
>>> resources. It would also have all the files needed to back the artifacts.
>>>
>>> Users could then provide a list of repository versions, publications,
>>> and distributions when creating an archive. Once the archive is imported,
>>> Pulp is serving the content without having to republish.
>>>
>>> On Thu, Feb 20, 2020 at 9:53 AM Justin Sherrill <jsherril at redhat.com>
>>> wrote:
>>>
>>>> There are two different forms of export today in katello:
>>>>
>>>> Legacy version:
>>>>
>>>>   * Uses pulp2's export functionality
>>>>
>>>>   * Takes the tarball as is
>>>>
>>>> "New" version:
>>>>
>>>>   * Just copies published repository as is (following symlinks)
>>>>
>>>>   * Adds own 'katello' metadata to existing tarball
>>>>
>>>>
>>>> I would imagine that with pulp3 we would somewhat combine these two
>>>> approaches and take the pulp3 generated export file and add in a metadata
>>>> file of some sort.
>>>>
>>>> Justin
>>>> On 2/19/20 2:28 PM, Dennis Kliban wrote:
>>>>
>>>> Thank you for the details. More questions inline.
>>>>
>>>> On Wed, Feb 19, 2020 at 2:04 PM Justin Sherrill <jsherril at redhat.com>
>>>> wrote:
>>>>
>>>>> The goal from our side is to keep the experience very similar for the
>>>>> user.  Today the user would:
>>>>>
>>>>> * run a command (for example, something similar to hammer content-view
>>>>> version export --content-view-name=foobar --version=1.0)
>>>>>
>>>>> * this creates a tarball on disk
>>>>>
>>>> What all is in the tarball? Is this just a repository export created by
>>>> Pulp or is there extra information from the Katello db?
>>>>
>>>>> * they copy the tarball to external media
>>>>>
>>>>> * they move the external media to the disconnected katello
>>>>>
>>>>> * they run 'hammer content-view version import
>>>>> --export-tar=/path/to/tarball'
>>>>>
>>>> Does katello untar this archive, create a repository in pulp, sync from
>>>> the directory containing the unpacked archive, and then publish?
>>>>
>>>>> I don't see this changing much for the user; anything additional that
>>>>> needs to be done in pulp can be done behind the cli/api in katello.  Thanks!
>>>>>
>>>>
>>>>
>>>>
>>>>> Justin
>>>>> On 2/19/20 12:52 PM, Dennis Kliban wrote:
>>>>>
>>>>> In Katello that uses Pulp 2, what steps does the user need to take
>>>>> when importing an export into an air gapped environment? I am concerned
>>>>> about making the process more complicated than what the user is already
>>>>> used to.
>>>>>
>>>>> On Wed, Feb 19, 2020 at 11:20 AM David Davis <daviddavis at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the responses so far. I think we could export publications
>>>>>> along with the repo version by exporting any publication that points to a
>>>>>> repo version.
>>>>>>
>>>>>> My concern with exporting repositories is that users will probably
>>>>>> get a bunch of content they don't care about if they want to export a
>>>>>> single repo version. That said, if users do want to export entire repos, we
>>>>>> could add this feature later I think?
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 19, 2020 at 10:30 AM Justin Sherrill <jsherril at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 2/14/20 1:09 PM, David Davis wrote:
>>>>>>>
>>>>>>> Grant and I met today to discuss importers and exporters[0] and we'd
>>>>>>> like some feedback before we proceed with the design. To sum up this
>>>>>>> feature briefly: users can export a repository version from one Pulp
>>>>>>> instance and import it to another.
>>>>>>>
>>>>>>> # Master/Detail vs Core
>>>>>>>
>>>>>>> So one fundamental question is whether we should use a Master/Detail
>>>>>>> approach or just have core control the flow but call out to plugins to get
>>>>>>> export formats.
>>>>>>>
>>>>>>> To give some background: we currently define Exporters (ie
>>>>>>> FileSystemExporter) in core as Master models. Plugins extend this model
>>>>>>> which allows them to configure or customize the Exporter. This was
>>>>>>> necessary because some plugins need to export Publications (along with
>>>>>>> repository metadata) while other plugins that don't have Publications or
>>>>>>> metadata export RepositoryVersions.
>>>>>>>
>>>>>>> The other option is to have core handle the workflow. The user would
>>>>>>> call a core endpoint and provide a RepositoryVersion. This would work
>>>>>>> because for importing/exporting, you wouldn't ever use Publications because
>>>>>>> metadata won't be used for importing back into Pulp. If needed, core could
>>>>>>> provide a way for plugin writers to write custom handlers/exporters for
>>>>>>> content types.
>>>>>>>
>>>>>>> If we go with the second option, the question then becomes whether
>>>>>>> we should separate the concept of Exporters from import/export. Or do we also
>>>>>>> switch Exporters from Master/Detail to core only?
>>>>>>>
>>>>>>> # Foreign Keys
>>>>>>>
>>>>>>> Content can be distributed across multiple tables (eg UpdateRecord
>>>>>>> has UpdateCollection, etc). In our export, we could either use primary keys
>>>>>>> (UUIDs) or natural keys to relate records. The former assumes that UUIDs
>>>>>>> are unique across Pulp instances. The safer but more complex alternative is
>>>>>>> to use natural keys. This would involve storing a set of fields on a record
>>>>>>> that would be used to identify a related record.
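
A minimal sketch of the natural-key alternative, assuming plain dictionaries in place of database rows. The record shapes, the choice of `id` as the natural key, and the sample values are hypothetical; this is not how Pulp serializes content.

```python
import uuid

# Hypothetical rows standing in for UpdateRecord / UpdateCollection.
# The field used as a natural key ("id") is an assumption for illustration.
source_record = {"pk": str(uuid.uuid4()), "id": "EXAMPLE-2020:0001"}
source_collection = {
    "pk": str(uuid.uuid4()),
    "name": "collection-0",
    "update_record": source_record["pk"],
}


def export_collection(collection, record):
    """Serialize a collection, referring to its parent by natural key."""
    return {
        "name": collection["name"],
        # Natural key: fields that identify the parent on any instance.
        "update_record": {"id": record["id"]},
    }


def import_collection(exported, records_by_natural_key):
    """Re-link the collection to the parent row of the importing instance."""
    parent = records_by_natural_key[exported["update_record"]["id"]]
    return {
        "pk": str(uuid.uuid4()),
        "name": exported["name"],
        "update_record": parent["pk"],
    }


# On the importing instance the same record has a different primary key.
dest_record = {"pk": str(uuid.uuid4()), "id": source_record["id"]}
imported = import_collection(
    export_collection(source_collection, source_record),
    {dest_record["id"]: dest_record},
)
```

The export never carries the source primary key, so the import does not depend on UUIDs matching across instances.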
>>>>>>>
>>>>>>> # Incremental Exports
>>>>>>>
>>>>>>> There are two big pieces of data contained in an export: the dataset
>>>>>>> of Content from the database and the artifact files. An incremental export
>>>>>>> cuts down on the size of an export by only exporting the differences.
>>>>>>> However, when performing an incremental export, we could still export the
>>>>>>> complete dataset instead of just a set of differences
>>>>>>> (additions/removals/updates). This approach would be simpler and it would
>>>>>>> allow us to ensure that the new repo version matches the exported repo
>>>>>>> version exactly. It would, however, increase the export size, though not by
>>>>>>> much I think: probably some number of megabytes at most.
>>>>>>>
>>>>>>> If it's simpler, I would go with that.  Saving even ~100-200 MB isn't
>>>>>>> that big of a deal IMO.  The biggest savings is in the RPM content.
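
The trade-off discussed in this section can be sketched as follows: the export ships the complete content dataset but only the artifact files missing from the base version. The function name, the dictionary layout, and the digests are hypothetical simplifications.

```python
def plan_incremental_export(base_version, new_version):
    """Plan an export that ships all content records but only new files.

    Each version is a dict mapping a content identifier to the digest of
    its artifact file (a simplification chosen for this sketch).
    """
    base_digests = set(base_version.values())
    return {
        # Complete dataset: the importer can rebuild the version exactly.
        "content": dict(new_version),
        # Only artifact files the base version does not already have.
        "artifacts": sorted(set(new_version.values()) - base_digests),
    }


base = {"foo-1.0.rpm": "sha256:aaa", "bar-1.0.rpm": "sha256:bbb"}
new = {"foo-1.0.rpm": "sha256:aaa", "bar-1.1.rpm": "sha256:ccc"}
plan = plan_incremental_export(base, new)
```

Here `plan["content"]` lists both packages of the new version, while `plan["artifacts"]` holds only the one file the base lacks.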
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [0] https://pulp.plan.io/issues/6134
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> Pulp-dev at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>
>>>>>