[Pulp-dev] Deferred downloading (Lazy) catalog management in pulp3

Michael Hrivnak mhrivnak at redhat.com
Tue Mar 14 18:12:03 UTC 2017

This sounds great to me.

The one downside you cite is still an upside in my mind. There's been
interest for some time in having the ability to do things such as:
- delete files on disk and convert to on-demand
- scan files on disk for corrupt ones and re-download them if found

Both of those cases would be well-supported by creating catalog entries for
every file regardless of a repo's current download policy.
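
To make the second case concrete, here is a rough sketch of how a complete catalog could drive a scan of the files on disk. It is only an illustration: the `CatalogEntry` fields (`path`, `url`, `sha256`) and the function names are hypothetical, since the pulp3 catalog schema has not been defined in this thread.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical fields, assumed for illustration only.
    path: str    # where the file is (or would be) stored on disk
    url: str     # where the file can be downloaded from
    sha256: str  # expected digest of the file content


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def entries_needing_download(catalog):
    """Return the entries whose file is missing or fails its checksum."""
    stale = []
    for entry in catalog:
        if not os.path.exists(entry.path):
            stale.append(entry)  # deleted or never downloaded
        elif sha256_of(entry.path) != entry.sha256:
            stale.append(entry)  # present on disk but corrupt
    return stale
```

Because every file has an entry regardless of policy, the same lookup would cover files deleted on purpose to reclaim space and files found to be corrupt.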

As you brought up, we will need to think about how large that table will
become, but that seems like it should be manageable. I think the table
would be on the same order of size (in terms of row count) as the
association table, with the added dimension that multi-file units would
have an entry for each file.
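
As a back-of-the-envelope illustration of both the row count and the delta-based management proposed in the message quoted below, here is a minimal in-memory sketch. The data layout (a dict mapping a unit id to its `{relative_path: url}` files) and the names `apply_changeset` and `row_count` are assumptions made for this example, not the proposed ChangeSet API.

```python
def apply_changeset(catalog, added, removed):
    """Update the catalog by deltas only, regardless of download policy.

    catalog: dict mapping unit_id -> {relative_path: url}
    added:   dict mapping unit_id -> {relative_path: url}
    removed: iterable of unit ids leaving the repository
    Returns the number of catalog rows touched.
    """
    touched = 0
    for unit_id in removed:
        # Drop every file entry belonging to the removed unit.
        touched += len(catalog.pop(unit_id, {}))
    for unit_id, files in added.items():
        # One entry per file, so multi-file units add several rows.
        catalog[unit_id] = dict(files)
        touched += len(files)
    return touched


def row_count(catalog):
    """Total rows: one per file across all units."""
    return sum(len(files) for files in catalog.values())


catalog = {}

# Initial sync: every unit is new, so every entry is added.
initial = apply_changeset(
    catalog,
    added={
        "rpm-a": {"a.rpm": "http://example.com/a.rpm"},
        "rpm-b": {"b.rpm": "http://example.com/b.rpm"},
        "multi": {
            "multi/1.bin": "http://example.com/multi/1.bin",
            "multi/2.bin": "http://example.com/multi/2.bin",
        },
    },
    removed=[],
)

# Later sync: only the units that changed are touched.
later = apply_changeset(
    catalog,
    added={"rpm-c": {"c.rpm": "http://example.com/c.rpm"}},
    removed=["rpm-a"],
)
```

Here three units produce four rows because one unit has two files, and the later sync touches two rows rather than rebuilding all four.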


On Tue, Mar 14, 2017 at 1:09 PM, Jeff Ortel <jortel at redhat.com> wrote:

> We have learned a lot about deferred (lazy) download catalog management in
> pulp2.  Currently, this is only supported by the RPM plugin importers.  The
> importer flow made adding support cumbersome.  The importer(s) will generate
> catalog entries *only* when the download-policy is not "immediate" as
> follows:
> 1. For each unit not in the repository that it would have downloaded.
> 2. For each unit already associated.
> The reason for #2 is to support cases where the download policy has changed
> from "immediate" to one of the deferred policies.  After switching policies,
> the user has to do a sync to generate catalog entries.  The downside is the
> importer regenerates all of the entries when only a few are necessary.
> In pulp3, deferred download catalog management is provided by the proposed
> ChangeSet.  With this change in tooling, I want to propose a change in the
> strategy for managing catalog entries.  I propose the ChangeSet add/remove
> entries for content regardless of download-policy.
> This has the following advantages:
> - Changing download policies would not require a sync to generate catalog
>   entries.
> - Better supports a use case we are starting to hear about.  "As a user, I
>   want to reclaim disk space by switching to a deferred download policy and
>   then deleting stored content."  Deleting stored content would be an
>   entirely separate story.
> - The catalog could be managed by deltas but always be complete.  The
>   overhead of managing the catalog is proportional to the number of units
>   being added to or removed from the repository.  That is, on initial sync,
>   all the catalog entries are added.  Subsequent syncs will only add/remove
>   entries for content being added to or removed from the repository.
> The downside is that users not using deferred downloading would incur the
> overhead of managing the catalog, but we probably need benchmarks using
> postgres to evaluate this.
> Thoughts?
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev