[Pulp-dev] Handling RPM with long filelist in Pulp 2

Tatiana Tereshchenko ttereshc at redhat.com
Tue May 9 22:03:29 UTC 2017

Currently Pulp can import an RPM whose filelist is up to ~14-15 MB, which
probably covers most repositories, but not all of them.

Historically, several potentially large pieces of data are stored in the db
for each RPM unit:
 - XML snippets for RPM metadata
 - parsed filelist
 - parsed changelog

The XML snippets are compressed, so they require much less space than a huge
parsed filelist or changelog.
Issue [0] tracks the effort to eliminate this limitation, or at least to
increase the size of filelist that Pulp can handle for each RPM.
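To illustrate why the compressed snippets are so much smaller than the parsed
data, here is a rough Python sketch. It assumes zlib compression (not
necessarily what Pulp 2 uses), a synthetic, hypothetical file list and a
simplified filelists.xml-style snippet. The JSON size only approximates a
parsed list stored in a BSON document; BSON adds some per-element overhead on
top of it.

```python
import json
import zlib


def synthetic_filelist(n):
    """Return n hypothetical file paths standing in for an RPM's file list."""
    return [f"/usr/share/example-pkg/data/file_{i:06d}.txt" for i in range(n)]


def filelists_xml_snippet(paths):
    """Render a simplified <package> entry in the style of filelists.xml."""
    files = "".join(f"<file>{p}</file>" for p in paths)
    return f'<package name="example-pkg" arch="noarch">{files}</package>'


def sizes(paths):
    """Compare raw XML, zlib-compressed XML and parsed-list sizes in bytes."""
    xml = filelists_xml_snippet(paths).encode("utf-8")
    compressed = zlib.compress(xml, 9)
    # JSON is only a rough stand-in for a parsed list stored in BSON.
    parsed = json.dumps(paths).encode("utf-8")
    return {"xml": len(xml), "compressed_xml": len(compressed), "parsed": len(parsed)}


if __name__ == "__main__":
    print(sizes(synthetic_filelist(100_000)))
```

Real filelists are less repetitive than this synthetic one, so the actual
compression ratio will be lower, but the direction of the comparison is the
point.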

The question is how best to handle the issue, keeping in mind that any
substantial change or redesign adds risk and effort to the Pulp 2 line,
while this won't be an issue in Pulp 3.

So far the options are:
 1. Eliminate the issue completely (e.g. by using GridFS)
 2. Increase the current filelist limit by removing the parsed filelist
from the db
 3. Do not solve it in Pulp 2; wait for Pulp 3, which won't have this issue
at all
 4. Any other ideas/options
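Option 1 works because GridFS stores a file's content as many small chunk
documents (255 KiB each by default) plus one metadata document, so no single
document approaches MongoDB's 16 MiB document limit. Below is a minimal
in-memory sketch of that chunking idea, not pymongo's implementation; the
`Chunk` type and the helper names are hypothetical.

```python
from dataclasses import dataclass

# Default GridFS chunk size (255 KiB) and MongoDB's maximum BSON document size (16 MiB).
DEFAULT_CHUNK_SIZE = 255 * 1024
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024


@dataclass(frozen=True)
class Chunk:
    """One chunk document: the file it belongs to, its sequence number, its bytes."""
    files_id: str
    n: int
    data: bytes


def split_into_chunks(files_id, blob, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split blob into ordered chunks so no single document holds the whole payload."""
    return [
        Chunk(files_id, n, blob[offset:offset + chunk_size])
        for n, offset in enumerate(range(0, len(blob), chunk_size))
    ]


def reassemble(chunks):
    """Concatenate chunk payloads in order of their sequence number."""
    return b"".join(c.data for c in sorted(chunks, key=lambda c: c.n))


if __name__ == "__main__":
    blob = b"<file>/usr/bin/example</file>" * 1_000_000  # well over 16 MiB
    chunks = split_into_chunks("hypothetical-unit-id", blob)
    print(len(chunks), max(len(c.data) for c in chunks))
```

Against a real database, pymongo's `gridfs.GridFS(db)` handles this via
`put(data)` and `get(file_id).read()`. The cost is an extra read per unit
and a change to how unit data is stored and served, which is where the
Pulp 2 risk comes from.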

Additional info:
 - some thoughts and options [1] which were considered several months ago
 - removing the parsed filelist (and changelog?) from the db would leave room
for really large RPM metadata: Pulp would be able to import any RPM with
uncompressed metadata up to ~200MB (vs. ~14-15MB currently). For
comparison, this is ~1.5 times larger than filelists.xml and other.xml of
the whole EPEL7 repo combined.
 - removing this data from the db would affect at least search endpoints
like [2], where all the data for a unit is returned in the response.
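A rough sketch of why option 2 helps: the parsed filelist counts against the
16 MiB document limit at full size, while the XML snippet counts only at its
compressed size. The field names (`repodata`, `files`), the zlib compression
and the synthetic paths are all hypothetical. The size estimate ignores BSON
overhead, so it is a lower bound and can only prove that a document is too
large.

```python
import zlib

# MongoDB's maximum BSON document size (16 MiB).
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024


def payload_size(value):
    """Lower-bound size estimate: raw payload bytes only, ignoring BSON's
    per-element type bytes, length prefixes and key overhead."""
    if isinstance(value, bytes):
        return len(value)
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    if isinstance(value, dict):
        return sum(payload_size(k) + payload_size(v) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return sum(payload_size(v) for v in value)
    return 8  # numbers, booleans, None: assume a small fixed size


def definitely_too_large(doc):
    """True if even the lower-bound estimate reaches the document limit."""
    return payload_size(doc) >= MAX_BSON_DOCUMENT_SIZE


def build_unit(paths, keep_parsed_filelist):
    """Build a hypothetical unit document: always a compressed XML snippet,
    optionally the parsed filelist as well."""
    xml = "".join(f"<file>{p}</file>" for p in paths).encode("utf-8")
    unit = {"name": "example-pkg", "repodata": {"filelists": zlib.compress(xml)}}
    if keep_parsed_filelist:
        unit["files"] = list(paths)
    return unit


if __name__ == "__main__":
    paths = [f"/opt/example/dir_{i // 1000:04d}/file_{i:07d}" for i in range(500_000)]
    print(definitely_too_large(build_unit(paths, keep_parsed_filelist=True)))
    print(definitely_too_large(build_unit(paths, keep_parsed_filelist=False)))
```

With 500,000 hypothetical 34-byte paths, the parsed list alone is about 17 MB,
so the unit that keeps `files` cannot fit in one document, while the
compressed-only unit should stay well under the limit.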

[0] https://pulp.plan.io/issues/2747
[1] https://etherpad.net/p/mongodb_DocumentTooLarge