[Pulp-dev] bulk_create for Artifact, Content, ContentArtifact, and RemoteArtifact?

Brian Bouterse bbouters at redhat.com
Mon Jul 2 20:41:02 UTC 2018

As described in 3770, pulp_file syncs 2.4x slower than pulp2 [0]. I
believe we want Pulp3 to sync at least as fast as Pulp2. I think we should
consider making the goal of "have pulp3 sync as fast as pulp2" a Pulp3 GA
requirement. My reasoning is twofold: (a) users aren't going to
switch to something over twice as slow, and (b) we will likely have to make
some non-trivial database changes, so it makes sense to do them now.

How do you feel about this goal/need?

In terms of tackling the problems themselves, I've separated the
performance issue into 3 different performance problems:


Any feedback or discussion on these is welcome. I plan to help organize
ideas as we explore possible solutions. Once more info and a few vetted
ideas are available, I plan to bring this back to the list.
If anyone wants to talk through them before then, feel free to reach out to

[0]: https://pulp.plan.io/issues/3770#note-5


On Thu, Jun 21, 2018 at 4:50 PM, Brian Bouterse <bbouters at redhat.com> wrote:

> I just tried an implementation of DeclarativeVersion that uses bulk_create
> for all content units, content artifacts, and remote artifacts.
> The content units are incompatible with bulk_create(). When trying to save a
> batch of content units with bulk_create it raises:  ValueError: Can't bulk
> create a multi-table inherited model
> On Thu, Jun 21, 2018 at 4:19 PM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>> I'm only considering these changes for the plugin writer API to help
>> resolve the performance issues.
>> On Thu, Jun 21, 2018 at 4:11 PM, Austin Macdonald <amacdona at redhat.com>
>> wrote:
>>> For models, bulk_create seems good to me. Endpoints that kick off tasks
>>> like sync, which use bulk_create, seem fine.
>>> Are you also proposing we have bulk_create for non-task REST API calls?
>>> Should a user be able to POST a list of dictionaries that becomes a set of
>>> Content? I'm open to it, but it seems like it could get ugly.
>>> On Thu, Jun 21, 2018 at 3:54 PM, Brian Bouterse <bbouters at redhat.com>
>>> wrote:
>>>> I've run cprofile on some of the sync code for Pulp3 and I've noticed
>>>> that we may have some problems with bulk_create on some of the object types.
>>>> Here is a small analysis I did: https://pulp.plan.io/issues/3770#note-2
>>>> As an aside, we don't have a bulk add option for
>>>> RepositoryVersion.add_content, which means each round trip to the db
>>>> handles only one unit. When you're processing 70K units, that's a lot of
>>>> trips. I don't think we have to add this right now, but to resolve an
>>>> issue like 3770 we may need to.
>>>> I do think we should make our models compatible with bulk_create now
>>>> either way.
>>>> What do you think?
>>>> -Brian
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
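
To illustrate the round-trip concern raised in the quoted thread above, here is a minimal sketch that contrasts one INSERT per unit with multi-row INSERTs. It is not Pulp code: the repo_content table, the function names, and the batch size of 400 are all hypothetical, and it uses an in-memory SQLite database, so the statement count only stands in for network round trips to a real database server.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical association table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repo_content (version_id INTEGER, content_id INTEGER)"
    )
    return conn


def add_one_at_a_time(conn, version_id, content_ids):
    """Issue one INSERT per unit, like a per-unit add call would."""
    statements = 0
    for content_id in content_ids:
        conn.execute(
            "INSERT INTO repo_content (version_id, content_id) VALUES (?, ?)",
            (version_id, content_id),
        )
        statements += 1
    return statements


def add_in_batches(conn, version_id, content_ids, batch_size=400):
    """Issue one multi-row INSERT per batch of units.

    batch_size is an assumption; 400 rows x 2 columns stays under the
    999-parameter limit of older SQLite builds.
    """
    statements = 0
    for start in range(0, len(content_ids), batch_size):
        batch = content_ids[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        # Flatten [(version_id, cid), ...] into a single parameter list.
        params = [value for cid in batch for value in (version_id, cid)]
        conn.execute(
            "INSERT INTO repo_content (version_id, content_id) VALUES "
            + placeholders,
            params,
        )
        statements += 1
    return statements


def row_count(conn):
    """Return the number of rows currently in the association table."""
    return conn.execute("SELECT COUNT(*) FROM repo_content").fetchone()[0]


if __name__ == "__main__":
    units = list(range(70_000))
    print("per-unit statements:", add_one_at_a_time(make_db(), 1, units))
    print("batched statements:", add_in_batches(make_db(), 1, units))
```

With these assumed numbers, 70,000 units take 70,000 statements when added one at a time and 175 statements when added in batches of 400. Django's bulk_create batches inserts in a similar way, but as noted above it refuses multi-table inherited models, so this sketch says nothing about how the content models themselves would need to change.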