[Pulp-dev] bulk_create for Artifact, Content, ContentArtifact, and RemoteArtifact?

Brian Bouterse bbouters at redhat.com
Wed Jul 11 14:26:42 UTC 2018


Thanks to @dalley and @daviddavis for their help investigating the
different performance issues related to the epic #3770. Here's an update on
changes made and upcoming.

https://pulp.plan.io/issues/3812 - 50x speedup on saving Content units when
not using multi-table inheritance. I believe this means we need to have
Content units not use Master/Detail. I believe plugin writers are waiting
on this change, so we should prioritize this for Pulp3 and add it to a
sprint (I think).
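
To illustrate why multi-table inheritance gets in the way of batching, here
is a minimal sketch using only Python's sqlite3 module. It is not Pulp code:
the table names (content, file_content, flat_file_content) are hypothetical
stand-ins for a Master/Detail layout and a single-table layout. With the
parent/child layout, each child row needs the id generated for its parent
row, so rows are written one unit at a time. sqlite3 runs in-process, so the
sketch only counts calls and makes no claim about the measured 50x.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Hypothetical parent/child pair, as multi-table inheritance creates.
        CREATE TABLE content (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
        CREATE TABLE file_content (
            content_id INTEGER PRIMARY KEY REFERENCES content(id),
            relative_path TEXT NOT NULL
        );
        -- Hypothetical single-table alternative.
        CREATE TABLE flat_file_content (
            id INTEGER PRIMARY KEY,
            relative_path TEXT NOT NULL
        );
    """)
    return db


def save_inherited(db, paths):
    """Parent/child layout: the child row needs the parent's generated id."""
    calls = 0
    for path in paths:
        cur = db.execute("INSERT INTO content (type) VALUES ('file')")
        db.execute(
            "INSERT INTO file_content (content_id, relative_path) VALUES (?, ?)",
            (cur.lastrowid, path),
        )
        calls += 2
    return calls


def save_flat(db, paths):
    """Single-table layout: every row is handed over in one batched call."""
    db.executemany(
        "INSERT INTO flat_file_content (relative_path) VALUES (?)",
        [(path,) for path in paths],
    )
    return 1


db = make_db()
paths = [f"file-{i}.iso" for i in range(1000)]
inherited_calls = save_inherited(db, paths)
flat_calls = save_flat(db, paths)
```

Against a networked database each of those calls would be a round trip,
which is where the per-unit cost comes from.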

https://pulp.plan.io/issues/3813 - Resolved via documentation on how to use
bulk_create safely with Artifacts. Needs to be added to sprint. 15x - 20x
speedup experimentally shown.
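
The documentation itself is not quoted in this thread, so the following is
only an assumption about what "safely" could involve: an Artifact may already
exist in the database, and a batched insert that hits a duplicate fails as a
whole. One common pattern is to try the batch first and fall back to saving
row by row, skipping duplicates. The sketch below uses sqlite3 and a
hypothetical artifact table keyed on sha256; it is not the approach described
in 3813.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # Hypothetical table: one row per artifact, unique on its digest.
    db.execute(
        "CREATE TABLE artifact (sha256 TEXT PRIMARY KEY, size INTEGER NOT NULL)"
    )
    return db


def save_artifacts(db, artifacts):
    """Try one batched insert; on a duplicate, retry one row at a time."""
    insert = "INSERT INTO artifact (sha256, size) VALUES (?, ?)"
    try:
        with db:  # commits on success, rolls back the whole batch on error
            db.executemany(insert, artifacts)
        return len(artifacts)
    except sqlite3.IntegrityError:
        saved = 0
        for row in artifacts:
            try:
                with db:
                    db.execute(insert, row)
                saved += 1
            except sqlite3.IntegrityError:
                pass  # already stored, for example by an earlier sync
        return saved


db = make_db()
first = save_artifacts(db, [("aaa", 10), ("bbb", 20)])
# "bbb" already exists, so the batch fails and the fallback saves only "ccc".
second = save_artifacts(db, [("bbb", 20), ("ccc", 30)])
total = db.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
```

The fast path costs one call when nothing collides; the slow path is only
paid for batches that contain a duplicate.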

https://pulp.plan.io/issues/3814 - The interface for add_content and
remove_content now only takes a QuerySet. At least a 10x speedup. Already
merged, coming in the next Pulp3 beta. This also sped up the API calls that
use these interfaces. :)
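
As a rough picture of why a set-based interface helps, here is a sketch in
plain sqlite3 where "add content" receives a query describing the units
rather than one unit at a time, so the database does the whole job in a
single statement. The schema, the add_content helper, and its arguments are
hypothetical and are not the actual RepositoryVersion implementation.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical tables: content units and their membership in a version.
    CREATE TABLE content (id INTEGER PRIMARY KEY, relative_path TEXT NOT NULL);
    CREATE TABLE repository_content (
        version INTEGER NOT NULL,
        content_id INTEGER NOT NULL REFERENCES content(id),
        UNIQUE (version, content_id)
    );
""")
db.executemany(
    "INSERT INTO content (relative_path) VALUES (?)",
    [(f"file-{i}.iso",) for i in range(500)],
)


def add_content(db, version, id_query, params=()):
    """Add every unit selected by id_query to a version in one statement.

    id_query is trusted SQL written by the caller that returns a column
    named id; it plays the role a QuerySet plays in the real interface.
    """
    db.execute(
        "INSERT OR IGNORE INTO repository_content (version, content_id) "
        f"SELECT ?, id FROM ({id_query})",
        (version, *params),
    )


def count_in_version(db, version):
    row = db.execute(
        "SELECT COUNT(*) FROM repository_content WHERE version = ?", (version,)
    ).fetchone()
    return row[0]


add_content(db, 1, "SELECT id FROM content WHERE id <= ?", (200,))
after_first = count_in_version(db, 1)
# Overlapping units are ignored rather than duplicated.
add_content(db, 1, "SELECT id FROM content WHERE id <= ?", (300,))
after_second = count_in_version(db, 1)
```

The per-unit alternative would issue one insert per content unit, which is
the 70K-trips problem mentioned further down the thread.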

We may have a lingering performance issue of moving a large number of files
into place across filesystems, but we'll wait until we have a clear
reproducer that we can optimize on to handle that case. We also believe we
know how to resolve that file-saving issue should it arise.

On Mon, Jul 2, 2018 at 4:41 PM, Brian Bouterse <bbouters at redhat.com> wrote:

> As described in 3770, pulp_file syncs 2.4x slower than pulp2 [0]. I
> believe we want Pulp3 to sync at least as fast as Pulp2. I think we should
> consider making the goal of "have pulp3 sync as fast as pulp2" a Pulp3 GA
> requirement. The reasoning for me is twofold: (a) users aren't going to
> switch to something over twice as slow, and (b) we will likely have to make
> some non-trivial database changes, so we should do them now.
>
> How do you feel about this goal/need?
>
> In terms of tackling the problems themselves, I've separated the
> performance issue into 3 different performance problems:
>
> https://pulp.plan.io/issues/3812
> https://pulp.plan.io/issues/3813
> https://pulp.plan.io/issues/3814
>
> Any feedback or discussion on these is welcome. I plan to help organize
> ideas as we explore possible solutions. Once more info and a few vetted
> ideas are available, I plan to bring it back to the list.
> If anyone wants to talk through them before then, feel free to reach out to
> me.
>
> [0]: https://pulp.plan.io/issues/3770#note-5
>
> -Brian
>
>
> On Thu, Jun 21, 2018 at 4:50 PM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>> I just tried an implementation of DeclarativeVersion that uses
>> bulk_create for all content units, content artifacts, and remote artifacts.
>>
>> The content units are incompatible with bulk_create(). Trying to save
>> a batch of content units with bulk_create raises: ValueError: Can't bulk
>> create a multi-table inherited model
>>
>> On Thu, Jun 21, 2018 at 4:19 PM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> I'm only considering these changes for the plugin writer API to help
>>> resolve the performance issues.
>>>
>>> On Thu, Jun 21, 2018 at 4:11 PM, Austin Macdonald <amacdona at redhat.com>
>>> wrote:
>>>
>>>> For models, bulk_create seems good to me. Endpoints to kick off tasks
>>>> like sync that use bulk_create seems fine.
>>>>
>>>> Are you also proposing we have bulk_create for non-task REST API calls?
>>>> Should a user be able to POST a list of dictionaries that becomes a set of
>>>> Content? I'm open to it, but it seems like it could get ugly.
>>>>
>>>> On Thu, Jun 21, 2018 at 3:54 PM, Brian Bouterse <bbouters at redhat.com>
>>>> wrote:
>>>>
>>>>> I've run cprofile on some of the sync code for Pulp3 and I've noticed
>>>>> that we may have some problems with bulk_create on some of the object types.
>>>>>
>>>>> Here is a small analysis I did:
>>>>> https://pulp.plan.io/issues/3770#note-2
>>>>>
>>>>> As an aside, we don't have a bulk add option for
>>>>> RepositoryVersion.add_content, which means each unit costs its own
>>>>> round trip to the db. When you're processing 70K units, that's a lot of
>>>>> trips. I don't think we have to add this right now, but to resolve an
>>>>> issue like 3770 we may need to.
>>>>>
>>>>> I do think we should make our models compatible with bulk_create now
>>>>> either way.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> -Brian
>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>>>
>>>>
>>>
>>
>