[Pulp-dev] Memory consumption on RPM sync

Sat Dec 7 17:26:25 UTC 2019

Some results for new remove_duplicates: when isolating that one change
(including all the other performance improvements) it looks like
remove_duplicates went from 40-70% of the total sync runtime to about 10%.
And I think it was also running (unnecessarily) on the initial sync, which
explains why initial sync time dropped by 70-80% while resync only dropped
by about half.

On Sat, Dec 7, 2019 at 11:29 AM Daniel Alley <dalley at redhat.com> wrote:

> I've been looking into RPM plugin performance a little bit as well due to
> the remove_duplicates issues that Fabricio was able to confirm, and I think
> I've found more low-hanging fruit that is really easy to fix. So, I did.
>
> Attached are some cProfile graphs of a CentOS 7 (
> http://mirror.linux.duke.edu/pub/centos/7/os/x86_64/) sync and resync is
> it is currently. An initial sync of CentOS 7 takes about 6 minutes on my
> machine.  A resync takes about 1 minute 45 seconds.
>
> Observations:
>
>    - Progress Reporting is *really* expensive, because we're incrementing
>    the progress report one-by-one for each content unit (that is, thousands
>    and thousands of extra, separate database updates). It actually takes much
>    longer report the progress as it does to do the parsing itself -- 30% of
>    the total runtime of both syncs and resyncs.  I'm working on a PR to
>    address this by cutting way down on the granularity of the reporting.
>    - gettext inside of plugin/stages/api.py::Stage.put() is taking up 13%
>    of the runtime, because it is doing a "log.debug(_('%(name)s - put:
>    %(content)s'), {'name': self, 'content': item})" for all of the content
>    going into the pipeline.  I don't think it makes sense to do this even if
>    it didn't have a performance cost, so I'm going to remove it.
>
> With these two changes and my remove_duplicates changes, I managed to cut
> the resync time down to 55 seconds and the initial sync time down to 1
> minute 27 seconds.
>
> I expect that remove_duplicates will still be a performance bottleneck
> even after my changes, but I don't have any numbers on that yet.  Remove
> duplicates only triggers when the metadata changes (new content added *and*
> some exists already) or when you manually add content, so I'm going to wait
> for CentOS metadata to change to capture the new results.
>
>
> On Tue, Dec 3, 2019 at 1:36 PM Fabricio Aguiar <fabricio.aguiar at redhat.com>
> wrote:
>
>> After many attempts, today I was able to sync using a vagrant box with
>> 8GB of memory, and it took 9011.04181 seconds(more than 2 hours) to sync
>>
>> I opened 2 WIP PRs:
>> https://github.com/pulp/pulpcore/pull/432
>> https://github.com/pulp/pulp_rpm/pull/1547
>>
>> --- These are very WIP PRs, it was just some tweaks I tried, to see how
>> sync would behave ---
>>
>> Attempts:
>> 1 - Replace SQL OR clauses with IN clauses [a] - slight improvement on
>> memory performance;
>> 2 - Sometimes ProgressReports were in the middle of a loop, doing 50
>> updates queries in a row - slight improvement on memory performance;
>> 3 - UpdateCollection was being saved on each iteration of a loop, changed
>> to bulk creation - good improvement on memory performance;
>> 4 - Introduced batch size on bulk creation - good improvement on memory
>> performance;
>>
>> Guesses:
>> - As I said yesterday when it was around 24000 packages saved, process
>> crashed out of memory,
>> -- I started to look at what could cause it, and when I sorted packages
>> by changelog size, it crashed on less than 3000 packages saved. Changed
>> changelogs to empty list - somewhat improvement on memory performance.
>> - Memory consumption was always increasing when it was close to finishing
>> the sync
>> -- it seemed it never released the packages objects, as packages saved
>> increased, memory consumption increased - Commented comps (RHEL 7 doesn't
>> have modules) - somewhat improvement on memory performance.
>>
>> Changed changelogs to empty list  + Commented comps  = Finally able to
>> sync!
>>
>> My final guesses:
>> - Models related to Comps also have huge JSONFields
>> - Memory is not released after saving packages due to relations,
>> Comps/Modules points to lots of packages objects. When saving
>> PackageGroups, I could see memory being released.
>>
>> Log Analysis:
>> After being able to sync with 8GB I started to look the logs to confirm
>> Daniel's suspicion - #5701 <https://pulp.plan.io/issues/5701>
>> removing duplicates is consuming a lot of memory also. Before removing
>> duplicates, the peak of memory consumption was below 60%.
>> During remove duplicates, the peak was above 80%
>>
>>
>> [a]
>> https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
>>
>> Thanks @Daniel Alley <dalley at redhat.com>, @Dennis Kliban
>> <dkliban at redhat.com>, @Tatiana Tereshchenko <ttereshc at redhat.com> and @Justin
>> Sherrill <jsherril at redhat.com>  for all of your suggestions!
>>
>> Best regards,
>> Fabricio Aguiar
>> Software Engineer, Pulp Project
>> Red Hat Brazil - Latam <https://www.redhat.com/>
>> +55 11 999652368
>>
>>
>> On Mon, Dec 2, 2019 at 7:16 PM Fabricio Aguiar <
>> fabricio.aguiar at redhat.com> wrote:
>>
>>> As Tanya suggested I tried to sync without changelogs, even so, I still
>>> having memory problems =/
>>>
>>> Best regards,
>>> Fabricio Aguiar
>>> Software Engineer, Pulp Project
>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>> +55 11 999652368
>>>
>>>
>>> On Mon, Dec 2, 2019 at 5:13 PM Fabricio Aguiar <
>>> fabricio.aguiar at redhat.com> wrote:
>>>
>>>> from what I see the problem is not a package itself, it is the set of
>>>> packages with huge changelogs.
>>>> Observing the logs, I guess postgres take some time to save a package
>>>> with very large changelog,
>>>> so it seems like it creates a kind of queue for executing each query,
>>>> ad I think that is the point when the memory consumption raises.
>>>> So as huge as changelog is, more time it will take to save, more time
>>>> equal more queries waiting to be executed leading to high consumption of
>>>> memory (My guess due the logs, I don't know how django communicates with
>>>> postgres).
>>>>
>>>> Best regards,
>>>> Fabricio Aguiar
>>>> Software Engineer, Pulp Project
>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>> +55 11 999652368
>>>>
>>>>
>>>> On Mon, Dec 2, 2019 at 5:06 PM Daniel Alley <dalley at redhat.com> wrote:
>>>>
>>>>> sorted() creates a new, duplicate list instead of sorting in place
>>>>> like .sort().  Make sure it's not crashing because of duplicating that list
>>>>> vs. processing a particular troublesome package first.
>>>>>
>>>>> Speaking of which, do you know which package that is?
>>>>>
>>>>>
>>>>> On Mon, Dec 2, 2019 at 2:49 PM Fabricio Aguiar <
>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>
>>>>>> Today I started to notice that every time the process went out of
>>>>>> memory it was around 24000 packages saved,
>>>>>> so I started to look at what could cause it.
>>>>>> At first, I thought the problem was ProgressReport being incremented
>>>>>> and saved in each iteration,
>>>>>> so I changed it, and I had some considerable improvement in memory
>>>>>> consumption, but the process still crashing.
>>>>>>
>>>>>> Then I looked at the logs, I saw that even after crashing it still
>>>>>> logging a very huge text, so I did it:
>>>>>>
>>>>>> packages_values = sorted(
>>>>>>     packages.values(),
>>>>>>     key=lambda p: len(str(p.changelogs)),
>>>>>>     reverse=True
>>>>>> )for pkg in packages_values:
>>>>>>     package = Package(**Package.createrepo_to_dict(pkg))
>>>>>>
>>>>>>
>>>>>> I sorted the packages by the size of its changelog, and then I saw it crashing right on the beginning of the sync:
>>>>>>
>>>>>>
>>>>>> In [2]: len(Package.objects.last().changelogs)
>>>>>> Out[2]: 1071
>>>>>>
>>>>>> In [3]: len(str(Package.objects.last().changelogs))
>>>>>> Out[3]: 8232715
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Fabricio Aguiar
>>>>>> Software Engineer, Pulp Project
>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>> +55 11 999652368
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 27, 2019 at 10:13 AM Fabricio Aguiar <
>>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>>
>>>>>>> Thank you for these numbers,
>>>>>>> I put it on my scrum status, but it worth to share it here also:
>>>>>>>    -- got some really nice feedback from dalley and dkliban
>>>>>>>    -- performance problem seems to be related with query issues
>>>>>>>    -- started to dig into queries
>>>>>>>    -- find out process crashes during ContentSaver
>>>>>>>    -- when using rpdb noticed a decrease in the use of memory
>>>>>>>    -- replaced rpdb with timer.sleep(5) and noticed a slight
>>>>>>> decrease in the use of memory
>>>>>>>    -- my theory is: with a huge amount of content, and with stages
>>>>>>> API going so fast, it cannot release memory fast enough so cache is
>>>>>>> accumulating
>>>>>>>    -- started to read about django cache
>>>>>>>    -- found this [1], tomorrow I will try to replace with [2]
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.djangoproject.com/en/2.2/topics/cache/#local-memory-caching
>>>>>>> [2] https://docs.djangoproject.com/en/2.2/topics/cache/#memcached
>>>>>>>
>>>>>>> So taking these numbers, for ContentSaver what matter is not a
>>>>>>> specific type, then:
>>>>>>>
>>>>>>> CentOS7 - 10 000 RPMs, 0 Advisories      =  10 000 contents
>>>>>>> EPEL7 - 13 500 RPMs, 4400 Advisories    =  17 900 contents
>>>>>>> RHEL7 - 26 000 RPMs, 3900 Advisories    =  29 900 contents
>>>>>>>
>>>>>>> I have found this really nice article, and I believe our problem is
>>>>>>> pretty similar:
>>>>>>>
>>>>>>> https://medium.com/@chakrabortypritish/django-optimization-how-to-avoid-memory-mishaps-1e55b2aa947c
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Fabricio Aguiar
>>>>>>> Software Engineer, Pulp Project
>>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>>> +55 11 999652368
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 27, 2019 at 8:28 AM Tatiana Tereshchenko <
>>>>>>> ttereshc at redhat.com> wrote:
>>>>>>>
>>>>>>>> You know that I'd be the first one to suspect Advisories (aka
>>>>>>>> Errata) :)
>>>>>>>> For the sake of fairness to Advisories, here are some rough numbers
>>>>>>>> (at least for the repos which we are using):
>>>>>>>>
>>>>>>>> CentOS7 - 10 000 RPMs, 0 Advisories
>>>>>>>> EPEL7 - 13 500 RPMs, 4400 Advisories
>>>>>>>> RHEL7 - 26 000 RPMs, 3900 Advisories
>>>>>>>>
>>>>>>>> If the RHEL7 repo causes the problem and EPEL7 does not, then it
>>>>>>>> seems like the main impact is from the number of RPMs. Maybe focusing on
>>>>>>>> what Daniel pointed out is a good start?
>>>>>>>>
>>>>>>>> If we profile CentOS and EPEL, we'll get slightly better idea on
>>>>>>>> the impact of Advisories but still EPEL7 has 30% more RPMs. It's hard to
>>>>>>>> compare.
>>>>>>>>
>>>>>>>> Tanya
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 26, 2019 at 9:25 PM Dana Walker <dawalker at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Out of curiosity, do we have any data on performance/memory usage
>>>>>>>>> from before some of the major features were recently merged?  I'm curious
>>>>>>>>> as to whether comps added significantly to this given how many relations
>>>>>>>>> are handled since you mentioned the deduplication step being a key point.
>>>>>>>>>
>>>>>>>>> Dana Walker
>>>>>>>>>
>>>>>>>>> She / Her / Hers
>>>>>>>>>
>>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>>
>>>>>>>>> Red Hat <https://www.redhat.com>
>>>>>>>>>
>>>>>>>>> dawalker at redhat.com
>>>>>>>>> <https://www.redhat.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Nov 26, 2019 at 3:07 PM Daniel Alley <dalley at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> JSON decoding was also a significant slice of time, so, possibly
>>>>>>>>>> the problem is how JSON blobs behave with these big queries.  And of course
>>>>>>>>>> Erratas have tons of those.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 26, 2019 at 1:50 PM Fabricio Aguiar <
>>>>>>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Dennis and Daniel!
>>>>>>>>>>>
>>>>>>>>>>> those are really good points Daniel!
>>>>>>>>>>>
>>>>>>>>>>> I noticed sync and re-sync behaves almost the same for a period
>>>>>>>>>>> of time:
>>>>>>>>>>> [image: sync_and_resync.jpg]
>>>>>>>>>>>
>>>>>>>>>>> so I was guessing the problem would be when starting to save the
>>>>>>>>>>> contents.
>>>>>>>>>>> Combining Daniel and Dennis hints, probably the problem is a
>>>>>>>>>>> very long query with many Erratas!
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Fabricio Aguiar
>>>>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>>>>>>> +55 11 999652368
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 26, 2019 at 3:26 PM Daniel Alley <dalley at redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Fabricio, this is great work!
>>>>>>>>>>>>
>>>>>>>>>>>> One thing that stands out is that a very large amount of time
>>>>>>>>>>>> is being spent in remove_duplicates(), 65% of the total runtime of the sync.
>>>>>>>>>>>>
>>>>>>>>>>>>    - 13% of the total runtime spent on this inside cast()
>>>>>>>>>>>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L31
>>>>>>>>>>>>    - 31% of the total runtime spent in these lines
>>>>>>>>>>>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L37-L38
>>>>>>>>>>>>    - 20% of the total runtime spent on this line
>>>>>>>>>>>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L42
>>>>>>>>>>>>
>>>>>>>>>>>> There's a couple of suggestions on how to improve this code
>>>>>>>>>>>> written up here https://pulp.plan.io/issues/5701
>>>>>>>>>>>>
>>>>>>>>>>>> The memory usage seems to be coming from the Django ORM
>>>>>>>>>>>> backend.  My guess is that lines 37 & 38 (linked above) are generating some
>>>>>>>>>>>> extremely long SQL, because it's basically doing OR with a vast number of
>>>>>>>>>>>> individual queries. And I think that is also not great for performance.
>>>>>>>>>>>> https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
>>>>>>>>>>>>
>>>>>>>>>>>> Another hint that that might be the case is that the memory
>>>>>>>>>>>> usage grows and grows and then collapses all at once.  That would make
>>>>>>>>>>>> sense if the SQL compiler's data structure is growing and growing and then
>>>>>>>>>>>> being destroyed after the query executes.
>>>>>>>>>>>>
>>>>>>>>>>>> I would imagine that if we fix our queries, we'll both fix the
>>>>>>>>>>>> memory consumption and improve performance at the same time.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 26, 2019 at 11:57 AM Fabricio Aguiar <
>>>>>>>>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've been investigating memory consumption for syncing on RPM
>>>>>>>>>>>>> plugin - issue 5688 <https://pulp.plan.io/issues/5688>, and I
>>>>>>>>>>>>> want to share what I've found so far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, it seems that a new process is created when we started
>>>>>>>>>>>>> to sync, and this new process is responsible for the large amount of memory
>>>>>>>>>>>>> consumption observed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As I shared on some notes here:
>>>>>>>>>>>>> https://pulp.plan.io/issues/5688
>>>>>>>>>>>>> The problem seems to affect only RHEL, I did not see a large
>>>>>>>>>>>>> consumption of memory for the following repos:
>>>>>>>>>>>>> - CentOS 6: http://mirror.centos.org/centos-6/6.10/os/x86_64/
>>>>>>>>>>>>> - CentOS 7:
>>>>>>>>>>>>> http://mirror.linux.duke.edu/pub/centos/7/os/x86_64/
>>>>>>>>>>>>> - CentOS 8 AppStrem:
>>>>>>>>>>>>> http://mirror.linux.duke.edu/pub/centos/8/AppStream/x86_64/os/
>>>>>>>>>>>>> - CentOS 8 BaseOS:
>>>>>>>>>>>>> http://mirror.linux.duke.edu/pub/centos/8/BaseOS/x86_64/os/
>>>>>>>>>>>>> - EPEL 7: https://dl.fedoraproject.org/pub/epel/7/x86_64/
>>>>>>>>>>>>>
>>>>>>>>>>>>> As stated on the issue, with RHEL 7 we can observe the
>>>>>>>>>>>>> problem, for my tests, I used the following repo:
>>>>>>>>>>>>> http://cdn.stage.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/
>>>>>>>>>>>>>
>>>>>>>>>>>>> For my tests, I used pulp3-source-fedora30 vagrant box,
>>>>>>>>>>>>> initially set with 4GB, then I increased to 16GB, and it was not enough. I
>>>>>>>>>>>>> only was able to sync RHEL 7 when I increased the memory up to 23GB.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Utilizing py-spy I got those SVG profiles attached, you can
>>>>>>>>>>>>> find them hosted here:
>>>>>>>>>>>>> Sync - https://sendeyo.com/up/d/90a8ae4c8f
>>>>>>>>>>>>> Re-sync - https://sendeyo.com/up/d/4c855bcce3
>>>>>>>>>>>>>
>>>>>>>>>>>>> With the following branch, I was able to take some tracemalloc
>>>>>>>>>>>>> <https://docs.python.org/3/library/tracemalloc.html>
>>>>>>>>>>>>> snapshots (every 30 seconds):
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/pulp/pulp_rpm/commit/f3f079010cfe81b7f5cf3ef94c2402b1ccf7d90c
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Tracemalloc for Sync at peak of memory consumption:*
>>>>>>>>>>>>>
>>>>>>>>>>>>> #1: sql/compiler.py:1512: 7825340.4 KiB
>>>>>>>>>>>>> #2: tasks/synchronizing.py:154: 4062651.9 KiB
>>>>>>>>>>>>> #3: stages/declarative_version.py:149: 4062601.9 KiB
>>>>>>>>>>>>> #4: models/repository.py:631: 4062130.0 KiB
>>>>>>>>>>>>> #5: models/repository.py:96: 4062127.7 KiB
>>>>>>>>>>>>> #6: rq/job.py:611: 4060459.2 KiB
>>>>>>>>>>>>> #7: rq/job.py:605: 4058273.8 KiB
>>>>>>>>>>>>> #8: rq/worker.py:822: 4053883.3 KiB
>>>>>>>>>>>>> #9: tasking/worker.py:100: 4053875.9 KiB
>>>>>>>>>>>>> #10: rq/worker.py:684: 4053849.9 KiB
>>>>>>>>>>>>> #11: rq/worker.py:610: 4051645.7 KiB
>>>>>>>>>>>>> #12: plugin/repo_version_utils.py:31: 4028192.9 KiB
>>>>>>>>>>>>> #13: models/base.py:124: 4028191.2 KiB
>>>>>>>>>>>>> #14: fields/related_descriptors.py:401: 4028165.2 KiB
>>>>>>>>>>>>> #15: models/query.py:1242: 4021212.9 KiB
>>>>>>>>>>>>> #16: models/query.py:402: 3995284.5 KiB
>>>>>>>>>>>>> #17: models/query.py:256: 3995284.5 KiB
>>>>>>>>>>>>> #18: models/query.py:55: 3952718.2 KiB
>>>>>>>>>>>>> #19: sql/compiler.py:1133: 3919417.7 KiB
>>>>>>>>>>>>> #20: db/utils.py:96: 3912674.4 KiB
>>>>>>>>>>>>> #21: psycopg2/_json.py:166: 3837534.7 KiB
>>>>>>>>>>>>> #22: json/decoder.py:337: 3837533.2 KiB
>>>>>>>>>>>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>>>>>>>>>>>> #23: json/__init__.py:348: 3837533.2 KiB
>>>>>>>>>>>>>     return _default_decoder.decode(s)
>>>>>>>>>>>>> #24: json/decoder.py:353: 3837532.7 KiB
>>>>>>>>>>>>>     obj, end = self.scan_once(s, idx)
>>>>>>>>>>>>> #25: rq/worker.py:670: 211891.5 KiB
>>>>>>>>>>>>> #26: tasking/worker.py:69: 198267.0 KiB
>>>>>>>>>>>>> #27: rq/worker.py:477: 191261.5 KiB
>>>>>>>>>>>>> #28: cli/cli.py:252: 186110.9 KiB
>>>>>>>>>>>>> #29: click/core.py:555: 185957.8 KiB
>>>>>>>>>>>>> #30: cli/cli.py:78: 104822.7 KiB
>>>>>>>>>>>>> #31: models/query.py:73: 48350.9 KiB
>>>>>>>>>>>>> #32: models/base.py:513: 48350.9 KiB
>>>>>>>>>>>>> #33: sql/compiler.py:405: 38126.1 KiB
>>>>>>>>>>>>> #34: sql/compiler.py:1087: 33355.6 KiB
>>>>>>>>>>>>> #35: click/core.py:956: 29580.3 KiB
>>>>>>>>>>>>> #36: click/core.py:1137: 29498.3 KiB
>>>>>>>>>>>>> #37: models/base.py:430: 29340.9 KiB
>>>>>>>>>>>>> #38: models/query.py:274: 25946.8 KiB
>>>>>>>>>>>>> #39: plugin/repo_version_utils.py:30: 25850.4 KiB
>>>>>>>>>>>>> #40: models/query.py:892: 24105.6 KiB
>>>>>>>>>>>>> #41: models/query.py:910: 24089.7 KiB
>>>>>>>>>>>>> #42: models/query.py:399: 24045.0 KiB
>>>>>>>>>>>>> #43: sql/query.py:1290: 17343.9 KiB
>>>>>>>>>>>>> #44: sql/where.py:81: 15629.7 KiB
>>>>>>>>>>>>> #45: sql/compiler.py:489: 15581.6 KiB
>>>>>>>>>>>>> #46: models/lookups.py:162: 15527.0 KiB
>>>>>>>>>>>>> #47: psycopg2/extras.py:678: 15027.3 KiB
>>>>>>>>>>>>> #48: sql/query.py:1323: 13486.4 KiB
>>>>>>>>>>>>> #49: sql/query.py:796: 13486.3 KiB
>>>>>>>>>>>>> #50: sql/compiler.py:474: 11291.3 KiB
>>>>>>>>>>>>> #51: sql/compiler.py:54: 11282.7 KiB
>>>>>>>>>>>>> #52: sql/compiler.py:45: 11282.7 KiB
>>>>>>>>>>>>> #53: click/core.py:717: 10190.6 KiB
>>>>>>>>>>>>> #54: models/query.py:72: 9967.2 KiB
>>>>>>>>>>>>> #55: models/query.py:1219: 8855.6 KiB
>>>>>>>>>>>>> #56: models/query.py:1231: 8855.6 KiB
>>>>>>>>>>>>> #57: models/query.py:1072: 8840.0 KiB
>>>>>>>>>>>>> #58: models/query.py:401: 8840.0 KiB
>>>>>>>>>>>>> #59: models/lookups.py:153: 8779.2 KiB
>>>>>>>>>>>>> #60: models/lookups.py:79: 8773.0 KiB
>>>>>>>>>>>>> #61: models/expressions.py:332: 8769.2 KiB
>>>>>>>>>>>>> #62: models/expressions.py:238: 8769.2 KiB
>>>>>>>>>>>>> #63: plugin/repo_version_utils.py:37: 7602.2 KiB
>>>>>>>>>>>>> #64: models/base.py:408: 7378.9 KiB
>>>>>>>>>>>>> #65: click/core.py:764: 7371.2 KiB
>>>>>>>>>>>>> #66: sql/compiler.py:254: 6853.7 KiB
>>>>>>>>>>>>> #67: models/expressions.py:737: 6851.8 KiB
>>>>>>>>>>>>> #68: models/lookups.py:159: 6745.8 KiB
>>>>>>>>>>>>> #69: sql/query.py:1293: 6744.2 KiB
>>>>>>>>>>>>> #70: sql/query.py:2258: 6743.6 KiB
>>>>>>>>>>>>> #71: sql/query.py:763: 6743.6 KiB
>>>>>>>>>>>>> #72: sql/query.py:2259: 6742.7 KiB
>>>>>>>>>>>>> #73: sql/compiler.py:1054: 6696.5 KiB
>>>>>>>>>>>>> #74: utils/deconstruct.py:17: 6261.2 KiB
>>>>>>>>>>>>> #75: fields/related.py:986: 6246.5 KiB
>>>>>>>>>>>>> #76: fields/__init__.py:381: 6245.8 KiB
>>>>>>>>>>>>> #77: models/base.py:411: 5990.2 KiB
>>>>>>>>>>>>> #78: python3.7/uuid.py:204: 5876.0 KiB
>>>>>>>>>>>>>     self.__dict__['int'] = int
>>>>>>>>>>>>> #79: sql/compiler.py:472: 5784.0 KiB
>>>>>>>>>>>>> #80: sql/compiler.py:1019: 5023.7 KiB
>>>>>>>>>>>>> #81: sql/compiler.py:1053: 4916.2 KiB
>>>>>>>>>>>>> #82: sql/query.py:350: 4490.9 KiB
>>>>>>>>>>>>> #83: sql/query.py:309: 4438.3 KiB
>>>>>>>>>>>>> #84: sql/compiler.py:219: 4406.5 KiB
>>>>>>>>>>>>> #85: sql/compiler.py:666: 4401.8 KiB
>>>>>>>>>>>>> #86: python3.7/copy.py:96: 4399.3 KiB
>>>>>>>>>>>>>     rv = reductor(4)
>>>>>>>>>>>>> #87: models/expressions.py:159: 4398.6 KiB
>>>>>>>>>>>>> #88: python3.7/copyreg.py:88: 4372.9 KiB
>>>>>>>>>>>>>     return cls.__new__(cls, *args)
>>>>>>>>>>>>> #89: python3.7/copy.py:274: 4372.9 KiB
>>>>>>>>>>>>>     y = func(*args)
>>>>>>>>>>>>> #90: python3.7/copy.py:106: 4372.8 KiB
>>>>>>>>>>>>>     return _reconstruct(x, None, *rv)
>>>>>>>>>>>>> #91: models/query_utils.py:82: 4223.2 KiB
>>>>>>>>>>>>> #92: bin/rq:10: 4115.9 KiB
>>>>>>>>>>>>> #93: python3.7/uuid.py:161: 4025.9 KiB
>>>>>>>>>>>>>     int = int_(hex, 16)
>>>>>>>>>>>>> #94: sql/query.py:1318: 3846.5 KiB
>>>>>>>>>>>>> #95: models/expressions.py:749: 3362.8 KiB
>>>>>>>>>>>>> #96: fields/__init__.py:709: 3340.3 KiB
>>>>>>>>>>>>> #97: utils/tree.py:108: 3008.9 KiB
>>>>>>>>>>>>> #98: models/query_utils.py:72: 2534.5 KiB
>>>>>>>>>>>>> #99: models/query_utils.py:85: 2534.2 KiB
>>>>>>>>>>>>> #100: models/base.py:503: 2214.3 KiB
>>>>>>>>>>>>> #101: models/query.py:199: 2190.8 KiB
>>>>>>>>>>>>> #102: models/query.py:190: 2178.6 KiB
>>>>>>>>>>>>> #103: models/query_utils.py:74: 2164.6 KiB
>>>>>>>>>>>>> #104: utils/tree.py:23: 2115.5 KiB
>>>>>>>>>>>>> #105: models/query_utils.py:59: 2113.1 KiB
>>>>>>>>>>>>> #106: sql/query.py:1251: 1895.8 KiB
>>>>>>>>>>>>> #107: sql/query.py:1116: 1895.3 KiB
>>>>>>>>>>>>> #108: models/lookups.py:19: 1892.1 KiB
>>>>>>>>>>>>> #109: sql/query.py:68: 1882.4 KiB
>>>>>>>>>>>>> #110: sql/query.py:1249: 1881.4 KiB
>>>>>>>>>>>>> #111: fields/related.py:974: 1679.4 KiB
>>>>>>>>>>>>> #112: sql/compiler.py:1018: 1672.8 KiB
>>>>>>>>>>>>> #113: base/operations.py:564: 1672.8 KiB
>>>>>>>>>>>>> #114: models/expressions.py:747: 1660.9 KiB
>>>>>>>>>>>>> #115: models/query.py:1893: 1590.1 KiB
>>>>>>>>>>>>> #116: models/query.py:63: 1590.1 KiB
>>>>>>>>>>>>> #117: models/query.py:62: 1554.4 KiB
>>>>>>>>>>>>> #118: models/query.py:61: 1554.4 KiB
>>>>>>>>>>>>> 962 other: 9.8 MiB
>>>>>>>>>>>>> Total allocated size: 99186.9 MiB
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Tracemalloc for Re-sync at peak of memory consumption:*
>>>>>>>>>>>>>
>>>>>>>>>>>>> #1: sql/compiler.py:1512: 1813416.3 KiB
>>>>>>>>>>>>> #2: asyncio/events.py:88: 1159219.2 KiB
>>>>>>>>>>>>>     self._context.run(self._callback, *self._args)
>>>>>>>>>>>>> #3: stages/api.py:43: 1159151.2 KiB
>>>>>>>>>>>>> #4: asyncio/base_events.py:1775: 1158553.1 KiB
>>>>>>>>>>>>>     if bool(enabled) == bool(self._coroutine_origin_tracking_enabled):
>>>>>>>>>>>>> #5: asyncio/base_events.py:539: 1157867.4 KiB
>>>>>>>>>>>>>     self._thread_id = None
>>>>>>>>>>>>> #6: asyncio/base_events.py:571: 1156528.6 KiB
>>>>>>>>>>>>>     # local task.
>>>>>>>>>>>>> #7: stages/declarative_version.py:149: 1152654.5 KiB
>>>>>>>>>>>>> #8: tasks/synchronizing.py:154: 1151586.8 KiB
>>>>>>>>>>>>> #9: rq/job.py:611: 1147795.3 KiB
>>>>>>>>>>>>> #10: rq/job.py:605: 1147534.7 KiB
>>>>>>>>>>>>> #11: rq/worker.py:822: 1141556.0 KiB
>>>>>>>>>>>>> #12: tasking/worker.py:100: 1139872.3 KiB
>>>>>>>>>>>>> #13: rq/worker.py:684: 1137457.4 KiB
>>>>>>>>>>>>> #14: rq/worker.py:610: 1136707.1 KiB
>>>>>>>>>>>>> #15: models/query.py:274: 940312.1 KiB
>>>>>>>>>>>>> #16: models/query.py:1242: 921581.8 KiB
>>>>>>>>>>>>> #17: models/query.py:55: 908503.7 KiB
>>>>>>>>>>>>> #18: sql/compiler.py:1133: 906747.3 KiB
>>>>>>>>>>>>> #19: db/utils.py:96: 906707.8 KiB
>>>>>>>>>>>>> #20: stages/content_stages.py:48: 899314.9 KiB
>>>>>>>>>>>>> #21: psycopg2/_json.py:166: 872466.2 KiB
>>>>>>>>>>>>> #22: json/decoder.py:337: 871817.4 KiB
>>>>>>>>>>>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>>>>>>>>>>>> #23: json/__init__.py:348: 871817.4 KiB
>>>>>>>>>>>>>     return _default_decoder.decode(s)
>>>>>>>>>>>>> #24: json/decoder.py:353: 871816.8 KiB
>>>>>>>>>>>>>     obj, end = self.scan_once(s, idx)
>>>>>>>>>>>>> #25: click/core.py:555: 448794.2 KiB
>>>>>>>>>>>>> #26: rq/worker.py:670: 259351.8 KiB
>>>>>>>>>>>>> #27: tasking/worker.py:69: 257743.4 KiB
>>>>>>>>>>>>> #28: rq/worker.py:477: 249243.3 KiB
>>>>>>>>>>>>> #29: cli/cli.py:252: 244023.2 KiB
>>>>>>>>>>>>> #30: cli/cli.py:78: 226347.5 KiB
>>>>>>>>>>>>> #31: click/core.py:956: 220086.5 KiB
>>>>>>>>>>>>> #32: click/core.py:1137: 213146.8 KiB
>>>>>>>>>>>>> #33: click/core.py:717: 188052.0 KiB
>>>>>>>>>>>>> #34: tasks/synchronizing.py:581: 177471.0 KiB
>>>>>>>>>>>>> #35: models/package.py:232: 138350.9 KiB
>>>>>>>>>>>>> #36: stages/artifact_stages.py:219: 54726.5 KiB
>>>>>>>>>>>>> #37: stages/artifact_stages.py:251: 54723.6 KiB
>>>>>>>>>>>>> #38: models/query.py:1625: 47373.9 KiB
>>>>>>>>>>>>> #39: models/query.py:1738: 46489.3 KiB
>>>>>>>>>>>>> #40: fields/related_descriptors.py:627: 42352.2 KiB
>>>>>>>>>>>>> #41: models/package.py:238: 24673.4 KiB
>>>>>>>>>>>>> #42: models/query.py:1591: 24325.2 KiB
>>>>>>>>>>>>> #43: models/query.py:1244: 18744.3 KiB
>>>>>>>>>>>>> #44: models/query.py:771: 18744.3 KiB
>>>>>>>>>>>>> #45: models/query.py:73: 12827.6 KiB
>>>>>>>>>>>>> #46: models/base.py:513: 12827.6 KiB
>>>>>>>>>>>>> #47: sql/compiler.py:405: 11912.8 KiB
>>>>>>>>>>>>> #48: psycopg2/extras.py:678: 11666.5 KiB
>>>>>>>>>>>>> #49: models/package.py:245: 10837.2 KiB
>>>>>>>>>>>>> #50: sql/compiler.py:1087: 7407.8 KiB
>>>>>>>>>>>>> #51: sql/where.py:81: 6915.5 KiB
>>>>>>>>>>>>> #52: models/base.py:741: 6705.2 KiB
>>>>>>>>>>>>> #53: models/base.py:779: 6697.1 KiB
>>>>>>>>>>>>> #54: models/progress.py:144: 6607.2 KiB
>>>>>>>>>>>>> #55: click/core.py:764: 6603.1 KiB
>>>>>>>>>>>>> #56: models/progress.py:191: 6601.5 KiB
>>>>>>>>>>>>> #57: tasks/synchronizing.py:609: 6574.8 KiB
>>>>>>>>>>>>> #58: python3.7/uuid.py:204: 6532.6 KiB
>>>>>>>>>>>>>     self.__dict__['int'] = int
>>>>>>>>>>>>> #59: models/base.py:851: 6485.3 KiB
>>>>>>>>>>>>> #60: models/base.py:411: 6366.6 KiB
>>>>>>>>>>>>> #61: models/base.py:900: 6178.2 KiB
>>>>>>>>>>>>> #62: models/query.py:760: 5612.1 KiB
>>>>>>>>>>>>> #63: sql/compiler.py:1429: 5528.5 KiB
>>>>>>>>>>>>> #64: models/base.py:408: 5042.7 KiB
>>>>>>>>>>>>> #65: tasks/synchronizing.py:575: 4724.0 KiB
>>>>>>>>>>>>> #66: createrepo_c/__init__.py:312: 4722.6 KiB
>>>>>>>>>>>>> #67: tasks/synchronizing.py:285: 4722.6 KiB
>>>>>>>>>>>>> #68: models/lookups.py:162: 4583.4 KiB
>>>>>>>>>>>>> #69: tasks/synchronizing.py:582: 4236.2 KiB
>>>>>>>>>>>>> #70: tasks/synchronizing.py:256: 3655.1 KiB
>>>>>>>>>>>>> #71: sql/query.py:1312: 3358.8 KiB
>>>>>>>>>>>>> #72: sql/compiler.py:1417: 3312.9 KiB
>>>>>>>>>>>>> #73: models/query.py:892: 3121.7 KiB
>>>>>>>>>>>>> #74: models/lookups.py:153: 2934.4 KiB
>>>>>>>>>>>>> #75: models/lookups.py:79: 2931.5 KiB
>>>>>>>>>>>>> #76: models/expressions.py:332: 2918.8 KiB
>>>>>>>>>>>>> #77: models/expressions.py:238: 2918.5 KiB
>>>>>>>>>>>>> #78: fields/related_descriptors.py:219: 2766.3 KiB
>>>>>>>>>>>>> #79: fields/related_descriptors.py:629: 2765.8 KiB
>>>>>>>>>>>>> #80: models/query.py:910: 2575.7 KiB
>>>>>>>>>>>>> #81: sql/query.py:1290: 2513.6 KiB
>>>>>>>>>>>>> #82: sql/query.py:1318: 2257.2 KiB
>>>>>>>>>>>>> #83: stages/association_stages.py:36: 2219.3 KiB
>>>>>>>>>>>>> #84: sql/compiler.py:1452: 2214.1 KiB
>>>>>>>>>>>>> #85: sql/compiler.py:1370: 2214.1 KiB
>>>>>>>>>>>>> #86: python3.7/uuid.py:161: 2179.2 KiB
>>>>>>>>>>>>>     int = int_(hex, 16)
>>>>>>>>>>>>> #87: models/base.py:503: 2082.6 KiB
>>>>>>>>>>>>> #88: models/manager.py:82: 2008.6 KiB
>>>>>>>>>>>>> #89: tasks/synchronizing.py:585: 1803.4 KiB
>>>>>>>>>>>>> #90: fields/related_descriptors.py:527: 1767.5 KiB
>>>>>>>>>>>>> #91: python3.7/copy.py:96: 1754.9 KiB
>>>>>>>>>>>>>     rv = reductor(4)
>>>>>>>>>>>>> #92: urllib/parse.py:562: 1725.2 KiB
>>>>>>>>>>>>>     resolved_path.append('')
>>>>>>>>>>>>> #93: urllib/parse.py:489: 1725.2 KiB
>>>>>>>>>>>>>     if url and url[:1] != '/': url = '/' + url
>>>>>>>>>>>>> #94: urllib/parse.py:475: 1725.2 KiB
>>>>>>>>>>>>>     _coerce_args(*components))
>>>>>>>>>>>>> #95: models/lookups.py:159: 1647.3 KiB
>>>>>>>>>>>>> #96: models/expressions.py:159: 1625.9 KiB
>>>>>>>>>>>>> #97: tasks/synchronizing.py:512: 1619.8 KiB
>>>>>>>>>>>>> #98: models/query.py:1652: 1619.0 KiB
>>>>>>>>>>>>> #99: fields/__init__.py:801: 1604.3 KiB
>>>>>>>>>>>>> #100: models/base.py:473: 1545.2 KiB
>>>>>>>>>>>>> #101: python3.7/uuid.py:759: 1488.4 KiB
>>>>>>>>>>>>>     def uuid4():
>>>>>>>>>>>>> #102: models/package.py:248: 1455.4 KiB
>>>>>>>>>>>>> #103: sql/compiler.py:489: 1332.9 KiB
>>>>>>>>>>>>> #104: python3.7/copy.py:106: 1293.2 KiB
>>>>>>>>>>>>>     return _reconstruct(x, None, *rv)
>>>>>>>>>>>>> #105: python3.7/copyreg.py:88: 1279.1 KiB
>>>>>>>>>>>>>     return cls.__new__(cls, *args)
>>>>>>>>>>>>> #106: python3.7/copy.py:274: 1279.1 KiB
>>>>>>>>>>>>>     y = func(*args)
>>>>>>>>>>>>> #107: sql/query.py:350: 1119.1 KiB
>>>>>>>>>>>>> #108: models/query.py:1231: 1057.4 KiB
>>>>>>>>>>>>> #109: models/query.py:1219: 1057.4 KiB
>>>>>>>>>>>>> 1019 other: 34.8 MiB
>>>>>>>>>>>>> Total allocated size: 28304.4 MiB
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The snapshots can be found here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  memory_data
>>>>>>>>>>>>> <https://drive.google.com/drive/folders/1mdJtdGDvbwq_jLc5m_T8VBjjb-bDRDmS>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Memory consumption over time:*
>>>>>>>>>>>>> [image: sync.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [image: resync.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [image: sync_and_resync.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Fabricio Aguiar
>>>>>>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>>>>>>>>> +55 11 999652368
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Pulp-dev mailing list
>>>>>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>> Pulp-dev mailing list
>>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Pulp-dev mailing list
>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> Pulp-dev at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/24f515aa/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync.png
Type: image/png
Size: 20783 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/24f515aa/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resync.png
Type: image/png
Size: 18626 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/24f515aa/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_and_resync.png
Type: image/png
Size: 29043 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/24f515aa/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_and_resync.jpg
Type: image/jpeg
Size: 57715 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/24f515aa/attachment.jpg>