[Pulp-dev] Memory consumption on RPM sync

Daniel Alley dalley at redhat.com
Sat Dec 7 16:29:59 UTC 2019


I've been looking into RPM plugin performance a little bit as well due to
the remove_duplicates issues that Fabricio was able to confirm, and I think
I've found more low-hanging fruit that is really easy to fix. So, I did.

Attached are some cProfile graphs of a CentOS 7 (
http://mirror.linux.duke.edu/pub/centos/7/os/x86_64/) sync and resync as it
currently stands. An initial sync of CentOS 7 takes about 6 minutes on my
machine.  A resync takes about 1 minute 45 seconds.

Observations:

   - Progress Reporting is *really* expensive, because we're incrementing
   the progress report one-by-one for each content unit (that is, thousands
   and thousands of extra, separate database updates). It actually takes much
   longer to report the progress than it does to do the parsing itself -- 30% of
   the total runtime of both syncs and resyncs.  I'm working on a PR to
   address this by cutting way down on the granularity of the reporting (see
   the sketch after this list).
   - gettext inside of plugin/stages/api.py::Stage.put() is taking up 13%
   of the runtime, because it does a "log.debug(_('%(name)s - put:
   %(content)s'), {'name': self, 'content': item})" for every piece of content
   going into the pipeline.  I don't think it makes sense to do this even if
   it didn't have a performance cost, so I'm going to remove it.
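To make that concrete, here is a minimal sketch of the batching idea -- not the
actual PR; units, handle(), and the batch size of 500 are placeholders, and
report stands for a pulpcore ProgressReport with its done counter:

    BATCH = 500            # placeholder batch size
    pending = 0
    for unit in units:     # placeholder iterable of content units
        handle(unit)       # whatever per-unit work the stage does
        pending += 1
        if pending >= BATCH:
            report.done += pending   # one UPDATE instead of 500
            report.save()
            pending = 0
    if pending:
        report.done += pending       # flush the remainder
        report.save()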

With these two changes and my remove_duplicates changes, I managed to cut
the resync time down to 55 seconds and the initial sync time down to 1
minute 27 seconds.

I expect that remove_duplicates will still be a performance bottleneck even
after my changes, but I don't have any numbers on that yet.  Remove
duplicates only triggers when the metadata changes (new content added *and*
some exists already) or when you manually add content, so I'm going to wait
for CentOS metadata to change to capture the new results.


On Tue, Dec 3, 2019 at 1:36 PM Fabricio Aguiar <fabricio.aguiar at redhat.com>
wrote:

> After many attempts, today I was able to sync using a vagrant box with 8GB
> of memory, and it took 9011.04181 seconds (about 2.5 hours) to sync.
>
> I opened 2 WIP PRs:
> https://github.com/pulp/pulpcore/pull/432
> https://github.com/pulp/pulp_rpm/pull/1547
>
> --- These are very much WIP PRs; they're just some tweaks I tried to see how
> sync would behave ---
>
> Attempts:
> 1 - Replaced SQL OR clauses with IN clauses [a] - slight improvement in
> memory usage;
> 2 - Some ProgressReports were being updated in the middle of a loop, issuing
> 50 update queries in a row; fixed that - slight improvement in memory usage;
> 3 - UpdateCollection was being saved on each iteration of a loop; changed it
> to bulk creation - good improvement in memory usage;
> 4 - Introduced a batch size on the bulk creation - good improvement in memory
> usage (see the sketch after this list);
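> To illustrate attempts 3 and 4 together, here is a rough sketch (not the
> actual PR; the import path and the batch size of 500 are assumptions, and
> parsed_collections is a placeholder iterable of dicts):
>
>     from pulp_rpm.app.models import UpdateCollection
>
>     # Build the objects in memory instead of calling .save() inside the loop...
>     collections = [UpdateCollection(**data) for data in parsed_collections]
>
>     # ...then insert them in chunks: one INSERT per 500 rows instead of one
>     # query per object.
>     UpdateCollection.objects.bulk_create(collections, batch_size=500)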
>
> Guesses:
> - As I said yesterday, when around 24,000 packages had been saved, the process
> crashed out of memory.
> -- I started to look at what could cause it, and when I sorted the packages by
> changelog size, it crashed with fewer than 3,000 packages saved. Changing the
> changelogs to an empty list gave some improvement in memory usage.
> - Memory consumption was always increasing as the sync got close to
> finishing.
> -- It seemed the package objects were never released: as the number of saved
> packages increased, memory consumption increased. Commenting out comps (RHEL 7
> doesn't have modules) gave some improvement in memory usage.
>
> Changed changelogs to empty list  + Commented comps  = Finally able to
> sync!
>
> My final guesses:
> - Models related to Comps also have huge JSONFields.
> - Memory is not released after saving packages due to relations:
> Comps/Modules point to lots of package objects. When saving
> PackageGroups, I could see memory being released.
>
> Log Analysis:
> After being able to sync with 8GB, I started to look at the logs to confirm
> Daniel's suspicion - #5701 <https://pulp.plan.io/issues/5701> - that
> removing duplicates is also consuming a lot of memory. Before removing
> duplicates, the peak of memory consumption was below 60%.
> During remove duplicates, the peak was above 80%.
>
>
> [a]
> https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
>
> Thanks @Daniel Alley <dalley at redhat.com>, @Dennis Kliban
> <dkliban at redhat.com>, @Tatiana Tereshchenko <ttereshc at redhat.com> and @Justin
> Sherrill <jsherril at redhat.com>  for all of your suggestions!
>
> Best regards,
> Fabricio Aguiar
> Software Engineer, Pulp Project
> Red Hat Brazil - Latam <https://www.redhat.com/>
> +55 11 999652368
>
>
> On Mon, Dec 2, 2019 at 7:16 PM Fabricio Aguiar <fabricio.aguiar at redhat.com>
> wrote:
>
>> As Tanya suggested, I tried to sync without changelogs; even so, I'm still
>> having memory problems =/
>>
>> Best regards,
>> Fabricio Aguiar
>> Software Engineer, Pulp Project
>> Red Hat Brazil - Latam <https://www.redhat.com/>
>> +55 11 999652368
>>
>>
>> On Mon, Dec 2, 2019 at 5:13 PM Fabricio Aguiar <
>> fabricio.aguiar at redhat.com> wrote:
>>
>>> From what I see, the problem is not a single package itself, it is the set of
>>> packages with huge changelogs.
>>> Observing the logs, I guess postgres takes some time to save a package
>>> with a very large changelog,
>>> so it seems like it creates a kind of queue for executing each query, and
>>> I think that is the point where memory consumption rises.
>>> So the bigger the changelog, the longer it takes to save; more time means
>>> more queries waiting to be executed, leading to high memory consumption
>>> (my guess based on the logs; I don't know how django communicates with
>>> postgres).
>>>
>>> Best regards,
>>> Fabricio Aguiar
>>> Software Engineer, Pulp Project
>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>> +55 11 999652368
>>>
>>>
>>> On Mon, Dec 2, 2019 at 5:06 PM Daniel Alley <dalley at redhat.com> wrote:
>>>
>>>> sorted() creates a new, duplicate list instead of sorting in place like
>>>> .sort().  Make sure it's not crashing because of duplicating that list vs.
>>>> processing a particular troublesome package first.
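>>>> A tiny self-contained illustration of the difference (the SimpleNamespace
>>>> objects are made-up stand-ins that only mimic a .changelogs attribute):
>>>>
>>>>     from types import SimpleNamespace
>>>>
>>>>     packages = {f"pkg{i}": SimpleNamespace(changelogs=["entry"] * i) for i in range(5)}
>>>>     packages_values = list(packages.values())
>>>>
>>>>     # sorted() allocates a second list of the same length:
>>>>     by_size = sorted(packages_values, key=lambda p: len(str(p.changelogs)), reverse=True)
>>>>
>>>>     # list.sort() reorders the existing list in place, with no extra list object:
>>>>     packages_values.sort(key=lambda p: len(str(p.changelogs)), reverse=True)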
>>>>
>>>> Speaking of which, do you know which package that is?
>>>>
>>>>
>>>> On Mon, Dec 2, 2019 at 2:49 PM Fabricio Aguiar <
>>>> fabricio.aguiar at redhat.com> wrote:
>>>>
>>>>> Today I started to notice that every time the process ran out of
>>>>> memory, around 24,000 packages had been saved,
>>>>> so I started to look at what could cause it.
>>>>> At first, I thought the problem was ProgressReport being incremented
>>>>> and saved in each iteration,
>>>>> so I changed it, and I got a considerable improvement in memory
>>>>> consumption, but the process was still crashing.
>>>>>
>>>>> Then I looked at the logs and saw that even after crashing it was still
>>>>> logging very large text, so I did this:
>>>>>
>>>>> packages_values = sorted(
>>>>>     packages.values(),
>>>>>     key=lambda p: len(str(p.changelogs)),
>>>>>     reverse=True
>>>>> )
>>>>>
>>>>> for pkg in packages_values:
>>>>>     package = Package(**Package.createrepo_to_dict(pkg))
>>>>>
>>>>>
>>>>> I sorted the packages by the size of their changelogs, and then I saw it crashing right at the beginning of the sync:
>>>>>
>>>>>
>>>>> In [2]: len(Package.objects.last().changelogs)
>>>>> Out[2]: 1071
>>>>>
>>>>> In [3]: len(str(Package.objects.last().changelogs))
>>>>> Out[3]: 8232715
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Fabricio Aguiar
>>>>> Software Engineer, Pulp Project
>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>> +55 11 999652368
>>>>>
>>>>>
>>>>> On Wed, Nov 27, 2019 at 10:13 AM Fabricio Aguiar <
>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>
>>>>>> Thank you for these numbers.
>>>>>> I put this in my scrum status, but it's worth sharing here as well:
>>>>>>    -- got some really nice feedback from dalley and dkliban
>>>>>>    -- the performance problem seems to be related to query issues
>>>>>>    -- started to dig into queries
>>>>>>    -- found out the process crashes during ContentSaver
>>>>>>    -- when using rpdb, noticed a decrease in memory use
>>>>>>    -- replaced rpdb with time.sleep(5) and noticed a slight decrease
>>>>>> in memory use
>>>>>>    -- my theory is: with a huge amount of content, and with the stages
>>>>>> API going so fast, memory cannot be released fast enough, so the cache
>>>>>> keeps accumulating
>>>>>>    -- started to read about the django cache
>>>>>>    -- found [1]; tomorrow I will try to replace it with [2]
>>>>>>
>>>>>> [1]
>>>>>> https://docs.djangoproject.com/en/2.2/topics/cache/#local-memory-caching
>>>>>> [2] https://docs.djangoproject.com/en/2.2/topics/cache/#memcached
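>>>>>>
>>>>>> For reference, a minimal sketch of what swapping [1] for [2] would look
>>>>>> like in Django settings (the backend paths are the documented ones; the
>>>>>> LOCATION is just the default memcached address, not necessarily what
>>>>>> we'd end up using):
>>>>>>
>>>>>>     # settings.py
>>>>>>     CACHES = {
>>>>>>         "default": {
>>>>>>             # [1] local-memory backend:
>>>>>>             # "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
>>>>>>             # [2] memcached backend to try instead:
>>>>>>             "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
>>>>>>             "LOCATION": "127.0.0.1:11211",
>>>>>>         }
>>>>>>     }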
>>>>>>
>>>>>> So taking these numbers, for ContentSaver what matters is not a
>>>>>> specific type, so:
>>>>>>
>>>>>> CentOS7 - 10 000 RPMs, 0 Advisories      =  10 000 content units
>>>>>> EPEL7 - 13 500 RPMs, 4400 Advisories    =  17 900 content units
>>>>>> RHEL7 - 26 000 RPMs, 3900 Advisories    =  29 900 content units
>>>>>>
>>>>>> I have found this really nice article, and I believe our problem is
>>>>>> pretty similar:
>>>>>>
>>>>>> https://medium.com/@chakrabortypritish/django-optimization-how-to-avoid-memory-mishaps-1e55b2aa947c
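>>>>>>
>>>>>> One technique in that vein (just a sketch of the general idea, not a
>>>>>> decision on what we'll do; process() is a placeholder and the Package
>>>>>> import path is assumed) is to stream large querysets with .iterator()
>>>>>> so Django doesn't cache every row of the result set in memory:
>>>>>>
>>>>>>     from pulp_rpm.app.models import Package
>>>>>>
>>>>>>     # .iterator() uses a server-side cursor on PostgreSQL and skips the
>>>>>>     # queryset result cache, keeping memory roughly flat on large tables.
>>>>>>     for pkg in Package.objects.all().iterator():
>>>>>>         process(pkg)  # placeholder for per-package work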
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Fabricio Aguiar
>>>>>> Software Engineer, Pulp Project
>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>> +55 11 999652368
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 27, 2019 at 8:28 AM Tatiana Tereshchenko <
>>>>>> ttereshc at redhat.com> wrote:
>>>>>>
>>>>>>> You know that I'd be the first one to suspect Advisories (aka
>>>>>>> Errata) :)
>>>>>>> For the sake of fairness to Advisories, here are some rough numbers
>>>>>>> (at least for the repos we are using):
>>>>>>>
>>>>>>> CentOS7 - 10 000 RPMs, 0 Advisories
>>>>>>> EPEL7 - 13 500 RPMs, 4400 Advisories
>>>>>>> RHEL7 - 26 000 RPMs, 3900 Advisories
>>>>>>>
>>>>>>> If the RHEL7 repo causes the problem and EPEL7 does not, then it
>>>>>>> seems like the main impact is from the number of RPMs. Maybe focusing on
>>>>>>> what Daniel pointed out is a good start?
>>>>>>>
>>>>>>> If we profile CentOS and EPEL, we'll get a slightly better idea of
>>>>>>> the impact of Advisories, but EPEL7 still has 30% more RPMs. It's hard to
>>>>>>> compare.
>>>>>>>
>>>>>>> Tanya
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 26, 2019 at 9:25 PM Dana Walker <dawalker at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Out of curiosity, do we have any data on performance/memory usage
>>>>>>>> from before some of the major features were recently merged?  I'm curious
>>>>>>>> as to whether comps added significantly to this, given how many relations
>>>>>>>> are handled, since you mentioned the deduplication step being a key point.
>>>>>>>>
>>>>>>>> Dana Walker
>>>>>>>>
>>>>>>>> She / Her / Hers
>>>>>>>>
>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>
>>>>>>>> Red Hat <https://www.redhat.com>
>>>>>>>>
>>>>>>>> dawalker at redhat.com
>>>>>>>> <https://www.redhat.com>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 26, 2019 at 3:07 PM Daniel Alley <dalley at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> JSON decoding was also a significant slice of time, so, possibly
>>>>>>>>> the problem is how JSON blobs behave with these big queries.  And of course
>>>>>>>>> Erratas have tons of those.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 26, 2019 at 1:50 PM Fabricio Aguiar <
>>>>>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Dennis and Daniel!
>>>>>>>>>>
>>>>>>>>>> Those are really good points, Daniel!
>>>>>>>>>>
>>>>>>>>>> I noticed that sync and re-sync behave almost the same for a period
>>>>>>>>>> of time:
>>>>>>>>>> [image: sync_and_resync.jpg]
>>>>>>>>>>
>>>>>>>>>> so I was guessing the problem starts when we begin saving the
>>>>>>>>>> contents.
>>>>>>>>>> Combining Daniel's and Dennis's hints, the problem is probably a very
>>>>>>>>>> long query with many Erratas!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Fabricio Aguiar
>>>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>>>>>> +55 11 999652368
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 26, 2019 at 3:26 PM Daniel Alley <dalley at redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Fabricio, this is great work!
>>>>>>>>>>>
>>>>>>>>>>> One thing that stands out is that a very large amount of time is
>>>>>>>>>>> being spent in remove_duplicates(), 65% of the total runtime of the sync.
>>>>>>>>>>>
>>>>>>>>>>>    - 13% of the total runtime spent on this inside cast()
>>>>>>>>>>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L31
>>>>>>>>>>>    - 31% of the total runtime spent in these lines
>>>>>>>>>>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L37-L38
>>>>>>>>>>>    - 20% of the total runtime spent on this line
>>>>>>>>>>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L42
>>>>>>>>>>>
>>>>>>>>>>> There are a couple of suggestions on how to improve this code
>>>>>>>>>>> written up here: https://pulp.plan.io/issues/5701
>>>>>>>>>>>
>>>>>>>>>>> The memory usage seems to be coming from the Django ORM
>>>>>>>>>>> backend.  My guess is that lines 37 & 38 (linked above) are generating some
>>>>>>>>>>> extremely long SQL, because they're basically OR-ing together a vast number
>>>>>>>>>>> of individual conditions (see the sketch below). And I think that is also
>>>>>>>>>>> not great for performance.
>>>>>>>>>>> https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
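>>>>>>>>>>>
>>>>>>>>>>> A sketch of the shape I mean (illustrative model and variable names
>>>>>>>>>>> only, not the actual repo_version_utils code; "duplicates" stands for
>>>>>>>>>>> an in-memory list of content units):
>>>>>>>>>>>
>>>>>>>>>>>     from functools import reduce
>>>>>>>>>>>     from operator import or_
>>>>>>>>>>>     from django.db.models import Q
>>>>>>>>>>>
>>>>>>>>>>>     # OR-chaining one Q object per unit compiles to SQL with thousands
>>>>>>>>>>>     # of OR branches and a huge in-memory query tree:
>>>>>>>>>>>     query = reduce(or_, (Q(pk=unit.pk) for unit in duplicates))
>>>>>>>>>>>     Content.objects.filter(query)
>>>>>>>>>>>
>>>>>>>>>>>     # An equivalent IN lookup produces one compact clause instead:
>>>>>>>>>>>     Content.objects.filter(pk__in=[unit.pk for unit in duplicates])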
>>>>>>>>>>>
>>>>>>>>>>> Another hint that that might be the case is that the memory
>>>>>>>>>>> usage grows and grows and then collapses all at once.  That would make
>>>>>>>>>>> sense if the SQL compiler's data structure is growing and growing and then
>>>>>>>>>>> being destroyed after the query executes.
>>>>>>>>>>>
>>>>>>>>>>> I would imagine that if we fix our queries, we'll both fix the
>>>>>>>>>>> memory consumption and improve performance at the same time.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 26, 2019 at 11:57 AM Fabricio Aguiar <
>>>>>>>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I've been investigating memory consumption for syncing on RPM
>>>>>>>>>>>> plugin - issue 5688 <https://pulp.plan.io/issues/5688>, and I
>>>>>>>>>>>> want to share what I've found so far.
>>>>>>>>>>>>
>>>>>>>>>>>> First, it seems that a new process is created when we start
>>>>>>>>>>>> to sync, and this new process is responsible for the large amount of memory
>>>>>>>>>>>> consumption observed.
>>>>>>>>>>>>
>>>>>>>>>>>> As I shared on some notes here:
>>>>>>>>>>>> https://pulp.plan.io/issues/5688
>>>>>>>>>>>> The problem seems to affect only RHEL; I did not see large
>>>>>>>>>>>> memory consumption for the following repos:
>>>>>>>>>>>> - CentOS 6: http://mirror.centos.org/centos-6/6.10/os/x86_64/
>>>>>>>>>>>> - CentOS 7:
>>>>>>>>>>>> http://mirror.linux.duke.edu/pub/centos/7/os/x86_64/
>>>>>>>>>>>> - CentOS 8 AppStream:
>>>>>>>>>>>> http://mirror.linux.duke.edu/pub/centos/8/AppStream/x86_64/os/
>>>>>>>>>>>> - CentOS 8 BaseOS:
>>>>>>>>>>>> http://mirror.linux.duke.edu/pub/centos/8/BaseOS/x86_64/os/
>>>>>>>>>>>> - EPEL 7: https://dl.fedoraproject.org/pub/epel/7/x86_64/
>>>>>>>>>>>>
>>>>>>>>>>>> As stated in the issue, the problem can be observed with RHEL 7;
>>>>>>>>>>>> for my tests, I used the following repo:
>>>>>>>>>>>> http://cdn.stage.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/
>>>>>>>>>>>>
>>>>>>>>>>>> For my tests, I used the pulp3-source-fedora30 vagrant box,
>>>>>>>>>>>> initially set with 4GB of memory, then increased to 16GB, which was still not
>>>>>>>>>>>> enough. I was only able to sync RHEL 7 after increasing the memory to 23GB.
>>>>>>>>>>>>
>>>>>>>>>>>> Using py-spy, I generated the SVG profiles attached; you can also
>>>>>>>>>>>> find them hosted here:
>>>>>>>>>>>> Sync - https://sendeyo.com/up/d/90a8ae4c8f
>>>>>>>>>>>> Re-sync - https://sendeyo.com/up/d/4c855bcce3
>>>>>>>>>>>>
>>>>>>>>>>>> With the following branch, I was able to take some tracemalloc
>>>>>>>>>>>> <https://docs.python.org/3/library/tracemalloc.html> snapshots
>>>>>>>>>>>> (every 30 seconds):
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/pulp/pulp_rpm/commit/f3f079010cfe81b7f5cf3ef94c2402b1ccf7d90c
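>>>>>>>>>>>>
>>>>>>>>>>>> (Roughly, the idea is a background thread that takes a tracemalloc
>>>>>>>>>>>> snapshot every 30 seconds; a simplified stand-alone sketch, rather
>>>>>>>>>>>> than the exact code in that branch, looks like this:)
>>>>>>>>>>>>
>>>>>>>>>>>>     import threading
>>>>>>>>>>>>     import time
>>>>>>>>>>>>     import tracemalloc
>>>>>>>>>>>>
>>>>>>>>>>>>     def snapshot_loop(interval=30, top=25):
>>>>>>>>>>>>         tracemalloc.start(25)  # keep up to 25 frames per allocation
>>>>>>>>>>>>         while True:
>>>>>>>>>>>>             time.sleep(interval)
>>>>>>>>>>>>             snapshot = tracemalloc.take_snapshot()
>>>>>>>>>>>>             for stat in snapshot.statistics("lineno")[:top]:
>>>>>>>>>>>>                 print(stat)
>>>>>>>>>>>>
>>>>>>>>>>>>     # Daemon thread so the sync task keeps running while we sample.
>>>>>>>>>>>>     threading.Thread(target=snapshot_loop, daemon=True).start()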
>>>>>>>>>>>>
>>>>>>>>>>>> *Tracemalloc for Sync at peak of memory consumption:*
>>>>>>>>>>>>
>>>>>>>>>>>> #1: sql/compiler.py:1512: 7825340.4 KiB
>>>>>>>>>>>> #2: tasks/synchronizing.py:154: 4062651.9 KiB
>>>>>>>>>>>> #3: stages/declarative_version.py:149: 4062601.9 KiB
>>>>>>>>>>>> #4: models/repository.py:631: 4062130.0 KiB
>>>>>>>>>>>> #5: models/repository.py:96: 4062127.7 KiB
>>>>>>>>>>>> #6: rq/job.py:611: 4060459.2 KiB
>>>>>>>>>>>> #7: rq/job.py:605: 4058273.8 KiB
>>>>>>>>>>>> #8: rq/worker.py:822: 4053883.3 KiB
>>>>>>>>>>>> #9: tasking/worker.py:100: 4053875.9 KiB
>>>>>>>>>>>> #10: rq/worker.py:684: 4053849.9 KiB
>>>>>>>>>>>> #11: rq/worker.py:610: 4051645.7 KiB
>>>>>>>>>>>> #12: plugin/repo_version_utils.py:31: 4028192.9 KiB
>>>>>>>>>>>> #13: models/base.py:124: 4028191.2 KiB
>>>>>>>>>>>> #14: fields/related_descriptors.py:401: 4028165.2 KiB
>>>>>>>>>>>> #15: models/query.py:1242: 4021212.9 KiB
>>>>>>>>>>>> #16: models/query.py:402: 3995284.5 KiB
>>>>>>>>>>>> #17: models/query.py:256: 3995284.5 KiB
>>>>>>>>>>>> #18: models/query.py:55: 3952718.2 KiB
>>>>>>>>>>>> #19: sql/compiler.py:1133: 3919417.7 KiB
>>>>>>>>>>>> #20: db/utils.py:96: 3912674.4 KiB
>>>>>>>>>>>> #21: psycopg2/_json.py:166: 3837534.7 KiB
>>>>>>>>>>>> #22: json/decoder.py:337: 3837533.2 KiB
>>>>>>>>>>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>>>>>>>>>>> #23: json/__init__.py:348: 3837533.2 KiB
>>>>>>>>>>>>     return _default_decoder.decode(s)
>>>>>>>>>>>> #24: json/decoder.py:353: 3837532.7 KiB
>>>>>>>>>>>>     obj, end = self.scan_once(s, idx)
>>>>>>>>>>>> #25: rq/worker.py:670: 211891.5 KiB
>>>>>>>>>>>> #26: tasking/worker.py:69: 198267.0 KiB
>>>>>>>>>>>> #27: rq/worker.py:477: 191261.5 KiB
>>>>>>>>>>>> #28: cli/cli.py:252: 186110.9 KiB
>>>>>>>>>>>> #29: click/core.py:555: 185957.8 KiB
>>>>>>>>>>>> #30: cli/cli.py:78: 104822.7 KiB
>>>>>>>>>>>> #31: models/query.py:73: 48350.9 KiB
>>>>>>>>>>>> #32: models/base.py:513: 48350.9 KiB
>>>>>>>>>>>> #33: sql/compiler.py:405: 38126.1 KiB
>>>>>>>>>>>> #34: sql/compiler.py:1087: 33355.6 KiB
>>>>>>>>>>>> #35: click/core.py:956: 29580.3 KiB
>>>>>>>>>>>> #36: click/core.py:1137: 29498.3 KiB
>>>>>>>>>>>> #37: models/base.py:430: 29340.9 KiB
>>>>>>>>>>>> #38: models/query.py:274: 25946.8 KiB
>>>>>>>>>>>> #39: plugin/repo_version_utils.py:30: 25850.4 KiB
>>>>>>>>>>>> #40: models/query.py:892: 24105.6 KiB
>>>>>>>>>>>> #41: models/query.py:910: 24089.7 KiB
>>>>>>>>>>>> #42: models/query.py:399: 24045.0 KiB
>>>>>>>>>>>> #43: sql/query.py:1290: 17343.9 KiB
>>>>>>>>>>>> #44: sql/where.py:81: 15629.7 KiB
>>>>>>>>>>>> #45: sql/compiler.py:489: 15581.6 KiB
>>>>>>>>>>>> #46: models/lookups.py:162: 15527.0 KiB
>>>>>>>>>>>> #47: psycopg2/extras.py:678: 15027.3 KiB
>>>>>>>>>>>> #48: sql/query.py:1323: 13486.4 KiB
>>>>>>>>>>>> #49: sql/query.py:796: 13486.3 KiB
>>>>>>>>>>>> #50: sql/compiler.py:474: 11291.3 KiB
>>>>>>>>>>>> #51: sql/compiler.py:54: 11282.7 KiB
>>>>>>>>>>>> #52: sql/compiler.py:45: 11282.7 KiB
>>>>>>>>>>>> #53: click/core.py:717: 10190.6 KiB
>>>>>>>>>>>> #54: models/query.py:72: 9967.2 KiB
>>>>>>>>>>>> #55: models/query.py:1219: 8855.6 KiB
>>>>>>>>>>>> #56: models/query.py:1231: 8855.6 KiB
>>>>>>>>>>>> #57: models/query.py:1072: 8840.0 KiB
>>>>>>>>>>>> #58: models/query.py:401: 8840.0 KiB
>>>>>>>>>>>> #59: models/lookups.py:153: 8779.2 KiB
>>>>>>>>>>>> #60: models/lookups.py:79: 8773.0 KiB
>>>>>>>>>>>> #61: models/expressions.py:332: 8769.2 KiB
>>>>>>>>>>>> #62: models/expressions.py:238: 8769.2 KiB
>>>>>>>>>>>> #63: plugin/repo_version_utils.py:37: 7602.2 KiB
>>>>>>>>>>>> #64: models/base.py:408: 7378.9 KiB
>>>>>>>>>>>> #65: click/core.py:764: 7371.2 KiB
>>>>>>>>>>>> #66: sql/compiler.py:254: 6853.7 KiB
>>>>>>>>>>>> #67: models/expressions.py:737: 6851.8 KiB
>>>>>>>>>>>> #68: models/lookups.py:159: 6745.8 KiB
>>>>>>>>>>>> #69: sql/query.py:1293: 6744.2 KiB
>>>>>>>>>>>> #70: sql/query.py:2258: 6743.6 KiB
>>>>>>>>>>>> #71: sql/query.py:763: 6743.6 KiB
>>>>>>>>>>>> #72: sql/query.py:2259: 6742.7 KiB
>>>>>>>>>>>> #73: sql/compiler.py:1054: 6696.5 KiB
>>>>>>>>>>>> #74: utils/deconstruct.py:17: 6261.2 KiB
>>>>>>>>>>>> #75: fields/related.py:986: 6246.5 KiB
>>>>>>>>>>>> #76: fields/__init__.py:381: 6245.8 KiB
>>>>>>>>>>>> #77: models/base.py:411: 5990.2 KiB
>>>>>>>>>>>> #78: python3.7/uuid.py:204: 5876.0 KiB
>>>>>>>>>>>>     self.__dict__['int'] = int
>>>>>>>>>>>> #79: sql/compiler.py:472: 5784.0 KiB
>>>>>>>>>>>> #80: sql/compiler.py:1019: 5023.7 KiB
>>>>>>>>>>>> #81: sql/compiler.py:1053: 4916.2 KiB
>>>>>>>>>>>> #82: sql/query.py:350: 4490.9 KiB
>>>>>>>>>>>> #83: sql/query.py:309: 4438.3 KiB
>>>>>>>>>>>> #84: sql/compiler.py:219: 4406.5 KiB
>>>>>>>>>>>> #85: sql/compiler.py:666: 4401.8 KiB
>>>>>>>>>>>> #86: python3.7/copy.py:96: 4399.3 KiB
>>>>>>>>>>>>     rv = reductor(4)
>>>>>>>>>>>> #87: models/expressions.py:159: 4398.6 KiB
>>>>>>>>>>>> #88: python3.7/copyreg.py:88: 4372.9 KiB
>>>>>>>>>>>>     return cls.__new__(cls, *args)
>>>>>>>>>>>> #89: python3.7/copy.py:274: 4372.9 KiB
>>>>>>>>>>>>     y = func(*args)
>>>>>>>>>>>> #90: python3.7/copy.py:106: 4372.8 KiB
>>>>>>>>>>>>     return _reconstruct(x, None, *rv)
>>>>>>>>>>>> #91: models/query_utils.py:82: 4223.2 KiB
>>>>>>>>>>>> #92: bin/rq:10: 4115.9 KiB
>>>>>>>>>>>> #93: python3.7/uuid.py:161: 4025.9 KiB
>>>>>>>>>>>>     int = int_(hex, 16)
>>>>>>>>>>>> #94: sql/query.py:1318: 3846.5 KiB
>>>>>>>>>>>> #95: models/expressions.py:749: 3362.8 KiB
>>>>>>>>>>>> #96: fields/__init__.py:709: 3340.3 KiB
>>>>>>>>>>>> #97: utils/tree.py:108: 3008.9 KiB
>>>>>>>>>>>> #98: models/query_utils.py:72: 2534.5 KiB
>>>>>>>>>>>> #99: models/query_utils.py:85: 2534.2 KiB
>>>>>>>>>>>> #100: models/base.py:503: 2214.3 KiB
>>>>>>>>>>>> #101: models/query.py:199: 2190.8 KiB
>>>>>>>>>>>> #102: models/query.py:190: 2178.6 KiB
>>>>>>>>>>>> #103: models/query_utils.py:74: 2164.6 KiB
>>>>>>>>>>>> #104: utils/tree.py:23: 2115.5 KiB
>>>>>>>>>>>> #105: models/query_utils.py:59: 2113.1 KiB
>>>>>>>>>>>> #106: sql/query.py:1251: 1895.8 KiB
>>>>>>>>>>>> #107: sql/query.py:1116: 1895.3 KiB
>>>>>>>>>>>> #108: models/lookups.py:19: 1892.1 KiB
>>>>>>>>>>>> #109: sql/query.py:68: 1882.4 KiB
>>>>>>>>>>>> #110: sql/query.py:1249: 1881.4 KiB
>>>>>>>>>>>> #111: fields/related.py:974: 1679.4 KiB
>>>>>>>>>>>> #112: sql/compiler.py:1018: 1672.8 KiB
>>>>>>>>>>>> #113: base/operations.py:564: 1672.8 KiB
>>>>>>>>>>>> #114: models/expressions.py:747: 1660.9 KiB
>>>>>>>>>>>> #115: models/query.py:1893: 1590.1 KiB
>>>>>>>>>>>> #116: models/query.py:63: 1590.1 KiB
>>>>>>>>>>>> #117: models/query.py:62: 1554.4 KiB
>>>>>>>>>>>> #118: models/query.py:61: 1554.4 KiB
>>>>>>>>>>>> 962 other: 9.8 MiB
>>>>>>>>>>>> Total allocated size: 99186.9 MiB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Tracemalloc for Re-sync at peak of memory consumption:*
>>>>>>>>>>>>
>>>>>>>>>>>> #1: sql/compiler.py:1512: 1813416.3 KiB
>>>>>>>>>>>> #2: asyncio/events.py:88: 1159219.2 KiB
>>>>>>>>>>>>     self._context.run(self._callback, *self._args)
>>>>>>>>>>>> #3: stages/api.py:43: 1159151.2 KiB
>>>>>>>>>>>> #4: asyncio/base_events.py:1775: 1158553.1 KiB
>>>>>>>>>>>>     if bool(enabled) == bool(self._coroutine_origin_tracking_enabled):
>>>>>>>>>>>> #5: asyncio/base_events.py:539: 1157867.4 KiB
>>>>>>>>>>>>     self._thread_id = None
>>>>>>>>>>>> #6: asyncio/base_events.py:571: 1156528.6 KiB
>>>>>>>>>>>>     # local task.
>>>>>>>>>>>> #7: stages/declarative_version.py:149: 1152654.5 KiB
>>>>>>>>>>>> #8: tasks/synchronizing.py:154: 1151586.8 KiB
>>>>>>>>>>>> #9: rq/job.py:611: 1147795.3 KiB
>>>>>>>>>>>> #10: rq/job.py:605: 1147534.7 KiB
>>>>>>>>>>>> #11: rq/worker.py:822: 1141556.0 KiB
>>>>>>>>>>>> #12: tasking/worker.py:100: 1139872.3 KiB
>>>>>>>>>>>> #13: rq/worker.py:684: 1137457.4 KiB
>>>>>>>>>>>> #14: rq/worker.py:610: 1136707.1 KiB
>>>>>>>>>>>> #15: models/query.py:274: 940312.1 KiB
>>>>>>>>>>>> #16: models/query.py:1242: 921581.8 KiB
>>>>>>>>>>>> #17: models/query.py:55: 908503.7 KiB
>>>>>>>>>>>> #18: sql/compiler.py:1133: 906747.3 KiB
>>>>>>>>>>>> #19: db/utils.py:96: 906707.8 KiB
>>>>>>>>>>>> #20: stages/content_stages.py:48: 899314.9 KiB
>>>>>>>>>>>> #21: psycopg2/_json.py:166: 872466.2 KiB
>>>>>>>>>>>> #22: json/decoder.py:337: 871817.4 KiB
>>>>>>>>>>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>>>>>>>>>>> #23: json/__init__.py:348: 871817.4 KiB
>>>>>>>>>>>>     return _default_decoder.decode(s)
>>>>>>>>>>>> #24: json/decoder.py:353: 871816.8 KiB
>>>>>>>>>>>>     obj, end = self.scan_once(s, idx)
>>>>>>>>>>>> #25: click/core.py:555: 448794.2 KiB
>>>>>>>>>>>> #26: rq/worker.py:670: 259351.8 KiB
>>>>>>>>>>>> #27: tasking/worker.py:69: 257743.4 KiB
>>>>>>>>>>>> #28: rq/worker.py:477: 249243.3 KiB
>>>>>>>>>>>> #29: cli/cli.py:252: 244023.2 KiB
>>>>>>>>>>>> #30: cli/cli.py:78: 226347.5 KiB
>>>>>>>>>>>> #31: click/core.py:956: 220086.5 KiB
>>>>>>>>>>>> #32: click/core.py:1137: 213146.8 KiB
>>>>>>>>>>>> #33: click/core.py:717: 188052.0 KiB
>>>>>>>>>>>> #34: tasks/synchronizing.py:581: 177471.0 KiB
>>>>>>>>>>>> #35: models/package.py:232: 138350.9 KiB
>>>>>>>>>>>> #36: stages/artifact_stages.py:219: 54726.5 KiB
>>>>>>>>>>>> #37: stages/artifact_stages.py:251: 54723.6 KiB
>>>>>>>>>>>> #38: models/query.py:1625: 47373.9 KiB
>>>>>>>>>>>> #39: models/query.py:1738: 46489.3 KiB
>>>>>>>>>>>> #40: fields/related_descriptors.py:627: 42352.2 KiB
>>>>>>>>>>>> #41: models/package.py:238: 24673.4 KiB
>>>>>>>>>>>> #42: models/query.py:1591: 24325.2 KiB
>>>>>>>>>>>> #43: models/query.py:1244: 18744.3 KiB
>>>>>>>>>>>> #44: models/query.py:771: 18744.3 KiB
>>>>>>>>>>>> #45: models/query.py:73: 12827.6 KiB
>>>>>>>>>>>> #46: models/base.py:513: 12827.6 KiB
>>>>>>>>>>>> #47: sql/compiler.py:405: 11912.8 KiB
>>>>>>>>>>>> #48: psycopg2/extras.py:678: 11666.5 KiB
>>>>>>>>>>>> #49: models/package.py:245: 10837.2 KiB
>>>>>>>>>>>> #50: sql/compiler.py:1087: 7407.8 KiB
>>>>>>>>>>>> #51: sql/where.py:81: 6915.5 KiB
>>>>>>>>>>>> #52: models/base.py:741: 6705.2 KiB
>>>>>>>>>>>> #53: models/base.py:779: 6697.1 KiB
>>>>>>>>>>>> #54: models/progress.py:144: 6607.2 KiB
>>>>>>>>>>>> #55: click/core.py:764: 6603.1 KiB
>>>>>>>>>>>> #56: models/progress.py:191: 6601.5 KiB
>>>>>>>>>>>> #57: tasks/synchronizing.py:609: 6574.8 KiB
>>>>>>>>>>>> #58: python3.7/uuid.py:204: 6532.6 KiB
>>>>>>>>>>>>     self.__dict__['int'] = int
>>>>>>>>>>>> #59: models/base.py:851: 6485.3 KiB
>>>>>>>>>>>> #60: models/base.py:411: 6366.6 KiB
>>>>>>>>>>>> #61: models/base.py:900: 6178.2 KiB
>>>>>>>>>>>> #62: models/query.py:760: 5612.1 KiB
>>>>>>>>>>>> #63: sql/compiler.py:1429: 5528.5 KiB
>>>>>>>>>>>> #64: models/base.py:408: 5042.7 KiB
>>>>>>>>>>>> #65: tasks/synchronizing.py:575: 4724.0 KiB
>>>>>>>>>>>> #66: createrepo_c/__init__.py:312: 4722.6 KiB
>>>>>>>>>>>> #67: tasks/synchronizing.py:285: 4722.6 KiB
>>>>>>>>>>>> #68: models/lookups.py:162: 4583.4 KiB
>>>>>>>>>>>> #69: tasks/synchronizing.py:582: 4236.2 KiB
>>>>>>>>>>>> #70: tasks/synchronizing.py:256: 3655.1 KiB
>>>>>>>>>>>> #71: sql/query.py:1312: 3358.8 KiB
>>>>>>>>>>>> #72: sql/compiler.py:1417: 3312.9 KiB
>>>>>>>>>>>> #73: models/query.py:892: 3121.7 KiB
>>>>>>>>>>>> #74: models/lookups.py:153: 2934.4 KiB
>>>>>>>>>>>> #75: models/lookups.py:79: 2931.5 KiB
>>>>>>>>>>>> #76: models/expressions.py:332: 2918.8 KiB
>>>>>>>>>>>> #77: models/expressions.py:238: 2918.5 KiB
>>>>>>>>>>>> #78: fields/related_descriptors.py:219: 2766.3 KiB
>>>>>>>>>>>> #79: fields/related_descriptors.py:629: 2765.8 KiB
>>>>>>>>>>>> #80: models/query.py:910: 2575.7 KiB
>>>>>>>>>>>> #81: sql/query.py:1290: 2513.6 KiB
>>>>>>>>>>>> #82: sql/query.py:1318: 2257.2 KiB
>>>>>>>>>>>> #83: stages/association_stages.py:36: 2219.3 KiB
>>>>>>>>>>>> #84: sql/compiler.py:1452: 2214.1 KiB
>>>>>>>>>>>> #85: sql/compiler.py:1370: 2214.1 KiB
>>>>>>>>>>>> #86: python3.7/uuid.py:161: 2179.2 KiB
>>>>>>>>>>>>     int = int_(hex, 16)
>>>>>>>>>>>> #87: models/base.py:503: 2082.6 KiB
>>>>>>>>>>>> #88: models/manager.py:82: 2008.6 KiB
>>>>>>>>>>>> #89: tasks/synchronizing.py:585: 1803.4 KiB
>>>>>>>>>>>> #90: fields/related_descriptors.py:527: 1767.5 KiB
>>>>>>>>>>>> #91: python3.7/copy.py:96: 1754.9 KiB
>>>>>>>>>>>>     rv = reductor(4)
>>>>>>>>>>>> #92: urllib/parse.py:562: 1725.2 KiB
>>>>>>>>>>>>     resolved_path.append('')
>>>>>>>>>>>> #93: urllib/parse.py:489: 1725.2 KiB
>>>>>>>>>>>>     if url and url[:1] != '/': url = '/' + url
>>>>>>>>>>>> #94: urllib/parse.py:475: 1725.2 KiB
>>>>>>>>>>>>     _coerce_args(*components))
>>>>>>>>>>>> #95: models/lookups.py:159: 1647.3 KiB
>>>>>>>>>>>> #96: models/expressions.py:159: 1625.9 KiB
>>>>>>>>>>>> #97: tasks/synchronizing.py:512: 1619.8 KiB
>>>>>>>>>>>> #98: models/query.py:1652: 1619.0 KiB
>>>>>>>>>>>> #99: fields/__init__.py:801: 1604.3 KiB
>>>>>>>>>>>> #100: models/base.py:473: 1545.2 KiB
>>>>>>>>>>>> #101: python3.7/uuid.py:759: 1488.4 KiB
>>>>>>>>>>>>     def uuid4():
>>>>>>>>>>>> #102: models/package.py:248: 1455.4 KiB
>>>>>>>>>>>> #103: sql/compiler.py:489: 1332.9 KiB
>>>>>>>>>>>> #104: python3.7/copy.py:106: 1293.2 KiB
>>>>>>>>>>>>     return _reconstruct(x, None, *rv)
>>>>>>>>>>>> #105: python3.7/copyreg.py:88: 1279.1 KiB
>>>>>>>>>>>>     return cls.__new__(cls, *args)
>>>>>>>>>>>> #106: python3.7/copy.py:274: 1279.1 KiB
>>>>>>>>>>>>     y = func(*args)
>>>>>>>>>>>> #107: sql/query.py:350: 1119.1 KiB
>>>>>>>>>>>> #108: models/query.py:1231: 1057.4 KiB
>>>>>>>>>>>> #109: models/query.py:1219: 1057.4 KiB
>>>>>>>>>>>> 1019 other: 34.8 MiB
>>>>>>>>>>>> Total allocated size: 28304.4 MiB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The snapshots can be found here:
>>>>>>>>>>>>
>>>>>>>>>>>>  memory_data
>>>>>>>>>>>> <https://drive.google.com/drive/folders/1mdJtdGDvbwq_jLc5m_T8VBjjb-bDRDmS>
>>>>>>>>>>>>
>>>>>>>>>>>> *Memory consumption over time:*
>>>>>>>>>>>> [image: sync.png]
>>>>>>>>>>>>
>>>>>>>>>>>> [image: resync.png]
>>>>>>>>>>>>
>>>>>>>>>>>> [image: sync_and_resync.png]
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Fabricio Aguiar
>>>>>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>>>>>>>> +55 11 999652368
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Pulp-dev mailing list
>>>>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync.png
Type: image/png
Size: 20783 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resync.png
Type: image/png
Size: 18626 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_and_resync.png
Type: image/png
Size: 29043 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_and_resync.jpg
Type: image/jpeg
Size: 57715 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resync_centos_7.png
Type: image/png
Size: 1292232 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: initial_sync_centos_7.png
Type: image/png
Size: 1264579 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resync_centos_7_after_changes.png
Type: image/png
Size: 1169672 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: initial_sync_centos_7_after_changes.png
Type: image/png
Size: 1564064 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: remove_duplicates_pain_before.png
Type: image/png
Size: 1409388 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191207/7529ad24/attachment-0007.png>


More information about the Pulp-dev mailing list