[Pulp-dev] Memory consumption on RPM sync

Daniel Alley dalley at redhat.com
Tue Nov 26 20:06:38 UTC 2019


JSON decoding was also a significant slice of time, so, possibly the
problem is how JSON blobs behave with these big queries.  And of course
Erratas have tons of those.

On Tue, Nov 26, 2019 at 1:50 PM Fabricio Aguiar <fabricio.aguiar at redhat.com>
wrote:

> Thanks Dennis and Daniel!
>
> those are really good points Daniel!
>
> I noticed sync and re-sync behaves almost the same for a period of time:
> [image: sync_and_resync.jpg]
>
> so I was guessing the problem would be when starting to save the contents.
> Combining Daniel and Dennis hints, probably the problem is a very long
> query with many Erratas!
>
> Best regards,
> Fabricio Aguiar
> Software Engineer, Pulp Project
> Red Hat Brazil - Latam <https://www.redhat.com/>
> +55 11 999652368
>
>
> On Tue, Nov 26, 2019 at 3:26 PM Daniel Alley <dalley at redhat.com> wrote:
>
>> Fabricio, this is great work!
>>
>> One thing that stands out is that a very large amount of time is being
>> spent in remove_duplicates(), 65% of the total runtime of the sync.
>>
>>    - 13% of the total runtime spent on this inside cast()
>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L31
>>    - 31% of the total runtime spent in these lines
>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L37-L38
>>    - 20% of the total runtime spent on this line
>>    https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/repo_version_utils.py#L42
>>
>> There's a couple of suggestions on how to improve this code written up
>> here https://pulp.plan.io/issues/5701
>>
>> The memory usage seems to be coming from the Django ORM backend.  My
>> guess is that lines 37 & 38 (linked above) are generating some extremely
>> long SQL, because it's basically doing OR with a vast number of individual
>> queries. And I think that is also not great for performance.
>> https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
>>
>> Another hint that that might be the case is that the memory usage grows
>> and grows and then collapses all at once.  That would make sense if the SQL
>> compiler's data structure is growing and growing and then being destroyed
>> after the query executes.
>>
>> I would imagine that if we fix our queries, we'll both fix the memory
>> consumption and improve performance at the same time.
>>
>> On Tue, Nov 26, 2019 at 11:57 AM Fabricio Aguiar <
>> fabricio.aguiar at redhat.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I've been investigating memory consumption for syncing on RPM plugin - issue
>>> 5688 <https://pulp.plan.io/issues/5688>, and I want to share what I've
>>> found so far.
>>>
>>> First, it seems that a new process is created when we started to sync,
>>> and this new process is responsible for the large amount of memory
>>> consumption observed.
>>>
>>> As I shared on some notes here: https://pulp.plan.io/issues/5688
>>> The problem seems to affect only RHEL, I did not see a large consumption
>>> of memory for the following repos:
>>> - CentOS 6: http://mirror.centos.org/centos-6/6.10/os/x86_64/
>>> - CentOS 7: http://mirror.linux.duke.edu/pub/centos/7/os/x86_64/
>>> - CentOS 8 AppStrem:
>>> http://mirror.linux.duke.edu/pub/centos/8/AppStream/x86_64/os/
>>> - CentOS 8 BaseOS:
>>> http://mirror.linux.duke.edu/pub/centos/8/BaseOS/x86_64/os/
>>> - EPEL 7: https://dl.fedoraproject.org/pub/epel/7/x86_64/
>>>
>>> As stated on the issue, with RHEL 7 we can observe the problem, for my
>>> tests, I used the following repo:
>>> http://cdn.stage.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/
>>>
>>> For my tests, I used pulp3-source-fedora30 vagrant box, initially set
>>> with 4GB, then I increased to 16GB, and it was not enough. I only was able
>>> to sync RHEL 7 when I increased the memory up to 23GB.
>>>
>>> Utilizing py-spy I got those SVG profiles attached, you can find them
>>> hosted here:
>>> Sync - https://sendeyo.com/up/d/90a8ae4c8f
>>> Re-sync - https://sendeyo.com/up/d/4c855bcce3
>>>
>>> With the following branch, I was able to take some tracemalloc
>>> <https://docs.python.org/3/library/tracemalloc.html> snapshots (every
>>> 30 seconds):
>>>
>>> https://github.com/pulp/pulp_rpm/commit/f3f079010cfe81b7f5cf3ef94c2402b1ccf7d90c
>>>
>>> *Tracemalloc for Sync at peak of memory consumption:*
>>>
>>> #1: sql/compiler.py:1512: 7825340.4 KiB
>>> #2: tasks/synchronizing.py:154: 4062651.9 KiB
>>> #3: stages/declarative_version.py:149: 4062601.9 KiB
>>> #4: models/repository.py:631: 4062130.0 KiB
>>> #5: models/repository.py:96: 4062127.7 KiB
>>> #6: rq/job.py:611: 4060459.2 KiB
>>> #7: rq/job.py:605: 4058273.8 KiB
>>> #8: rq/worker.py:822: 4053883.3 KiB
>>> #9: tasking/worker.py:100: 4053875.9 KiB
>>> #10: rq/worker.py:684: 4053849.9 KiB
>>> #11: rq/worker.py:610: 4051645.7 KiB
>>> #12: plugin/repo_version_utils.py:31: 4028192.9 KiB
>>> #13: models/base.py:124: 4028191.2 KiB
>>> #14: fields/related_descriptors.py:401: 4028165.2 KiB
>>> #15: models/query.py:1242: 4021212.9 KiB
>>> #16: models/query.py:402: 3995284.5 KiB
>>> #17: models/query.py:256: 3995284.5 KiB
>>> #18: models/query.py:55: 3952718.2 KiB
>>> #19: sql/compiler.py:1133: 3919417.7 KiB
>>> #20: db/utils.py:96: 3912674.4 KiB
>>> #21: psycopg2/_json.py:166: 3837534.7 KiB
>>> #22: json/decoder.py:337: 3837533.2 KiB
>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>> #23: json/__init__.py:348: 3837533.2 KiB
>>>     return _default_decoder.decode(s)
>>> #24: json/decoder.py:353: 3837532.7 KiB
>>>     obj, end = self.scan_once(s, idx)
>>> #25: rq/worker.py:670: 211891.5 KiB
>>> #26: tasking/worker.py:69: 198267.0 KiB
>>> #27: rq/worker.py:477: 191261.5 KiB
>>> #28: cli/cli.py:252: 186110.9 KiB
>>> #29: click/core.py:555: 185957.8 KiB
>>> #30: cli/cli.py:78: 104822.7 KiB
>>> #31: models/query.py:73: 48350.9 KiB
>>> #32: models/base.py:513: 48350.9 KiB
>>> #33: sql/compiler.py:405: 38126.1 KiB
>>> #34: sql/compiler.py:1087: 33355.6 KiB
>>> #35: click/core.py:956: 29580.3 KiB
>>> #36: click/core.py:1137: 29498.3 KiB
>>> #37: models/base.py:430: 29340.9 KiB
>>> #38: models/query.py:274: 25946.8 KiB
>>> #39: plugin/repo_version_utils.py:30: 25850.4 KiB
>>> #40: models/query.py:892: 24105.6 KiB
>>> #41: models/query.py:910: 24089.7 KiB
>>> #42: models/query.py:399: 24045.0 KiB
>>> #43: sql/query.py:1290: 17343.9 KiB
>>> #44: sql/where.py:81: 15629.7 KiB
>>> #45: sql/compiler.py:489: 15581.6 KiB
>>> #46: models/lookups.py:162: 15527.0 KiB
>>> #47: psycopg2/extras.py:678: 15027.3 KiB
>>> #48: sql/query.py:1323: 13486.4 KiB
>>> #49: sql/query.py:796: 13486.3 KiB
>>> #50: sql/compiler.py:474: 11291.3 KiB
>>> #51: sql/compiler.py:54: 11282.7 KiB
>>> #52: sql/compiler.py:45: 11282.7 KiB
>>> #53: click/core.py:717: 10190.6 KiB
>>> #54: models/query.py:72: 9967.2 KiB
>>> #55: models/query.py:1219: 8855.6 KiB
>>> #56: models/query.py:1231: 8855.6 KiB
>>> #57: models/query.py:1072: 8840.0 KiB
>>> #58: models/query.py:401: 8840.0 KiB
>>> #59: models/lookups.py:153: 8779.2 KiB
>>> #60: models/lookups.py:79: 8773.0 KiB
>>> #61: models/expressions.py:332: 8769.2 KiB
>>> #62: models/expressions.py:238: 8769.2 KiB
>>> #63: plugin/repo_version_utils.py:37: 7602.2 KiB
>>> #64: models/base.py:408: 7378.9 KiB
>>> #65: click/core.py:764: 7371.2 KiB
>>> #66: sql/compiler.py:254: 6853.7 KiB
>>> #67: models/expressions.py:737: 6851.8 KiB
>>> #68: models/lookups.py:159: 6745.8 KiB
>>> #69: sql/query.py:1293: 6744.2 KiB
>>> #70: sql/query.py:2258: 6743.6 KiB
>>> #71: sql/query.py:763: 6743.6 KiB
>>> #72: sql/query.py:2259: 6742.7 KiB
>>> #73: sql/compiler.py:1054: 6696.5 KiB
>>> #74: utils/deconstruct.py:17: 6261.2 KiB
>>> #75: fields/related.py:986: 6246.5 KiB
>>> #76: fields/__init__.py:381: 6245.8 KiB
>>> #77: models/base.py:411: 5990.2 KiB
>>> #78: python3.7/uuid.py:204: 5876.0 KiB
>>>     self.__dict__['int'] = int
>>> #79: sql/compiler.py:472: 5784.0 KiB
>>> #80: sql/compiler.py:1019: 5023.7 KiB
>>> #81: sql/compiler.py:1053: 4916.2 KiB
>>> #82: sql/query.py:350: 4490.9 KiB
>>> #83: sql/query.py:309: 4438.3 KiB
>>> #84: sql/compiler.py:219: 4406.5 KiB
>>> #85: sql/compiler.py:666: 4401.8 KiB
>>> #86: python3.7/copy.py:96: 4399.3 KiB
>>>     rv = reductor(4)
>>> #87: models/expressions.py:159: 4398.6 KiB
>>> #88: python3.7/copyreg.py:88: 4372.9 KiB
>>>     return cls.__new__(cls, *args)
>>> #89: python3.7/copy.py:274: 4372.9 KiB
>>>     y = func(*args)
>>> #90: python3.7/copy.py:106: 4372.8 KiB
>>>     return _reconstruct(x, None, *rv)
>>> #91: models/query_utils.py:82: 4223.2 KiB
>>> #92: bin/rq:10: 4115.9 KiB
>>> #93: python3.7/uuid.py:161: 4025.9 KiB
>>>     int = int_(hex, 16)
>>> #94: sql/query.py:1318: 3846.5 KiB
>>> #95: models/expressions.py:749: 3362.8 KiB
>>> #96: fields/__init__.py:709: 3340.3 KiB
>>> #97: utils/tree.py:108: 3008.9 KiB
>>> #98: models/query_utils.py:72: 2534.5 KiB
>>> #99: models/query_utils.py:85: 2534.2 KiB
>>> #100: models/base.py:503: 2214.3 KiB
>>> #101: models/query.py:199: 2190.8 KiB
>>> #102: models/query.py:190: 2178.6 KiB
>>> #103: models/query_utils.py:74: 2164.6 KiB
>>> #104: utils/tree.py:23: 2115.5 KiB
>>> #105: models/query_utils.py:59: 2113.1 KiB
>>> #106: sql/query.py:1251: 1895.8 KiB
>>> #107: sql/query.py:1116: 1895.3 KiB
>>> #108: models/lookups.py:19: 1892.1 KiB
>>> #109: sql/query.py:68: 1882.4 KiB
>>> #110: sql/query.py:1249: 1881.4 KiB
>>> #111: fields/related.py:974: 1679.4 KiB
>>> #112: sql/compiler.py:1018: 1672.8 KiB
>>> #113: base/operations.py:564: 1672.8 KiB
>>> #114: models/expressions.py:747: 1660.9 KiB
>>> #115: models/query.py:1893: 1590.1 KiB
>>> #116: models/query.py:63: 1590.1 KiB
>>> #117: models/query.py:62: 1554.4 KiB
>>> #118: models/query.py:61: 1554.4 KiB
>>> 962 other: 9.8 MiB
>>> Total allocated size: 99186.9 MiB
>>>
>>>
>>>
>>> *Tracemalloc for Re-sync at peak of memory consumption:*
>>>
>>> #1: sql/compiler.py:1512: 1813416.3 KiB
>>> #2: asyncio/events.py:88: 1159219.2 KiB
>>>     self._context.run(self._callback, *self._args)
>>> #3: stages/api.py:43: 1159151.2 KiB
>>> #4: asyncio/base_events.py:1775: 1158553.1 KiB
>>>     if bool(enabled) == bool(self._coroutine_origin_tracking_enabled):
>>> #5: asyncio/base_events.py:539: 1157867.4 KiB
>>>     self._thread_id = None
>>> #6: asyncio/base_events.py:571: 1156528.6 KiB
>>>     # local task.
>>> #7: stages/declarative_version.py:149: 1152654.5 KiB
>>> #8: tasks/synchronizing.py:154: 1151586.8 KiB
>>> #9: rq/job.py:611: 1147795.3 KiB
>>> #10: rq/job.py:605: 1147534.7 KiB
>>> #11: rq/worker.py:822: 1141556.0 KiB
>>> #12: tasking/worker.py:100: 1139872.3 KiB
>>> #13: rq/worker.py:684: 1137457.4 KiB
>>> #14: rq/worker.py:610: 1136707.1 KiB
>>> #15: models/query.py:274: 940312.1 KiB
>>> #16: models/query.py:1242: 921581.8 KiB
>>> #17: models/query.py:55: 908503.7 KiB
>>> #18: sql/compiler.py:1133: 906747.3 KiB
>>> #19: db/utils.py:96: 906707.8 KiB
>>> #20: stages/content_stages.py:48: 899314.9 KiB
>>> #21: psycopg2/_json.py:166: 872466.2 KiB
>>> #22: json/decoder.py:337: 871817.4 KiB
>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>> #23: json/__init__.py:348: 871817.4 KiB
>>>     return _default_decoder.decode(s)
>>> #24: json/decoder.py:353: 871816.8 KiB
>>>     obj, end = self.scan_once(s, idx)
>>> #25: click/core.py:555: 448794.2 KiB
>>> #26: rq/worker.py:670: 259351.8 KiB
>>> #27: tasking/worker.py:69: 257743.4 KiB
>>> #28: rq/worker.py:477: 249243.3 KiB
>>> #29: cli/cli.py:252: 244023.2 KiB
>>> #30: cli/cli.py:78: 226347.5 KiB
>>> #31: click/core.py:956: 220086.5 KiB
>>> #32: click/core.py:1137: 213146.8 KiB
>>> #33: click/core.py:717: 188052.0 KiB
>>> #34: tasks/synchronizing.py:581: 177471.0 KiB
>>> #35: models/package.py:232: 138350.9 KiB
>>> #36: stages/artifact_stages.py:219: 54726.5 KiB
>>> #37: stages/artifact_stages.py:251: 54723.6 KiB
>>> #38: models/query.py:1625: 47373.9 KiB
>>> #39: models/query.py:1738: 46489.3 KiB
>>> #40: fields/related_descriptors.py:627: 42352.2 KiB
>>> #41: models/package.py:238: 24673.4 KiB
>>> #42: models/query.py:1591: 24325.2 KiB
>>> #43: models/query.py:1244: 18744.3 KiB
>>> #44: models/query.py:771: 18744.3 KiB
>>> #45: models/query.py:73: 12827.6 KiB
>>> #46: models/base.py:513: 12827.6 KiB
>>> #47: sql/compiler.py:405: 11912.8 KiB
>>> #48: psycopg2/extras.py:678: 11666.5 KiB
>>> #49: models/package.py:245: 10837.2 KiB
>>> #50: sql/compiler.py:1087: 7407.8 KiB
>>> #51: sql/where.py:81: 6915.5 KiB
>>> #52: models/base.py:741: 6705.2 KiB
>>> #53: models/base.py:779: 6697.1 KiB
>>> #54: models/progress.py:144: 6607.2 KiB
>>> #55: click/core.py:764: 6603.1 KiB
>>> #56: models/progress.py:191: 6601.5 KiB
>>> #57: tasks/synchronizing.py:609: 6574.8 KiB
>>> #58: python3.7/uuid.py:204: 6532.6 KiB
>>>     self.__dict__['int'] = int
>>> #59: models/base.py:851: 6485.3 KiB
>>> #60: models/base.py:411: 6366.6 KiB
>>> #61: models/base.py:900: 6178.2 KiB
>>> #62: models/query.py:760: 5612.1 KiB
>>> #63: sql/compiler.py:1429: 5528.5 KiB
>>> #64: models/base.py:408: 5042.7 KiB
>>> #65: tasks/synchronizing.py:575: 4724.0 KiB
>>> #66: createrepo_c/__init__.py:312: 4722.6 KiB
>>> #67: tasks/synchronizing.py:285: 4722.6 KiB
>>> #68: models/lookups.py:162: 4583.4 KiB
>>> #69: tasks/synchronizing.py:582: 4236.2 KiB
>>> #70: tasks/synchronizing.py:256: 3655.1 KiB
>>> #71: sql/query.py:1312: 3358.8 KiB
>>> #72: sql/compiler.py:1417: 3312.9 KiB
>>> #73: models/query.py:892: 3121.7 KiB
>>> #74: models/lookups.py:153: 2934.4 KiB
>>> #75: models/lookups.py:79: 2931.5 KiB
>>> #76: models/expressions.py:332: 2918.8 KiB
>>> #77: models/expressions.py:238: 2918.5 KiB
>>> #78: fields/related_descriptors.py:219: 2766.3 KiB
>>> #79: fields/related_descriptors.py:629: 2765.8 KiB
>>> #80: models/query.py:910: 2575.7 KiB
>>> #81: sql/query.py:1290: 2513.6 KiB
>>> #82: sql/query.py:1318: 2257.2 KiB
>>> #83: stages/association_stages.py:36: 2219.3 KiB
>>> #84: sql/compiler.py:1452: 2214.1 KiB
>>> #85: sql/compiler.py:1370: 2214.1 KiB
>>> #86: python3.7/uuid.py:161: 2179.2 KiB
>>>     int = int_(hex, 16)
>>> #87: models/base.py:503: 2082.6 KiB
>>> #88: models/manager.py:82: 2008.6 KiB
>>> #89: tasks/synchronizing.py:585: 1803.4 KiB
>>> #90: fields/related_descriptors.py:527: 1767.5 KiB
>>> #91: python3.7/copy.py:96: 1754.9 KiB
>>>     rv = reductor(4)
>>> #92: urllib/parse.py:562: 1725.2 KiB
>>>     resolved_path.append('')
>>> #93: urllib/parse.py:489: 1725.2 KiB
>>>     if url and url[:1] != '/': url = '/' + url
>>> #94: urllib/parse.py:475: 1725.2 KiB
>>>     _coerce_args(*components))
>>> #95: models/lookups.py:159: 1647.3 KiB
>>> #96: models/expressions.py:159: 1625.9 KiB
>>> #97: tasks/synchronizing.py:512: 1619.8 KiB
>>> #98: models/query.py:1652: 1619.0 KiB
>>> #99: fields/__init__.py:801: 1604.3 KiB
>>> #100: models/base.py:473: 1545.2 KiB
>>> #101: python3.7/uuid.py:759: 1488.4 KiB
>>>     def uuid4():
>>> #102: models/package.py:248: 1455.4 KiB
>>> #103: sql/compiler.py:489: 1332.9 KiB
>>> #104: python3.7/copy.py:106: 1293.2 KiB
>>>     return _reconstruct(x, None, *rv)
>>> #105: python3.7/copyreg.py:88: 1279.1 KiB
>>>     return cls.__new__(cls, *args)
>>> #106: python3.7/copy.py:274: 1279.1 KiB
>>>     y = func(*args)
>>> #107: sql/query.py:350: 1119.1 KiB
>>> #108: models/query.py:1231: 1057.4 KiB
>>> #109: models/query.py:1219: 1057.4 KiB
>>> 1019 other: 34.8 MiB
>>> Total allocated size: 28304.4 MiB
>>>
>>>
>>>
>>> The snapshots can be found here:
>>>
>>>  memory_data
>>> <https://drive.google.com/drive/folders/1mdJtdGDvbwq_jLc5m_T8VBjjb-bDRDmS>
>>>
>>> *Memory consumption over time:*
>>> [image: sync.png]
>>>
>>> [image: resync.png]
>>>
>>> [image: sync_and_resync.png]
>>>
>>> Best regards,
>>> Fabricio Aguiar
>>> Software Engineer, Pulp Project
>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>> +55 11 999652368
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191126/2b2ef9ba/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync.png
Type: image/png
Size: 20783 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191126/2b2ef9ba/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resync.png
Type: image/png
Size: 18626 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191126/2b2ef9ba/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_and_resync.png
Type: image/png
Size: 29043 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191126/2b2ef9ba/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync_and_resync.jpg
Type: image/jpeg
Size: 57715 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20191126/2b2ef9ba/attachment.jpg>


More information about the Pulp-dev mailing list