[Pulp-dev] How docker type repositories are getting synced in pulp3 w.r.t concurrency ?

Ina Panova ipanova at redhat.com
Tue Mar 30 10:03:16 UTC 2021


We need to look into fixing this bug https://pulp.plan.io/issues/8295 to
match the behaviour you have described, Matthias.


--------
Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Tue, Mar 30, 2021 at 10:12 AM Matthias Dellweg <mdellweg at redhat.com>
wrote:

>
>
> On Tue, Mar 30, 2021 at 10:06 AM Sayan Das <saydas at redhat.com> wrote:
>
>> Hello Matthias,
>>
>> Thanks for your response on this one.
>>
>> By this,
>> ~~
>> Since those stages use Python async and asyncio, this means there will be
>> 5 parallel downloads (as long as enough requests flow by that stage). Once
>> an artifact is downloaded, the next stage will transfer it to the final
>> storage location (which may be cloud storage), and so on.
>> ~~
>>
>> Should I assume that once 5 parallel downloads have completed inside
>> /var/lib/pulp/tmp, they will immediately be transferred to their actual
>> location, and only then will the next batch of downloads start?
>>
> As far as I know, the downloads are not batched at all. As soon as one
> completes, the next one can start, so there are always 5 running in
> parallel. Once a download has finished, it is transferred to the storage by
> one of the following stages. However, Pulp does not check the available
> disk space. So in theory you should be safe, but there's no guarantee.
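To illustrate the difference between "batched, moved at the end" and "always 5 in flight, moved as each one finishes", here is a toy model. It is not Pulp code: the function `sync`, the unit sizes, and the assumption that moving a file out of the temporary directory is instantaneous are all made up for the sketch, so a real peak could be higher.

```python
import asyncio

CONCURRENCY = 5  # download concurrency assumed in this thread


async def sync(unit_sizes, move_immediately):
    """Simulate a sync; return (peak concurrent downloads, peak temp usage)."""
    limit = asyncio.Semaphore(CONCURRENCY)
    state = {"active": 0, "peak_active": 0, "temp": 0, "peak_temp": 0}

    async def download(size):
        async with limit:  # a new download starts as soon as any slot frees up
            state["active"] += 1
            state["peak_active"] = max(state["peak_active"], state["active"])
            state["temp"] += size  # the file lands in the temporary directory
            state["peak_temp"] = max(state["peak_temp"], state["temp"])
            await asyncio.sleep(0.001 * size)  # stand-in for network time
            state["active"] -= 1
        if move_immediately:
            # Assumption: the hand-off to final storage is instantaneous.
            state["temp"] -= size

    await asyncio.gather(*(download(size) for size in unit_sizes))
    return state["peak_active"], state["peak_temp"]


if __name__ == "__main__":
    units = [1] * 200  # 200 units of size 1, as in the example in this thread
    print("moved as finished:", asyncio.run(sync(units, move_immediately=True)))
    print("held until the end:", asyncio.run(sync(units, move_immediately=False)))
```

In this model the number of concurrent downloads never exceeds 5 in either case. What changes is the temporary space: when everything is held until the end, the peak equals the size of the whole repository, whereas moving each unit as it finishes keeps the peak near the size of the units currently in flight.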
>
>>
>> This question is based on our old experience with Pulp 2, where a
>> 50+ GB OpenShift repo was being synced and /var/cache/pulp was only 25 GB.
>> During the content download phase the filesystem filled up and eventually
>> the task was canceled with a disk-space error. This happened because Pulp 2
>> downloaded the data in batches of 5 but never moved it to its destination
>> until the entire repository had been downloaded into the Pulp cache. This
>> was only noticed with docker\ISO\file type repos, NOT with yum\rpm type
>> repos.
>>
>>
>>
>> Thanks & Regards,
>>
>> Sayan das
>>
>> *T*echnical *S*upport *E*ngineer, RHCE
>>
>> Red Hat India
>> <https://www.redhat.com/>
>>
>> Red Hat India Pvt. Ltd, Level-5, Tower-10, Cyber City
>>
>> Magarpatta City Hadapsar, Pune-411013, Maharashtra, India.
>>
>> saydas at redhat.com    M: +91-7890892756     IRC: Sayan
>> <https://red.ht/sig>
>>
>>
>> On Tue, Mar 30, 2021 at 1:25 PM Matthias Dellweg <mdellweg at redhat.com>
>> wrote:
>>
>>> I am not quite sure I understand the intent of the question correctly,
>>> but I'll try to give my view of it.
>>> Pulp 3 has a special asynchronous sync pipeline. That means that when
>>> syncing a remote repository (regardless of its type) there is a pipeline
>>> of so-called stages. The first stage is supposed to fetch metadata,
>>> enumerate content units (blobs, manifests, rpms, files, ...) and pass them
>>> into the pipeline. The other stages run in parallel, and each performs one
>>> of: downloading artifacts, saving them, assembling content units, saving
>>> them, and adding them to the new repository version.
>>> Since those stages use Python async and asyncio, this means there will
>>> be 5 parallel downloads (as long as enough requests flow by that stage).
>>> Once an artifact is downloaded, the next stage will transfer it to the
>>> final storage location (which may be cloud storage), and so on. For
>>> performance reasons, however, some stages (those doing database saves)
>>> will batch their work into large batches (>= 100).
>>> In short: It's different.
>>> I hope this explains (at a high level) what's going on there.
>>> Feel free to ask for more detail.
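As a rough picture of the pipeline described above, the following sketch chains three stages with plain asyncio queues. It is only an illustration and not Pulp's stages API: the names `enumerate_units`, `download`, `save_in_batches` and `run_pipeline`, the queue sizes, and the sleep standing in for a transfer are all hypothetical.

```python
import asyncio

DONE = object()  # sentinel marking the end of the stream


async def enumerate_units(names, out_q):
    """First stage: emit the content units found in the (pretend) metadata."""
    for name in names:
        await out_q.put(name)
    await out_q.put(DONE)


async def download(in_q, out_q, concurrency=5):
    """Fetch units with bounded concurrency; pass each on as soon as it is done."""
    limit = asyncio.Semaphore(concurrency)

    async def fetch(name):
        async with limit:
            await asyncio.sleep(0.001)  # stand-in for the real transfer
        await out_q.put(name)  # handed to the next stage immediately

    tasks = []
    while True:
        name = await in_q.get()
        if name is DONE:
            break
        tasks.append(asyncio.create_task(fetch(name)))
    await asyncio.gather(*tasks)
    await out_q.put(DONE)


async def save_in_batches(in_q, saved_batches, batch_size=100):
    """Last stage: collect units and 'save' them in batches of batch_size."""
    batch = []
    while True:
        name = await in_q.get()
        if name is DONE:
            break
        batch.append(name)
        if len(batch) >= batch_size:
            saved_batches.append(batch)
            batch = []
    if batch:
        saved_batches.append(batch)  # flush the final partial batch


async def run_pipeline(names):
    """Run all stages concurrently and return the batches that were saved."""
    q1, q2 = asyncio.Queue(maxsize=100), asyncio.Queue(maxsize=100)
    saved_batches = []
    await asyncio.gather(
        enumerate_units(names, q1),
        download(q1, q2),
        save_in_batches(q2, saved_batches),
    )
    return saved_batches


if __name__ == "__main__":
    batches = asyncio.run(run_pipeline([f"blob-{i}" for i in range(250)]))
    print([len(batch) for batch in batches])
```

All stages run at the same time, so a unit can be saved while later units are still being enumerated or downloaded. Only the last stage waits to accumulate a batch before doing its work.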
>>>
>>> On Mon, Mar 29, 2021 at 4:48 PM Sayan Das <saydas at redhat.com> wrote:
>>>
>>>> Hello Everyone,
>>>>
>>>> I am not sure if my previous email was successfully delivered or not
>>>> and hence I am re-sending it.
>>>>
>>>> I hope someone will be able to help me with some clarification there.
>>>>
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Sayan das
>>>>
>>>>
>>>> On Sat, Mar 27, 2021 at 12:17 AM Sayan Das <saydas at redhat.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I hope this email finds you all well.
>>>>>
>>>>> My name is Sayan and I work as a support engineer for the Red Hat
>>>>> Satellite 6 product. During a recent interaction with my colleague Ian
>>>>> Ballou, we came across a pulp2-vs-pulp3 question that we would like
>>>>> clarified, and it was suggested that pulp-dev would be a really
>>>>> great place to get that clarification.
>>>>>
>>>>> Please allow me to explain the pulp 2 behavior.
>>>>>
>>>>> Some parameters to consider:
>>>>>
>>>>> Repo Type: Docker or Openshift repo [ Assuming it has 200 units to get
>>>>> downloaded ]
>>>>> Download Dir: /var/cache/pulp
>>>>> Data Dir: /var/lib/pulp/content/units/
>>>>> Download concurrency: 5
>>>>>
>>>>> Now,
>>>>>    * Sync started for the repo.
>>>>>    * Pulp downloaded 5 units into the "Download Dir" but never moved
>>>>> them to the "Data Dir".
>>>>>    * Once those first 5 units were downloaded, Pulp downloaded the next
>>>>> 5 units, and the same cycle kept repeating until all 200 units had
>>>>> been downloaded.
>>>>>    * Once all 200 units were downloaded, the entire content was moved
>>>>> from the "Download Dir" to the respective location inside the "Data Dir".
>>>>>
>>>>>
>>>>> For pulp 3,
>>>>>
>>>>> Download Dir: /var/lib/pulp/tmp
>>>>> Data Dir: /var/lib/pulp/media
>>>>> Download concurrency: 5 [ I heard it's 10 but let's assume it's 5 for
>>>>> now ]
>>>>>
>>>>>
>>>>> So the question is: will Pulp 3 behave the same as Pulp 2, i.e.
>>>>> download the entire repository into the "Download Dir" in batches of 5
>>>>> units and then move the entire repository to the "Data Dir", or is the
>>>>> behavior different, i.e. after downloading 5 units into the "Download
>>>>> Dir" the content will be moved to the "Data Dir" and then the next 5
>>>>> units will be downloaded?
>>>>>
>>>>> Please note, I have specifically mentioned that the repo is a
>>>>> Docker\Openshift type repo as we are concerned about only Docker\ISO\File
>>>>> type repos at this moment.
>>>>>
>>>>> Any clarification that can be provided on this will be really
>>>>> appreciated.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks & Regards,
>>>>>
>>>>> Sayan das
>>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://listman.redhat.com/mailman/listinfo/pulp-dev
>>>>