[Pulp-dev] How docker type repositories are getting synced in pulp3 w.r.t concurrency ?
saydas at redhat.com
Tue Mar 30 08:06:27 UTC 2021
Thanks for your response on this one.
Since those stages use Python async and asyncio, there will be 5
parallel downloads (as long as enough requests flow through that stage). Once
an artifact is downloaded, the next stage will transfer it to the final
storage location (which may be cloud storage), and so on.
Should I assume that once 5 parallel downloads complete inside
/var/lib/pulp/tmp, they will immediately be transferred to their actual
location, and only then will the next batch of downloads start?
This question is based on our old experience with Pulp 2, where a 50+ GB
OpenShift repo was being synced while /var/cache/pulp was only 25 GB.
During the content download phase alone the filesystem filled up, and the
task was eventually canceled with a disk-space error. This happened because
Pulp 2 downloaded the data in batches of 5 but never moved it to its
destination until the entire repository had been downloaded into the Pulp
cache. This was only noticed with docker\ISO\file type repos but NOT with
yum\rpm type repos.
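To make the disk-space concern concrete, here is a rough sketch of the arithmetic behind the two behaviours being compared. It is only an illustration: the function name `peak_temp_usage`, the unit count, and the uniform unit size of 256 MiB are assumptions made up for the example, and it does not claim to model what Pulp 2 or Pulp 3 actually does internally.

```python
def peak_temp_usage(unit_sizes, batch_size, move_after_each_batch):
    """Return the peak number of bytes held in the temporary download dir."""
    in_temp = 0
    peak = 0
    for start in range(0, len(unit_sizes), batch_size):
        batch = unit_sizes[start:start + batch_size]
        in_temp += sum(batch)      # the batch lands in the temp dir
        peak = max(peak, in_temp)
        if move_after_each_batch:
            in_temp = 0            # batch is moved out before the next one starts
    return peak


# Hypothetical repo: 200 units of 256 MiB each (50 GiB in total).
MIB = 1024 ** 2
units = [256 * MIB] * 200

# Behaviour described for Pulp 2: nothing is moved until everything is downloaded.
hold_all = peak_temp_usage(units, batch_size=5, move_after_each_batch=False)
# Behaviour being asked about: each batch is moved before the next one starts.
move_early = peak_temp_usage(units, batch_size=5, move_after_each_batch=True)

print(hold_all // MIB, "MiB")    # 51200 MiB
print(move_early // MIB, "MiB")  # 1280 MiB
```

Under these assumed numbers, the first behaviour needs the whole 50 GiB in the download directory and would overflow a 25 GB filesystem, while the second needs only one batch's worth of space at a time.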
Thanks & Regards,
*T*echnical *S*upport *E*ngineer, RHCE
Red Hat India
Red Hat India Pvt. Ltd, Level-5, Tower-10, Cyber City
Magarpatta City Hadapsar, Pune-411013, Maharashtra, India.
saydas at redhat.com M: +91-7890892756 IRC: Sayan
On Tue, Mar 30, 2021 at 1:25 PM Matthias Dellweg <mdellweg at redhat.com> wrote:
> I am not quite sure I understand the question correctly, but
> I'll try to give my view of it.
> Pulp 3 has a special asynchronous sync pipeline. That means that when
> syncing a remote repository (regardless of its type) there is a pipeline
> of so-called stages. The first stage is supposed to fetch metadata and
> enumerate content units (blobs, manifests, rpms, files, ...) and pass them
> into the pipeline. The other stages, which run in parallel, each perform
> one of: downloading artifacts, saving them, assembling content units,
> saving them, and adding them to the new repository version.
> Since those stages use Python async and asyncio, there will be 5
> parallel downloads (as long as enough requests flow through that stage).
> Once an artifact is downloaded, the next stage will transfer it to the
> final storage location (which may be cloud storage), and so on. For
> performance reasons, however, some stages (those doing database saves)
> will batch their work into large batches (>= 100).
> In short: It's different.
> I hope this explains (high level) what's going on there.
> Feel free to ask for more detail.
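The staged behaviour described above can be sketched with plain asyncio. This is a minimal illustration, not Pulp's actual stages API: `fake_download`, `download_stage`, `save_stage`, and `run_pipeline` are hypothetical names, the download is simulated with a short sleep, and the concurrency of 5 is simply the value assumed in this thread.

```python
import asyncio

DOWNLOAD_CONCURRENCY = 5  # value assumed in this thread


async def fake_download(unit, stats):
    # Track how many downloads are in flight at the same time.
    stats["active"] += 1
    stats["max_active"] = max(stats["max_active"], stats["active"])
    await asyncio.sleep(0.01)  # stand-in for network I/O
    stats["active"] -= 1
    return unit


async def download_stage(units, out_q, stats):
    sem = asyncio.Semaphore(DOWNLOAD_CONCURRENCY)

    async def worker(unit):
        async with sem:  # at most 5 downloads run concurrently
            artifact = await fake_download(unit, stats)
        await out_q.put(artifact)  # hand over as soon as this one is done

    await asyncio.gather(*(worker(u) for u in units))
    await out_q.put(None)  # sentinel: no more artifacts


async def save_stage(in_q, saved):
    while True:
        artifact = await in_q.get()
        if artifact is None:
            break
        saved.append(artifact)  # stand-in for moving to final storage


async def run_pipeline(n_units):
    stats = {"active": 0, "max_active": 0}
    saved = []
    q = asyncio.Queue()
    # Both stages run concurrently; the queue connects them.
    await asyncio.gather(
        download_stage(range(n_units), q, stats),
        save_stage(q, saved),
    )
    return stats["max_active"], len(saved)


max_active, n_saved = asyncio.run(run_pipeline(20))
print(max_active, n_saved)  # 5 20
```

In this sketch each artifact is passed to the saving stage as soon as its own download finishes, rather than waiting for a whole batch or the whole repository, while the semaphore keeps the number of simultaneous downloads at 5.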
> On Mon, Mar 29, 2021 at 4:48 PM Sayan Das <saydas at redhat.com> wrote:
>> Hello Everyone,
>> I am not sure if my previous email was successfully delivered or not and
>> hence I am re-sending it.
>> I hope someone will be able to help me with some clarification there.
>> Thanks & Regards,
>> Sayan das
>> On Sat, Mar 27, 2021 at 12:17 AM Sayan Das <saydas at redhat.com> wrote:
>>> Hello All,
>>> I hope this email finds you all well.
>>> My name is Sayan and I work as a support engineer for the Red Hat
>>> Satellite 6 product. During a recent interaction with my colleague Ian
>>> Ballou, we came across a pulp2-vs-pulp3 question that we are looking for
>>> clarification on, and it was suggested that the pulp-dev list would be a
>>> great place to get that clarification.
>>> Please allow me to explain the pulp 2 behavior.
>>> Some parameters to consider:
>>> Repo Type: Docker or Openshift repo [ Assuming it has 200 units to get
>>> downloaded ]
>>> Download Dir: /var/cache/pulp
>>> Data Dir: /var/lib/pulp/content/units/
>>> Download concurrency: 5
>>> * Sync Started for the repo.
>>> * Pulp downloaded 5 units into the "Download Dir" but never moved them
>>> to the "Data Dir"
>>> * Once those first 5 units were downloaded, Pulp downloaded the next 5
>>> units, and the same cycle kept repeating until all 200 units had been
>>> downloaded
>>> * When all 200 units were downloaded, the entire content was moved
>>> from the "Download Dir" to the respective location inside the "Data Dir"
>>> For pulp 3,
>>> Download Dir: /var/lib/pulp/tmp
>>> Data Dir: /var/lib/pulp/media
>>> Download concurrency: 5 [ I heard it's 10 but let's assume it's 5 for
>>> now ]
>>> So the question is: will Pulp 3 behave the same as Pulp 2, i.e. download
>>> the entire repository into the "Download Dir" in batches of 5 units and
>>> then move the entire repository to the "Data Dir"? Or is the behavior
>>> different, i.e. after downloading 5 units into the "Download Dir", the
>>> content is moved to the "Data Dir" and then the next 5 units are downloaded?
>>> Please note, I have specifically mentioned that the repo is a
>>> Docker\Openshift type repo as we are concerned about only Docker\ISO\File
>>> type repos at this moment.
>>> Any clarification that can be provided on this will be really
>>> Thanks & Regards,
>>> Sayan das