[Pulp-list] pulp v1 vs pulp v2 rpm repo sync times

Mon Feb 25 20:13:40 UTC 2013

On 02/25/2013 12:12 PM, Bryan Kearney wrote:
> On 02/25/2013 03:10 PM, Mike McCune wrote:
>> On 02/25/2013 12:09 PM, Bryan Kearney wrote:
>>> On 02/25/2013 03:03 PM, Jay Dobies wrote:
>>>>> To give an example of a larger repo with latency, the RHEL 6Server
>>>>> x86_64 repo:
>>>>>
>>>>> Pulp V1 (I believe 4 threads):  108 minutes
>>>>> Pulp V2 with 4 threads: 192 minutes
>>>>> Pulp V2 with 1 thread:  300 minutes
>>>>>
>>>>> -Justin
>>>>
>>>> Few points...
>>>>
>>>> Looks like the analysis for 4 threads v. 1 was incorrect. We'll address
>>>> this in some capacity for 2.1.
>>>>
>>>> The simple fact is that it's just plain going to be slower than v1 for a
>>>> bit. In v1, Pulp was acting largely as a web interface to a yum repo on
>>>> disk. In v2, the paradigm has significantly changed. Grinder, however,
>>>> did not change. So what we're dealing with is grinder functioning one
>>>> way, v2 advocating a different model, and a bunch of glue in between
>>>> them.
>>>>
>>>> We've been investigating this since really the start of the year. Part
>>>> one was us measuring different download approaches in Python. We settled
>>>> on one and implemented a chunk that will handle concurrent downloads
>>>> smoothly. Part two is taking place this sprint as we start to use this
>>>> new approach to handle a yum repository instead of yum itself, which
>>>> admittedly has always been a bit of a square peg, circle hole situation.
>>>> I'm not saying the decision was wrong, but at the end of the day the new
>>>> approach should fit more cleanly.
>>>>
>>>> Since you guys are set up to run these tests easily, can I ask you to
>>>> take it past 4 threads and see what you find? There has to be a place
>>>> where the increases are less significant than you saw from 1 to 4, but I
>>>> am curious just how far we should take it.
>>>>
>>>>
>>>
>>> If I do a "yum install foo" am I still hitting only the disk, and not a
>>> db? I want to make sure we have not introduced any known
>>> $PREVIOUS_PRODUCT bugs.
>>>
>>>
>>
>> AFAIK, still just disk -> http server -> network -> client. That said,
>> what does this have to do with syncing remote repos? I'm guessing you
>> are talking about a client installing packages ...
>>
>> Mike
> yes... I am keying off of
>
> "In v1, Pulp was acting largely as a web interface to a yum repo on
> disk. In v2, the paradigm has significantly changed."
>

ACK, it makes sense now why you asked :)
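
For anyone extending the thread-count tests Jay asked for, here is a rough sketch of how the numbers Justin posted could be compared. It only does arithmetic on the three timings quoted above; the names (v2_by_threads, speedup, efficiency) are made up for illustration and are not part of Pulp or grinder. Note the v1 thread count was only "I believe 4", so the v1 comparison is approximate.

```python
# Sync times in minutes for the RHEL 6Server x86_64 repo, as quoted
# earlier in this thread. Only these three data points were reported.
V1_MINUTES = 108                    # Pulp v1, believed to be 4 threads
v2_by_threads = {1: 300, 4: 192}    # Pulp v2, threads -> minutes


def speedup(times, n):
    """How many times faster n threads ran than the 1-thread baseline."""
    return times[1] / times[n]


def efficiency(times, n):
    """Speedup divided by thread count (1.0 would be perfect scaling)."""
    return speedup(times, n) / n


s4 = speedup(v2_by_threads, 4)
e4 = efficiency(v2_by_threads, 4)
v2_vs_v1 = v2_by_threads[4] / V1_MINUTES

print(f"v2 speedup, 1 -> 4 threads: {s4:.2f}x")
print(f"v2 parallel efficiency at 4 threads: {e4:.0%}")
print(f"v2 (4 threads) vs v1: {v2_vs_v1:.2f}x slower")
```

Adding further entries to v2_by_threads (8, 16, ...) as results come in would show where the per-thread gain flattens out, which is the point Jay was curious about.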