Better repodata performance

Konstantin Ryabitsev mricon at gmail.com
Mon Jan 31 15:40:54 UTC 2005


On 31 Jan 2005 03:59:14 -0200, Alexandre Oliva <aoliva at redhat.com> wrote:
> >> Yeah.  Multiply that by a few thousand users, if you happen to run one
> >> of the mirrors...
> 
> > Well, that's not the user's problem, is it?
> 
> But why are you narrowing your focus only on users?
> 
> If users don't care about mirrors, how about letting them all go away?
> Will users still not care?

As someone who is somewhat involved in running a very large mirror, I
can tell you that it's much better to have one request for a 1MB file
than 50 requests for 10KB files, since mirrors tend to bog down on
disk IO, not on bandwidth. A single sequential read of one large file
causes less load than lots of small requests for files scattered all
over the disk. We have bandwidth out the wazoo: it's always down to
seek times.
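A rough back-of-envelope model makes the seek-time argument concrete. This is an illustrative sketch only. The 8 ms per-request seek cost and the 50 MB/s sequential throughput are assumed figures, not measurements from any real mirror. The model also ignores caching and concurrent clients, and `disk_time` is a hypothetical helper.

```python
# Back-of-envelope disk-time model (hypothetical parameters, not measured).
SEEK_S = 0.008          # assumed average seek + rotational latency per request, seconds
THROUGHPUT_BPS = 50e6   # assumed sequential read throughput, bytes/second

def disk_time(num_requests: int, bytes_per_request: int) -> float:
    """Estimate disk busy time: each request pays one seek, then reads sequentially."""
    return num_requests * (SEEK_S + bytes_per_request / THROUGHPUT_BPS)

one_big = disk_time(1, 1_000_000)      # one 1 MB file
many_small = disk_time(50, 10_000)     # fifty 10 KB files

print(f"1 x 1MB:   {one_big * 1000:.1f} ms")
print(f"50 x 10KB: {many_small * 1000:.1f} ms")
```

Under those assumptions, the fifty small requests keep the disk busy roughly fifteen times longer (about 410 ms versus 28 ms), even though they move half as many bytes. The per-request seek term dominates.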

> > I am not. I'm very happy that I no longer have to wait for a half an
> > hour for all .hdr files to download after I do a fresh install.
> 
> Yeah, I like that too.  But I'm not happy that I have to wait longer
> for all the updates, on aggregate, if a minor additional improvement
> could make things significantly better for everybody.  This is what
> I'm talking about.

That's not a "minor additional improvement"; that's a major
modification that would a) significantly complicate the code, and b)
make existing repositories incompatible all over again.

> Do the maths.  I did the maths not for my own set up, but for what a
> user of FC alone would face.  Is there any particular point in those
> numbers you disagree with?
> 
> Do you actually like to wait for downloads?  If you could reduce the
> download time before yum starts updating from 10 seconds to 1 second,
> wouldn't you like that?

Compared to the amount of downloading a typical update or install
involves, that makes very little difference to me. It's more or less
like asking "wouldn't you prefer to start off from the top of a truck when
climbing Mt. Everest?"

Besides, startup times are already being addressed in HEAD, with large
improvements.

> It only speaks for the fact that you may not care about tiny amounts
> of time you waste without even realizing, and without realizing that
> they add up very quickly.

Yes. In fact, very few people seem to care. Let's look at the usual
complaints about yum. The number one reason regular users bitch about
yum is that Seth stubbornly refuses to implement the
--magically-resolve-all-packaging-problems feature, along with
--use-complex-ai-to-figure-out-what-i-really-want. Then come "yum is
too slow" and "yum takes up too much memory", both of which are
currently being actively worked on. You are the first person to complain
about repodata-related bandwidth, and judging from the fact that very few
people other than you, me, Seth, and Jeff have joined this
conversation, I can tell that it's a non-issue as far as most people
are concerned.

If you survey the users of Fedora systems about yum (not developers, or
even those who run rawhide, since they make up a very, very small
percentage of Fedora users), your typical responses will probably be:

90%: Yum? What's yum? Our sysadmin handles all our computers.
9%: Yeah, I think it runs nightly when I forget to turn off my machine
before going to bed, plus I run "yum update" once or twice a month
when I am bored.
0.9%: I run "yum update" daily and use it to install new software all the time.
0.1%: I use yum hourly because I am a developer or repo maintainer,
and it sucks my balls through a straw.

It's the ~1% of yum users who are the vocal ones, since yum doesn't
meet their level of usage. The rest are either quite happy with it
or don't even know that yum is used on their systems, and I say that
from my experience supporting quite a number of people running
and using Fedora. I'm not sure how much effort is justified to satisfy
those with uncommon usage patterns.

Regards,
-- 
Konstantin Ryabitsev
Zlotniks, INC




More information about the fedora-devel-list mailing list