Matt_Domsch at dell.com
Tue Apr 28 19:24:34 UTC 2009
On Tue, Apr 28, 2009 at 12:51:18PM -0600, Stephen John Smoogen wrote:
> On Tue, Apr 28, 2009 at 12:42 PM, Matt Domsch <Matt_Domsch at dell.com> wrote:
> > A few things I'd like to change starting tomorrow (post-freeze).
> > * change the MM update-master-directory-list cronjob to start at 0 and
> >   30 past the hour, from its current schedule of trying to start every
> >   15 minutes. It is taking about 20 minutes on average to run, so
> >   really is only running twice an hour anyhow.
> > * bump back the MM update-mirrorlist cronjob to start at :40 past the
> >   hour. It takes about 20 minutes to complete, and I would like the
> >   new content to land at the top of the hour.
> Is the 20 minutes a maximum or average? I was just wondering if
> somewhere between 35 and 40 would make sure it doesn't conflict with a
> job at the top of the hour?
pretty much maximum, though to be fair, I am not recording the start
and stop times for these events in their respective logfiles to know
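For what it's worth, recording start and stop times could be done with a small wrapper around each cronjob. This is only a rough sketch, not what MM does today: the function name run_timed and the logger name "cronjob" are made up for illustration, and the command passed in is whatever the cronjob already runs.

```python
import logging
import subprocess
import time


def run_timed(cmd, log=logging.getLogger("cronjob")):
    """Run cmd (a list of arguments), logging start time, stop time, and duration."""
    start = time.time()
    log.info("start %s at %s", cmd[0],
             time.strftime("%H:%M:%S", time.localtime(start)))
    returncode = subprocess.call(cmd)  # blocks until the job exits
    stop = time.time()
    log.info("stop %s at %s after %.0f s (exit %d)", cmd[0],
             time.strftime("%H:%M:%S", time.localtime(stop)),
             stop - start, returncode)
    # Return the exit status and elapsed seconds so callers can track maxima.
    return returncode, stop - start
```

With durations logged on every run, the question of maximum versus average could be answered from the logfile rather than from memory.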
> > * increase the number of crawlers, from 45 to 75. A full run is
> >   taking about 3 hours now, I'd like to bring this down to under 2.
> >   This only affects bapp1, whose load average is still under 1 and has
> >   plenty of free RAM and CPU it seems.
> sorry for clueless question number 2. What are the limiting factors for
> the crawlers? Network bandwidth/latency or CPU?
More latency than bandwidth. The crawlers issue HTTP HEAD requests
for a lot of files on each mirror to be sure they match. The latency
in response to these requests (single-threaded to each mirror, but
hitting 45 (or soon more) mirrors in parallel) is what limits the
speed of an individual crawler. Then the time for the whole run is
simply the time it takes to complete each of the crawlers.
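To make the pattern concrete, here is a minimal sketch of it: serial HEAD requests within one mirror, many mirrors in parallel. It is not the actual crawler code. The names head_size, crawl_mirror and crawl_all are hypothetical, and comparing Content-Length is an assumption about what "match" means.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import URLError
from urllib.request import Request, urlopen


def head_size(url, timeout=10):
    """Return the Content-Length from an HTTP HEAD request, or None on failure."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
            return int(length) if length is not None else None
    except (URLError, OSError, ValueError):
        return None


def crawl_mirror(base_url, expected_sizes, head=head_size):
    """Check one mirror serially; return the paths whose size does not match."""
    mismatched = []
    for path, size in expected_sizes.items():
        # Each request waits on the previous one, so latency dominates here.
        if head(base_url.rstrip("/") + "/" + path) != size:
            mismatched.append(path)
    return mismatched


def crawl_all(mirrors, expected_sizes, workers=45, head=head_size):
    """Crawl mirrors in parallel, with at most `workers` mirrors in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda m: crawl_mirror(m, expected_sizes, head), mirrors)
        return dict(zip(mirrors, results))
```

Since each worker spends nearly all its time waiting on the network, raising workers from 45 to 75 mostly adds idle threads rather than CPU load.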
Very little CPU is used, except at the end of each crawler, when it
updates the database with its findings. Then it jumps up in CPU for a
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux