Fedora extras metadata

Michael Schwendt mschwendt.tmp0701.nospam at arcor.de
Fri Mar 16 09:38:56 UTC 2007


On Fri, 16 Mar 2007 08:17:47 +0100, Thomas M Steenholdt wrote:

> Michael Schwendt wrote:
> >>
> >> What is going on with fedora extras metadata lately?
> > 
> > Nothing. It's just the mirrors that choke on daily updates and don't sync
> > safely and frequently enough.
> > 
> This seems to be happening more often than we could hope for.

As mentioned in the thread "repoview in our repositories", I believe
repoview may be one culprit, since it's located *inside* the repodata
directory and potentially increases the time a mirror spends in that tree.

Repoview was put in place at the beginning of Fedora Extras and has been
kept in pretty much a "nobody cares about it" state. Meanwhile it results
in thousands of HTML files. More than 16,000 pages per dist release! And
until recently several thousand more for the debuginfo repos. Too many of
the pages change when we publish updated packages. It's not 1:1, but 1:N
(one updated package => many updated pages), see the other thread for more
details. And this is the size of repoview for extras devel:

24M     ./SRPMS/repodata/repoview
39M     ./i386/repodata/repoview
38M     ./ppc/repoview
41M     ./x86_64/repodata/repoview

> Is there a documented way to set up mirroring, to ENSURE that the 
> mirrors are in a consistent state?

Not that I know of, as we don't do anything like clearing and setting
a flag file before and after the sync.

So, theoretically it can happen that a mirror is downloading while we copy
new packages and new metadata to Red Hat. And if that happens while a
mirror is choking on thousands of repoview pages inside the repodata dir,
this increases the window during which downloads can become inconsistent.
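
To make "inconsistent" concrete: a mirror is in trouble when its repomd.xml
refers to metadata files that are missing or don't match the recorded
checksums. The following is only a minimal sketch of such a check, not
something the mirrors or our push scripts run. It assumes the usual
repomd.xml layout (the http://linux.duke.edu/metadata/repo namespace, one
<data> element per file with a <location href> and a <checksum>), and it
assumes that a checksum type of "sha" means SHA-1. The function name
find_inconsistencies is made up for illustration.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

# Assumed XML namespace of repomd.xml.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def find_inconsistencies(repo_root):
    """Return (href, problem) pairs for files listed in repomd.xml
    that are missing or whose checksum does not match."""
    repomd = os.path.join(repo_root, "repodata", "repomd.xml")
    problems = []
    for data in ET.parse(repomd).getroot().findall(REPO_NS + "data"):
        location = data.find(REPO_NS + "location")
        checksum = data.find(REPO_NS + "checksum")
        if location is None or checksum is None:
            continue
        href = location.get("href")
        path = os.path.join(repo_root, href)
        if not os.path.exists(path):
            problems.append((href, "missing"))
            continue
        algo = checksum.get("type")
        if algo == "sha":  # assumption: "sha" stands for SHA-1
            algo = "sha1"
        digest = hashlib.new(algo)
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large metadata files are fine.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != (checksum.text or "").strip():
            problems.append((href, "checksum mismatch"))
    return problems
```

An empty list only says that the metadata files agree with repomd.xml at the
moment of the check. It says nothing about whether the packages themselves
have arrived yet.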

Mirrors that don't sync daily are hit hard apparently:

  development/i386/repodata/
  mirrors.kernel.org  2007-03-11 11:44 repomd.xml

> If not - and I believe this has been brought up earlier (by myself). I 
> really think we could do with a simple timestamping mirror handshake 
> mechanism, kinda like what debian does. This would allow mirrors to 
> monitor for a special file and when that file is available, we know the 
> mirror is in a consistent state (as consistent as its master - which 
> can also be tracked in this way). Mirrors could easily add a few lines 
> to their scripts to honor this kind of thing, without the need for 
> special mirroring tools.
> 
> Just a suggestion
> 
> /Thomas

What does the algorithm look like?

"Handshake" implies bidirectional communication, which is not
available. We only have "slave retrieves from master" or "slave retrieves
from slave" relationships. Mutual exclusion is not possible either. Flag
files don't make the mirroring an atomic operation. The master can still
break the scheme and push updates while a mirror is downloading in the
belief that the previously checked flag file was up-to-date.
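
For the sake of discussion, one conceivable shape of such a scheme is
sketched here. It is purely an assumption and not a description of what
Debian or anybody else does: the master rewrites a stamp file after every
push, and a mirror reads the stamp before and after its own sync and only
declares itself consistent if both reads agree. read_master_stamp and
run_sync are hypothetical callables (for example "fetch the stamp file" and
"run rsync") that a mirror admin would have to supply.

```python
def sync_until_stable(read_master_stamp, run_sync, max_attempts=3):
    """Sync, then check that the master's stamp did not change meanwhile.

    Returns the stamp the mirror can advertise as consistent, or None
    if the master kept changing during every attempt.
    """
    for _ in range(max_attempts):
        before = read_master_stamp()
        run_sync()
        after = read_master_stamp()
        if before == after:
            # No push was completed on the master during our sync.
            return after
    return None
```

This only detects a push that finished during the sync. It does not help if
the master is in the middle of copying files and has not touched the stamp
yet, so the master would also have to clear or change the stamp *before* it
starts a push. Even then nothing is atomic: clients downloading from the
mirror during its sync still see a half-updated tree.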




More information about the fedora-devel-list mailing list