fedorahosted git repo too large

Nigel Jones dev at nigelj.com
Wed Aug 6 10:53:32 UTC 2008


On Wed, 2008-08-06 at 11:23 +0200, Jeroen van Meeuwen wrote:
> Nigel Jones wrote:
> > On Tue, 2008-08-05 at 23:44 -0400, Todd Zullinger wrote:
> >> Yuan Yijun wrote:
> >>> I just tried to download revisor git with this command "git pull
> >>> http://git.fedorahosted.org/git/revisor master". I have to repeat
> >>> 4-5 times since it breaks during downloading. The .git folder is
> >>> about 58MB. After "git gc --aggressive" it becomes only 6MB.
> >>>
> >>> Could anyone please run gc on the server?
> >> Perhaps better would be repack.  There was a recent thread on the git
> >> list and one of the developers pointed out an older mail from Linus
> >> where he described gc --aggressive as "mostly dumb" and recommended
> >> using something like "repack -a -d -f --depth=250 --window=250"
> >> instead.
> >>
> >> http://article.gmane.org/gmane.comp.gcc.devel/94613
> > That's actually a very useful article and the methods/reasons behind it
> > sound quite sane and it could be a useful approach for us.
> > 
> > I'll try this out on one of the smaller repos (a copy of course) and see
> > what happens.
> > 
> 
> We've ended up doing this live as well and I'm happy with the few stabs 
> I took at seeing if everything still works.
> 
> Feel free to make this a regular thing on the revisor repo and I'll 
> report if anything breaks, so that if it doesn't, this could maybe 
> become a regular thing to do on all repos?
Okay, from a server POV the repack shrank the 116MB folder down to just
7MB in less than two minutes (based on a trial run in my homedir), which
is pretty sweet.

A trial with system-config-firewall.git went from ~20M to ~4M.
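
For anyone wanting to repeat this kind of trial, it could be scripted
roughly as below. This is only a sketch, not the exact commands used for
the numbers in this mail: the function name trial_repack and the path in
the usage comment are made up for illustration, it assumes a bare
repository, and the repack options are the ones quoted earlier in the
thread. It works on a throwaway copy, so the original repo is untouched.

```shell
#!/bin/sh
# Sketch: repack a throwaway copy of a bare repository and report the
# on-disk size (in KB) before and after. trial_repack is a hypothetical
# helper name, not an existing tool.
trial_repack() {
    src=$1                                   # bare repo to copy
    work=$(mktemp -d) || return 1
    cp -a "$src" "$work/trial.git" || return 1

    before=$(du -sk "$work/trial.git" | cut -f1)

    # Options as quoted earlier in the thread; progress goes to stderr.
    git --git-dir="$work/trial.git" repack -a -d -f -q \
        --depth=250 --window=250 >&2 || return 1

    after=$(du -sk "$work/trial.git" | cut -f1)

    # Sanity check: the repacked copy must still pass fsck.
    # On failure the copy is left in place for inspection.
    git --git-dir="$work/trial.git" fsck >&2 || return 1

    echo "before=${before}K after=${after}K"
    rm -rf "$work"
}

# Usage (hypothetical path), with timing as in the figures in this mail:
#   time trial_repack /path/to/project.git
```

The fsck step is there because the whole point of the trial is to check
that nothing breaks before doing the same on the live repo.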

I also did a trial run of anaconda.git and anaconda-images.git:
anaconda.git:
183M (97745 objects) -> 64M (a third of the original size)
real    26m18.050s
user    23m9.395s
sys     0m6.568s

anaconda-images.git:
54M (1482 objects) -> 41M (didn't expect much here)
real    1m57.944s
user    1m43.466s
sys     0m0.848s
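
To put the two results side by side, the remaining size as a percentage
of the original can be worked out with a one-liner. The helper name
pct_of_original is just an illustration; the inputs are the MB figures
reported above.

```shell
#!/bin/sh
# pct_of_original BEFORE AFTER: print AFTER as a percentage of BEFORE,
# rounded to one decimal place. Hypothetical helper for illustration.
pct_of_original() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * after / before }'
}

# Sizes in MB as reported in this mail.
echo "anaconda.git:        $(pct_of_original 183 64)% of original"
echo "anaconda-images.git: $(pct_of_original 54 41)% of original"
```

That works out to about 35% for anaconda.git (hence "a third") and about
76% for anaconda-images.git.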

Maybe we should run git repack on the big repos every two or three
months, and git gc (which is very fast: under 1 minute on the anaconda
repo, for example) on a monthly basis.
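
A schedule like that could be driven from cron with something along
these lines. Treat it as a sketch under assumptions, not a description of
the actual server setup: the function name maintain_repos, the script
path, and the /git root in the crontab comments are all hypothetical,
and it assumes bare repos named *.git directly under one directory.

```shell
#!/bin/sh
# maintain_repos MODE ROOT
#   MODE is "gc" (quick, monthly) or "repack" (slow, full repack).
#   ROOT is a directory containing bare repos named *.git.
# Hypothetical helper; names and layout are assumptions.
maintain_repos() {
    mode=$1
    root=$2
    case $mode in
        gc|repack) ;;
        *) echo "usage: maintain_repos gc|repack ROOT" >&2; return 2 ;;
    esac

    for repo in "$root"/*.git; do
        [ -d "$repo" ] || continue           # glob matched nothing
        if [ "$mode" = gc ]; then
            git --git-dir="$repo" gc --quiet
        else
            # Same options as suggested earlier in the thread.
            git --git-dir="$repo" repack -a -d -f -q \
                --depth=250 --window=250
        fi
        status=$?
        if [ "$status" -eq 0 ]; then
            echo "$mode ok: $repo"
        else
            echo "$mode FAILED: $repo" >&2
        fi
    done
}

# Example crontab entries (hypothetical script path and repo root):
#   0 3 1 *   *  /usr/local/bin/git-maintain gc /git      # monthly
#   0 4 1 */3 *  /usr/local/bin/git-maintain repack /git  # every 3 months
```

Failures are reported per repo instead of aborting the loop, so one bad
repo doesn't stop the rest from being processed.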

- Nigel



