From sflaniga at redhat.com Tue Oct 14 08:17:24 2014
From: sflaniga at redhat.com (Sean Flanigan)
Date: Tue, 14 Oct 2014 18:17:24 +1000
Subject: [zanata-devel] Cleaning up Zanata's git history
Message-ID: <543CDC14.7060508@redhat.com>

Hi all,

Largely due to a massive binary file being committed to git some time
back, our main repo [1] is now too big to push to github as a new
project (other places too, probably). The offending files were later
removed, but they still take up space in the repo because they are in
the history.

In my tests, zanata-server's compressed .git directory shrank from 153M
(after gc) down to 29M when the unwanted files were removed.

I would like to run BFG Repo-Cleaner [2] to remove those files from the
history. I really don't like to change history, but the waste of space
is pretty big, and it is causing difficulties like the one with github.
Anything which requires a fresh clone is slowed down too, like
configuring Jenkins jobs. We really should have done this months ago.

While we're at it, I'll clean up some of the email addresses in the
metadata, partly because some of them prevent github from accepting the
push. (Github must be stricter than it once was.)

I have added all the offending file types (pdf, jar, war) to .gitignore,
but please, always check for strange files when adding new files!


WHAT I PLAN TO DO:

These are the commands I plan to run:

  git clone --mirror git@github.com:zanata/zanata-server.git
  cd zanata-server.git

  # a throwaway script which uses git-filter-branch, git-fast-export
  # and git-fast-import to repair some invalid author/committer
  # metadata (and some messy ones):
  git-fix-emails

  java -jar ~/Downloads/bfg-1.11.8.jar \
    --delete-files '*.{jar,war,pdf}' \
    --delete-folders 'gwt-unitCache*'

  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
  git fsck
  git push --mirror


WHAT YOU NEED TO DO:

1. To prepare, we all need to make sure that any work in progress has
been pushed to branches in the main repo.
So if you have a branch which will be part of a pull request, you should
commit it and push.

2. Also, if you have any git stashes in your current repo clone, you
should probably either push them as branches or turn them into patch
files.

Having all outstanding changes as branches in the repo will allow BFG to
process these branches at the same time as the others.

3. After the cleanup, we will all need to fetch new clones from the repo.

If you have changes in another fork which you weren't able to push, you
should be able to rebase it, as long as you can work out which commit
marks the start of your changes. Just make sure you don't try to merge
the old commits with the new, or all the old history will come back too!


WHEN:

The plan is to do this at the beginning of a week (Brisbane time),
probably next Monday or the one after. So please make sure you have
pushed your branches before the weekend, for the next couple of weeks!


Please let me know if you have any thoughts or objections.


[1] https://github.com/zanata/zanata-server/
[2] http://rtyley.github.io/bfg-repo-cleaner/


Regards

Sean.

-- 
Sean Flanigan

Senior Software Engineer
Engineering - Internationalisation
Red Hat

From damason at redhat.com Wed Oct 15 07:38:13 2014
From: damason at redhat.com (David Mason)
Date: Wed, 15 Oct 2014 03:38:13 -0400 (EDT)
Subject: [zanata-devel] Cleaning up Zanata's git history
In-Reply-To: <543CDC14.7060508@redhat.com>
References: <543CDC14.7060508@redhat.com>
Message-ID: <1529127988.39171620.1413358693934.JavaMail.zimbra@redhat.com>

To avoid accidentally losing work from your local repository when the
server rewrites are in place, I suggest making a backup copy of your
working repositories (including the .git directory) and deleting the
remotes so that it is impossible to accidentally try to merge the
rewritten tree into the old tree. This would preserve the stash,
allowing patches to be generated.
To remove origin:

  git remote remove origin

For older versions of git:

  git remote rm origin

Cheers,

David Mason
Software Engineer
L10n Engineering

Red Hat, Asia-Pacific Pty Ltd
Level 1, 193 North Quay
Brisbane 4000
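
Both messages mention turning stashes into patch files, but neither
gives the commands. Below is a minimal sketch of one way to do it,
assuming a POSIX shell and that it is run inside the old clone before
that clone is set aside. The function name save_stashes_as_patches and
the stash-N.patch file names are made up for illustration and are not
from the thread. Note that `git stash show -p` only covers tracked
changes by default, so untracked files saved in a stash would need
separate handling.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): write every stash in the
# current repository to a numbered patch file in the directory "$1".
save_stashes_as_patches() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # One line per stash entry; strip whitespace that some wc
    # implementations put around the number.
    count=$(git stash list | wc -l | tr -d '[:space:]')

    i=0
    while [ "$i" -lt "$count" ]; do
        # `git stash show -p` prints the stash as a diff against the
        # commit it was created on.
        git stash show -p "stash@{$i}" > "$outdir/stash-$i.patch" || return 1
        i=$((i + 1))
    done
}
```

Called as, for example, `save_stashes_as_patches ~/zanata-patches`
from inside the old working copy. In a fresh clone, each patch can be
tried first with `git apply --check stash-0.patch` and then applied
with `git apply stash-0.patch`; whether it applies cleanly depends on
how far the branch has moved since the stash was made.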