From sflaniga at redhat.com  Fri Oct 28 02:12:21 2016
From: sflaniga at redhat.com (Sean Flanigan)
Date: Fri, 28 Oct 2016 12:12:21 +1000
Subject: [zanata-devel] Cleaning up Zanata's git history
In-Reply-To: <543CDC14.7060508@redhat.com>
References: <543CDC14.7060508@redhat.com>
Message-ID:

So, this finally happened, only two years later than originally planned!

We got sick of managing "synchronised pull requests" which needed to
touch multiple zanata repositories (e.g. api, client and server) at once,
so we have merged the main repos [1] into one. (Yes, again.)

As part of the big merge, we got around to purging those old binary blobs
(nothing over 350KB now) and rationalised some old committer email
addresses. The result has been pushed to the new zanata-platform
repository:

https://github.com/zanata/zanata-platform

After aggressive gc, the old server repo was 168MB; the merged repo is now
about 42MB, a quarter of the size, even though it also includes the
complete history of zanata-api and zanata-client. So I think that's not
bad.

There might be some minor disruptions until we completely finish
switching over, so please bear with us, and let us know if we seem to
have broken something without noticing.

Also, if anyone was about to create a pull request for one of the old
repositories and needs a hand rebasing it onto the new zanata-platform
repo, please get in touch, and we'll try to help.

Regards

Sean.

PS: Just for reference, the migration script is here:
https://gist.github.com/seanf/767c3218c1fb2e30c12d9c04d6564368

[1] parent, api, common, client, server

On 14 October 2014 at 18:17, Sean Flanigan wrote:
> Hi all,
>
> Largely due to a massive binary file being committed to git some time
> back, our main repo [1] is now too big to push to github as a new
> project (other places too, probably). The offending files were later
> removed, but they still take up space in the repo because they are in
> the history.
>
> In my tests, zanata-server's compressed .git directory shrank from 153M
> (after gc) down to 29M when the unwanted files were removed.
>
> I would like to run BFG Repo-Cleaner [2] to remove those files from the
> history. I really don't like to change history, but the waste of space
> is pretty big, and it is causing difficulties like the one with github.
> Anything which requires a fresh clone is slowed down too, like
> configuring Jenkins jobs. We really should have done this months ago.
>
> While we're at it, I'll clean up some of the email addresses in the
> metadata, partly because some of them prevent github from accepting the
> push. (Github must be stricter than it once was.)
>
> I have added all the offending file types (pdf, jar, war) to .gitignore,
> but please, always check for strange files when adding new files!
>
>
> WHAT I PLAN TO DO:
>
> These are the commands I plan to run:
>
>     git clone --mirror git@github.com:zanata/zanata-server.git
>     cd zanata-server.git
>
>     # a throwaway script which uses git-filter-branch, git-fast-export
>     # and git-fast-import to repair some invalid author/committer
>     # metadata (and some messy ones):
>     git-fix-emails
>
>     java -jar ~/Downloads/bfg-1.11.8.jar \
>       --delete-files '*.{jar,war,pdf}' \
>       --delete-folders 'gwt-unitCache*'
>
>     git reflog expire --expire=now --all
>     git gc --prune=now --aggressive
>     git fsck
>     git push --mirror
>
>
> WHAT YOU NEED TO DO:
>
> 1. To prepare, we all need to make sure that any work in progress has
> been pushed to branches in the main repo. So if you have a branch which
> will be part of a pull request, you should commit it and push.
>
> 2. Also, if you have any git stashes in your current repo clone, you
> should probably either push them as branches or turn them into patch files.
>
> Having all outstanding changes as branches in the repo will allow BFG to
> process these branches at the same time as the others.
>
> 3. After the cleanup, we will all need to fetch new clones from the repo.
>
> If you have changes in another fork which you weren't able to push, you
> should be able to rebase it, as long as you can work out which commit
> marks the start of your changes. Just make sure you don't try to merge
> the old commits with the new, or all the old history will come back too!
>
>
> WHEN:
>
> The plan is to do this at the beginning of a week (Brisbane time),
> probably next Monday or the one after. So please make sure you have
> pushed your branches before the weekend, for the next couple of weeks!
>
>
> Please let me know if you have any thoughts or objections.
>
>
> [1] https://github.com/zanata/zanata-server/
> [2] http://rtyley.github.io/bfg-repo-cleaner/
>
>
> Regards
>
> Sean.
>
> --
> Sean Flanigan
>
> Senior Software Engineer
> Engineering - Internationalisation
> Red Hat

--
Sean Flanigan

Principal Software Engineer
Globalisation Tools Engineering
Red Hat
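For anyone with unpushed work in a fork (point 3 of the 2014 plan, or a pull request that now needs to move to zanata-platform), the rebase can be done with `git rebase --onto`. The script below is a minimal, self-contained sketch in a throwaway repo, not the procedure the Zanata team actually used. It fakes a "rewritten" history by recommitting the same tree with a different email (roughly what an email cleanup does to every commit, giving it a new hash), then replays only the branch's own commits onto it. The branch names `my-feature` and `rewritten` and the variable `old_base` are hypothetical stand-ins: in a real clone, `old_base` would be the old upstream commit your branch started from, and `rewritten` the matching branch in the new repo.

```shell
#!/bin/sh
# Sketch only: move a branch from old (pre-cleanup) history onto
# rewritten history, without merging the two histories together.
# All repo, branch and file names here are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "old@example.invalid"

# Old upstream history: the commit the feature branch starts from.
echo base > base.txt
git add base.txt
git commit -q -m "base"
old_base=$(git rev-parse HEAD)

# Unpushed work, branched from the old history.
git checkout -q -b my-feature
echo work > feature.txt
git add feature.txt
git commit -q -m "my change"

# Simulate the rewritten upstream: same tree as old_base, recommitted
# as a new root commit with a different email, so the hash changes.
git config user.email "new@example.invalid"
git checkout -q --orphan rewritten "$old_base"
git commit -q -m "base"

# Replay only the commits in old_base..my-feature onto the new history.
# Merging instead would make the old commits reachable again.
git rebase -q --onto rewritten "$old_base" my-feature

git log --oneline --graph my-feature
```

Running it should print a two-commit history: "my change" on top of the recommitted "base". Because `--onto` takes only the commits after `old_base`, none of the pre-cleanup commits remain reachable from the rebased branch, which is exactly what the warning against merging old and new commits is guarding against.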