[Kde-scm-interest] Re: git filter-branch preserving history

Arno Rehn kde at arnorehn.de
Fri Nov 19 21:31:35 CET 2010


On Friday 19 November 2010 21:04:15 Boyd Stephen Smith Jr. wrote:
> In <201011192001.20255.kde at arnorehn.de>, Arno Rehn wrote:
> >we're currently experimenting with how best to convert kdebindings to git.
> >One thing that we tried was 'git filter-branch --subdirectory-filter ruby
> >-- --all' to remove everything but the 'ruby' subdir and make this the
> >top-level directory. The resulting directory structure looked good, but
> >strangely the history of all other unrelated subdirs was left intact,
> >even after git gc --aggressive and git prune.
> >So the resulting repository wasn't any smaller than the original
> >monolithic kdebindings thing, even though there was only a small subdir
> >left.
> 
> You probably just didn't try hard enough to lose the objects.  The
> "--aggressive" option to gc doesn't shrink the pruning window, it "simply"
> adjusts the packing done.  You didn't mention if you removed the backup
> refs created by filter-branch.  You also didn't mention if you manually
> cleaned the reflogs; normally even "broken" reflogs are retained for 30
> days, and it may be that reflogs can keep objects alive.
I don't know that much about git internals, that's why I asked here. I was 
under the impression that such stuff is automatically deleted.

> Here's some steps for cleanup:
> 1. Delete all the backup refs created by filter-branch under refs/original.
> (rm -r .git/refs/original)
> 2. Expire reflogs that point to pre-filtered commits and all other objects
> unreachable from the current tips. (git reflog expire
> --expire-unreachable=now --all)
> 3. Repack the repository and leave any pre-filter object (or otherwise
> unreachable objects) as an unpacked object; remove redundant packs. (git
> repack -A -d)
> 4. Prune all loose objects that only existed in pre-filtered commits, which
> should be all the unpacked objects and nothing in a pack. (git prune
> --expire now)
> 
> (git gc --aggressive) uses "30 days" by default in step 2 instead of my
> "now". It also uses "2 weeks ago" by default in step 4 instead of my
> "now".
> 
> So, if reflogs can keep objects alive, your repository would have started
> shrinking in about 30 days.  If reflogs can't keep objects alive, your
> repository would have started shrinking in about 2 weeks.
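
The four cleanup steps quoted above could be collected into one small shell
function. This is only a sketch: the function name cleanup_filtered_repo is
made up for illustration, and step 1 uses git update-ref -d rather than
rm -r on the assumption that some backup refs may live in packed-refs.
It discards the pre-filter history for good, so it is best tried on a copy.

```shell
#!/bin/sh
# Sketch only: drop pre-filter objects after `git filter-branch`.
# Run from inside the filtered repository; this is destructive.
cleanup_filtered_repo() {
    # 1. Delete the backup refs filter-branch left under refs/original.
    git for-each-ref --format='%(refname)' refs/original |
    while read -r ref; do
        git update-ref -d "$ref"
    done

    # 2. Expire reflog entries that are unreachable from the current tips.
    git reflog expire --expire-unreachable=now --all

    # 3. Repack; unreachable objects are left loose, redundant packs removed.
    git repack -A -d

    # 4. Prune the loose unreachable objects without any grace period.
    git prune --expire now
}
```

Calling cleanup_filtered_repo right after the filter-branch run and then
comparing the output of git count-objects -v before and after should show
whether anything was actually freed.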
Thanks, after following all of your tips it's now down to 44M instead of 64M - 
but that's still way too large. Having a look at the history again, I see that 
many of the recent unrelated commits vanished, but there are still a bunch of 
commits from 2007 and pre-2007 that are completely unrelated.

Maybe the initial svn2git rules are not correct, so that git somehow thinks
the commits are related... but some of those rules would then have to be
REALLY wrong, which I don't quite believe.
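
One way to narrow this down might be to ask git which refs still reach one
of the leftover commits. Tags are one possible candidate, since filter-branch
leaves them pointing at the original commits unless --tag-name-filter is
given. A minimal sketch, assuming git 2.7 or later (for --contains in
for-each-ref); the function name refs_containing is made up:

```shell
#!/bin/sh
# Sketch only: list every ref from which the given commit is reachable.
# Assumes git >= 2.7, where for-each-ref accepts --contains.
refs_containing() {
    git for-each-ref --contains "$1" --format='%(refname)'
}
```

Running refs_containing with the id of one of the unrelated 2007 commits
would then print the branches or tags that keep it alive, if any.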

-- 
Arno Rehn
arno at arnorehn.de

