[Kde-scm-interest] Re: git filter-branch preserving history

Thomas Zander zander at kde.org
Sat Nov 20 14:32:48 CET 2010


On Friday 19. November 2010 21.31.35 Arno Rehn wrote:
> On Friday 19 November 2010 21:04:15 Boyd Stephen Smith Jr. wrote:
> > In <201011192001.20255.kde at arnorehn.de>, Arno Rehn wrote:
> > >we're currently experimenting how to best convert kdebindings to git.
> > >One thing that we tried was 'git filter-branch --subdirectory-filter
> > >ruby -- --all' to remove everything but the 'ruby' subdir and make
> > >this the top-level directory. The resulting directory structure looked
> > >good, but strangely the history of all other unrelated subdirs was left
> > >intact, even after git gc --aggressive and git prune.
> > >So the resulting repository wasn't any smaller than the original
> > >monolithic kdebindings thing, even though there was only a small subdir
> > >left.
> > 
> > You probably just didn't try hard enough to lose the objects.  The
> > "--aggressive" option to gc doesn't shrink the pruning window, it "simply"
> > adjusts the packing done.  You didn't mention if you removed the backup
> > refs created by filter-branch.  You also didn't mention if you manually
> > cleaned the reflogs; normally even "broken" reflogs are retained for 30
> > days, and it may be that reflogs can keep objects alive.
> 
> I don't know that much about git internals, that's why I asked here. I was
> under the impression that such stuff is automatically deleted.
> 
> > Here's some steps for cleanup:
> > 1. Delete all the backup refs created by filter-branch under
> > refs/original. (rm -r .git/refs/original)
> > 2. Expire reflogs that point to pre-filtered commits and all other
> > objects unreachable from the current tips. (git reflog expire
> > --expire-unreachable=now --all)
> > 3. Repack the repository and leave any pre-filter object (or otherwise
> > unreachable objects) as an unpacked object; remove redundant packs. (git
> > repack -A -d)
> > 4. Prune all loose objects that only existed in pre-filtered commits,
> > which should be all the unpacked objects and nothing in a pack. (git
> > prune --expire now)
> > 
> > (git gc --aggressive) uses "30 days" by default in step 2 instead of my
> > "now". It also uses "2 weeks ago" by default in step 4 instead of my
> > "now".
> > 
> > So, if reflogs can keep objects alive, your repository would have started
> > shrinking in about 30 days.  If reflogs can't keep objects alive, your
> > repository would have started shrinking in about 2 weeks.
> 
> Thanks, after following all of your tips it's now down to 44M instead of
> 64M - but that's still way too large. Having a look at the history again,
> I see that many of the recent unrelated commits vanished, but there are
> still a bunch of commits from 2007 and pre-2007 that are completely
> unrelated.

Doing the prune last makes me wonder whether a final repack is needed as well:
   git repack -a -d -f
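
Putting the four quoted steps together with that extra repack, the whole
sequence might look roughly like the sketch below. The function name
cleanup_filtered_repo and its path argument are made up for illustration,
and step 1 uses git update-ref -d instead of the rm -r quoted above, on the
assumption that some of the backup refs may already be packed. It throws
away the filter-branch backups for good, so only try it on a copy.

```shell
#!/bin/sh
# Sketch only: remove everything that still references pre-filter history,
# then repack. cleanup_filtered_repo is a hypothetical helper name.
cleanup_filtered_repo() {
    repo=${1:-.}    # path to the already filtered repository

    # 1. Delete the backup refs filter-branch left under refs/original.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original |
    while read -r ref; do
        git -C "$repo" update-ref -d "$ref" || exit 1
    done &&

    # 2. Expire reflog entries that are unreachable from the current tips.
    git -C "$repo" reflog expire --expire-unreachable=now --all &&

    # 3. Repack reachable objects; unreachable ones are left loose.
    git -C "$repo" repack -A -d &&

    # 4. Prune the loose, unreachable objects right away.
    git -C "$repo" prune --expire=now &&

    # 5. Final repack that recomputes deltas from scratch.
    git -C "$repo" repack -a -d -f
}
```

Calling cleanup_filtered_repo /path/to/filtered/clone and then comparing
git count-objects -v before and after should show whether anything was
actually dropped.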
-- 
Thomas Zander

