[Kde-scm-interest] Re: git filter-branch preserving history

Boyd Stephen Smith Jr. bss at iguanasuicide.net
Sat Nov 20 21:26:22 CET 2010


In <201011201432.48975.zander at kde.org>, Thomas Zander wrote:
>On Friday 19. November 2010 21.31.35 Arno Rehn wrote:
>> On Friday 19 November 2010 21:04:15 Boyd Stephen Smith Jr. wrote:
>> > In <201011192001.20255.kde at arnorehn.de>, Arno Rehn wrote:
>> > >we're currently experimenting how to best convert kdebindings to git.
>> > >One thing that we tried was 'git filter-branch --subdirectory-filter
>> > >ruby -- --all' to remove everything but the 'ruby' subdir and make
>> > >this the top-level directory. The resulting directory structure looked
>> > >good, but strangely the history of all other unrelated subdirs was left
>> > >intact, even after git gc --aggressive and git prune.
>> > >So the resulting repository wasn't any smaller than the original
>> > >monolithic kdebindings thing, even though there was only a small subdir
>> > >left.
>> > 
>> > You probably just didn't try hard enough to lose the objects.  The
>> > "--aggressive" option to gc doesn't shrink the pruning window, it "simply"
>> > adjusts the packing done.  You didn't mention if you removed the backup
>> > refs created by filter-branch.  You also didn't mention if you manually
>> > cleaned the reflogs; normally even "broken" reflogs are retained for 30
>> > days, and it may be that reflogs can keep objects alive.
>> 
>> I don't know that much about git internals, that's why I asked here. I was
>> under the impression that such stuff is automatically deleted.
>> 
>> > Here's some steps for cleanup:
>> > 1. Delete all the backup refs created by filter-branch under
>> > refs/original. (rm -r .git/refs/original)
>> > 2. Expire reflogs that point to pre-filtered commits and all other
>> > objects unreachable from the current tips. (git reflog expire
>> > --expire-unreachable=now --all)
>> > 3. Repack the repository and leave any pre-filter object (or otherwise
>> > unreachable objects) as an unpacked object; remove redundant packs. (git
>> > repack -A -d)
>> > 4. Prune all loose objects that only existed in pre-filtered commits,
>> > which should be all the unpacked objects and nothing in a pack. (git
>> > prune --expire now)
>> > 
>> > (git gc --aggressive) uses "30 days" by default in step 2 instead of my
>> > "now". It also uses "2 weeks ago" by default in step 4 instead of my
>> > "now".
>> 
>> Thanks, after following all of your tips it's now down to 44M instead of
>> 64M - but that's still way too large.
>
>doing a prune last makes me wonder if a repack at the end is not needed;
>   git repack -a -d -f

I recommended the "-A" option, which is similar to the "-a" option with one 
critical difference:  Instead of leaving unreachable objects in old packs, it 
writes them out as loose objects, which the prune then takes care of.  So, 
after the repack I suggested all reachable objects are in recently created 
packs and all unreachable objects are loose.  Then the prune I suggested 
deletes all loose, unreachable objects.  All reachable objects (the only ones 
left) are already in packs, so there's no need for a repack after the prune.
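
One rough way to check where the objects actually live at each stage is to
look at the "count" (loose) and "in-pack" fields of git count-objects -v.
The helper below is only a sketch; the name object_counts is made up for
illustration and is not part of git.

```shell
# Hypothetical helper: print "<loose> <packed>" object counts for the
# repository at "$1", taken from the "count:" and "in-pack:" lines of
# `git count-objects -v`.
object_counts() {
    git -C "$1" count-objects -v |
        awk '$1 == "count:"   { loose = $2 }
             $1 == "in-pack:" { packed = $2 }
             END { print loose, packed }'
}
```

Running it before and after the repack, and again after the prune, should
show unreachable objects moving from packs to loose storage and then
disappearing.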

I specifically chose my order and options because git prune will not remove 
unreachable, packed objects.  The "-A" option to repack was the easiest way I 
could find to write out unreachable, packed objects as unreachable, loose 
objects.
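
Put together, the four steps in that order could be sketched as a small
shell function.  This is an illustration under assumptions, not a tested
recipe for kdebindings: the function name cleanup_filtered_repo is
hypothetical, and the backup refs are deleted with git update-ref -d
instead of the rm -r I mentioned, since that also covers refs that have
been packed.  It throws away every unreachable object, so only try it on a
copy of the repository.

```shell
# Hypothetical helper: run the four cleanup steps in the repository at "$1".
# Destructive: all unreachable objects are gone afterwards.
cleanup_filtered_repo() {
    (
        cd "$1" || exit 1

        # 1. Delete the backup refs filter-branch leaves under refs/original.
        git for-each-ref --format='%(refname)' refs/original/ |
            while read -r ref; do
                git update-ref -d "$ref" || exit 1
            done

        # 2. Expire reflog entries that point at unreachable commits.
        git reflog expire --expire-unreachable=now --all &&

        # 3. Repack reachable objects; -A writes unreachable ones out loose,
        #    -d removes the now-redundant old packs.
        git repack -A -d -q &&

        # 4. Delete the loose, unreachable objects.
        git prune --expire now
    )
}
```

It would be called as cleanup_filtered_repo /path/to/copy, where the path
is whatever scratch clone was filtered.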
-- 
Boyd Stephen Smith Jr.                   ,= ,-_-. =.
bss at iguanasuicide.net                   ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy         `-'(. .)`-'
http://iguanasuicide.net/                    \_/

