[Kde-scm-interest] Re: git filter-branch preserving history
Boyd Stephen Smith Jr.
bss at iguanasuicide.net
Sat Nov 20 21:26:22 CET 2010
In <201011201432.48975.zander at kde.org>, Thomas Zander wrote:
>On Friday 19. November 2010 21.31.35 Arno Rehn wrote:
>> On Friday 19 November 2010 21:04:15 Boyd Stephen Smith Jr. wrote:
>> > In <201011192001.20255.kde at arnorehn.de>, Arno Rehn wrote:
>> > >we're currently experimenting how to best convert kdebindings to git.
>> > >One thing that we tried was 'git filter-branch --subdirectory-filter
>> > >ruby -- --all' to remove everything but the 'ruby' subdir and make
>> > >this the top-level directory. The resulting directory structure looked
>> > >good, but strangely the history of all other unrelated subdirs was left
>> > >intact, even after git gc --aggressive and git prune.
>> > >So the resulting repository wasn't any smaller than the original
>> > >monolithic kdebindings thing, even though there was only a small subdir
>> > >left.
>> >
>> > You probably just didn't try hard enough to lose the objects. The
>> > "--aggressive" option to gc doesn't shrink the pruning window, it "simply"
>> > adjusts the packing done. You didn't mention if you removed the backup
>> > refs created by filter-branch. You also didn't mention if you manually
>> > cleaned the reflogs; normally even "broken" reflogs are retained for 30
>> > days, and it may be that reflogs can keep objects alive.
>>
>> I don't know that much about git internals, that's why I asked here. I was
>> under the impression that such stuff is automatically deleted.
>>
>> > Here's some steps for cleanup:
>> > 1. Delete all the backup refs created by filter-branch under
>> > refs/original. (rm -r .git/refs/original)
>> > 2. Expire reflogs that point to pre-filtered commits and all other
>> > objects unreachable from the current tips. (git reflog expire
>> > --expire-unreachable=now --all)
>> > 3. Repack the repository and leave any pre-filter object (or otherwise
>> > unreachable objects) as an unpacked object; remove redundant packs. (git
>> > repack -A -d)
>> > 4. Prune all loose objects that only existed in pre-filtered commits,
>> > which should be all the unpacked objects and nothing in a pack. (git
>> > prune --expire now)
>> >
>> > (git gc --aggressive) uses "30 days" by default in step 2 instead of my
>> > "now". It also uses "2 weeks ago" by default in step 4 instead of my
>> > "now".
>>
>> Thanks, after following all of your tips it's now down to 44M instead of
>> 64M - but that's still way too large.
>
>doing a prune last makes me wonder if a repack at the end is not needed;
> git repack -a -d -f
I recommended the "-A" option, which is similar to the "-a" option with one
critical difference: Instead of leaving unreachable objects in old packs, it
writes them out as loose objects, which the prune then takes care of. So,
after the repack I suggested all reachable objects are in recently created
packs and all unreachable objects are loose. Then the prune I suggested
deletes all loose, unreachable objects. All reachable objects (the only ones
left) are already in packs, so there's no need for a repack after the prune.
I specifically chose my order and options because git prune will not remove
unreachable, packed objects. The "-A" option to repack was the easiest way I
could find to write out unreachable, packed objects as unreachable, loose
objects.
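
For reference, the four steps can be put together roughly as below. This is
only a sketch: the function name cleanup_filtered_repo is made up for
illustration, and it assumes a git new enough to support "git -C". One
deliberate difference from step 1 as written: it deletes the backup refs with
"git update-ref -d" rather than "rm -r .git/refs/original", so refs that have
already been packed into .git/packed-refs are removed as well.

```shell
#!/bin/sh
# Hypothetical helper (name not from the thread): run the cleanup steps
# on the repository at $1, or on the current directory if no path is given.
cleanup_filtered_repo() {
    repo=${1:-.}

    # 1. Delete the backup refs filter-branch left under refs/original.
    #    update-ref -d is used instead of rm -r so packed refs go away too.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original |
    while read -r ref; do
        git -C "$repo" update-ref -d "$ref"
    done

    # 2. Expire reflog entries that are unreachable from the current tips.
    git -C "$repo" reflog expire --expire-unreachable=now --all

    # 3. Repack: reachable objects go into a fresh pack, unreachable packed
    #    objects are written out loose, and redundant packs are removed.
    git -C "$repo" repack -A -d

    # 4. Delete the loose, unreachable objects, with no grace period.
    git -C "$repo" prune --expire=now
}
```

Running "du -sh .git" before and after calling it on a filtered clone is a
quick way to see whether anything was actually freed. Try it on a copy first,
since nothing removed this way can be recovered.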
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss at iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/