[Kde-scm-interest] Re: git filter-branch preserving history

Boyd Stephen Smith Jr. bss at iguanasuicide.net
Fri Nov 19 22:19:56 CET 2010


In <201011192131.35518.kde at arnorehn.de>, Arno Rehn wrote:
>On Friday 19 November 2010 21:04:15 Boyd Stephen Smith Jr. wrote:
>> In <201011192001.20255.kde at arnorehn.de>, Arno Rehn wrote:
>> >we're currently experimenting with how best to convert kdebindings to
>> >git. One thing that we tried was 'git filter-branch --subdirectory-filter
>> >ruby -- --all' to remove everything but the 'ruby' subdir and make this
>> >the top-level directory. The resulting directory structure looked good,
>> >but strangely the history of all other unrelated subdirs was left intact,
>> >even after git gc --aggressive and git prune.
>> 
>> You probably just didn't try hard enough to lose the objects.
>>
>> Here's some steps for cleanup:
>> 1. Delete all the backup refs created by filter-branch under
>> refs/original. (rm -r .git/refs/original)
>> 2. Expire reflogs that point to pre-filtered commits and all other objects
>> unreachable from the current tips. (git reflog expire
>> --expire-unreachable=now --all)
>> 3. Repack the repository and leave any pre-filter object (or otherwise
>> unreachable objects) as an unpacked object; remove redundant packs. (git
>> repack -A -d)
>> 4. Prune all loose objects that only existed in pre-filtered commits,
>> which should be all the unpacked objects and nothing in a pack. (git
>> prune --expire now)
>
>Thanks, after following all of your tips it's now down to 44M instead of 64M
>- but that's still way too large. Having a look at the history again, I see
>that many of the recent unrelated commits vanished, but there are still a
>bunch of commits from 2007 and pre-2007 that are completely unrelated.
>
>Maybe the initial svn2git rules are not correct so that git somehow thinks
>the commits would be related... but some of those rules would then be
>REALLY wrong, which I don't quite believe.

I tend to doubt that.  I think filter-branch preserved empty commits that 
weren't at the tip for some reason.  The documentation seems to imply that 
keeping empty commits is the default behavior.  --subdirectory-filter does 
imply --remap-to-ancestor, but that will only drop empty commits near the tip.

I think the "--prune-empty" option to filter-branch should help clean up 
your history.  I don't think it will shrink the repository much, since the 
unmodified tree objects are shared between an empty commit and its parent, 
but it is worth a try.  Of course, after the filter-branch, another cleanup 
run would be required.
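
For reference, that cleanup run could be scripted roughly as below.  This is 
a sketch, not a recipe tested on your repository, and the function name 
cleanup_after_filter is made up here.  It differs from the steps quoted above 
in two ways: the backups are deleted with git update-ref rather than 
rm -r .git/refs/original, on the assumption that an earlier git gc may have 
moved those refs into .git/packed-refs where rm cannot reach them, and 
--expire=now is passed as well so that every reflog entry is dropped.

```shell
#!/bin/sh
# Sketch: throw away everything a filter-branch run left behind.
# Run inside the filtered repository. DESTRUCTIVE: afterwards the
# rewrite can no longer be undone from this repository.
cleanup_after_filter() {
    # 1. Delete the backup refs that filter-branch stored under
    #    refs/original (works for loose and packed refs alike).
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done

    # 2. Expire all reflog entries, including those that still point
    #    at pre-filter commits.
    git reflog expire --expire=now --expire-unreachable=now --all

    # 3. Repack reachable objects; unreachable ones are left loose and
    #    redundant packs are removed.
    git repack -A -d

    # 4. Prune the loose objects that nothing references any more.
    git prune --expire now
}
```

Comparing du -sh .git before and after the call shows how much was actually 
freed.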

Since you've already got the repository filtered down to just the ruby 
subdirectory, (git filter-branch --prune-empty -- --all) may do what you want 
(untested).  I'm thinking that "--prune-empty" might use the wrong message for 
the initial commit.  If it does, and that is a problem, undo the filter-branch 
(BEFORE cleanup), and let me know.  If you can identify the correct first 
commit, I may be able to come up with a rebase/filter-branch recipe for you.
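
To make "try it, and undo if needed" concrete, here is a sketch of both 
halves.  The function names prune_empty_commits and undo_filter_branch are 
invented for illustration, and the undo assumes the backups under 
refs/original are still present, i.e. that no cleanup has been run since the 
filter-branch.  FILTER_BRANCH_SQUELCH_WARNING only matters to newer git 
versions that print a warning and pause before filtering; older ones ignore it.

```shell
#!/bin/sh
# Sketch: drop empty commits, report the effect, and offer a way back.

prune_empty_commits() {
    before=$(git rev-list --count HEAD)
    # Rewrite all refs, dropping commits that leave the tree unchanged.
    # Without -f this refuses to run if refs/original already exists.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty -- --all
    after=$(git rev-list --count HEAD)
    echo "commits on HEAD: $before -> $after"
}

undo_filter_branch() {
    # Point every rewritten ref back at its backup, then drop the backup.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r backup; do
        git update-ref "${backup#refs/original/}" "$backup"
        git update-ref -d "$backup"
    done
    # Bring index and work tree back in line with the restored branch.
    # This discards uncommitted changes.
    git reset --hard -q
}
```

Looking at git log after prune_empty_commits, in particular at the message of 
the new root commit, is the check to make before deciding between cleanup and 
undo_filter_branch.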
-- 
Boyd Stephen Smith Jr.                   ,= ,-_-. =.
bss at iguanasuicide.net                   ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy         `-'(. .)`-'
http://iguanasuicide.net/                    \_/