[Kde-scm-interest] KDE trunk imported into Git

Thiago Macieira thiago at kde.org
Tue Dec 25 14:26:45 CET 2007


I've done a test import of KDE's trunk into Git.

Before I go into any details, let me just say that a full import of trunk 
is of ABSOLUTELY NO practical use. No one checks out all applications, 
all webpages, all translations and all other junk we have in trunk. Much 
less keeps the entire history of that around. Mind you that once upon a 
time we hosted applications like valgrind -- the history of that has been 
imported too.

This is purely from a theoretical point of view, to see what could be 
done. Just because I could.

My first attempt was to use git-svn. I gave up after three days of import, 
when it had only reached revision 320k (packfile was around 2.5 GB at 
that time).

So I dug up Chris Lee's svn-fast-export tool from 
http://repo.or.cz/w/fast-export.git. With one bug fix to stop it from 
crashing on KDE revision 3129 (a Subversion commit that has no author), it 
managed to complete the work in about a day.

It would have been faster if I hadn't run out of disk space twice using 
it.

Statistics:
	Subversion commits parsed: 751818
	Disk size of Subversion repository: 34 GB

	Git commits: 623354
	Number of files in the HEAD revision: 210362
	Disk size of the initial Git repository: 53 GB

The fast-import statistics are attached.
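
For a sense of proportion, the figures above can be compared directly. A 
small Python sketch, using only the numbers from the statistics list (why 
some Subversion revisions produced no Git commit is not stated here; 
presumably they touched paths outside trunk):

```python
# Figures copied from the statistics above.
svn_commits = 751818      # Subversion commits parsed
git_commits = 623354      # Git commits created
svn_size_gb = 34          # Subversion repository on disk
git_size_gb = 53          # Git repository right after fast-import

# Revisions that did not turn into a Git commit.
skipped = svn_commits - git_commits
skipped_share = skipped / svn_commits

# How much larger the not-yet-repacked Git repository is.
size_ratio = git_size_gb / svn_size_gb

print(f"revisions without a Git commit: {skipped} ({skipped_share:.1%})")
print(f"Git/Subversion size ratio before repacking: {size_ratio:.2f}")
```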

Anyways, 53 GB is the size of the repository after git-fast-import 
finished with it. It would have been larger if I had not repacked about 
8.5 GB worth of it into ~2.5 GB during the procedure.

It's not unpacked, but it's not very well packed. git-fast-import 
generates a packfile on-the-fly, which means it has better compression 
than loose objects, but it's way, way worse than a full repack.

Besides, every time a new pack is started you lose in size, since objects 
are only deltified against other objects in the same pack. And 
git-fast-import limits itself to 4 GB packs.

So I decided to run "git gc" in that repository. Specs of my machine:
	Intel Core 2 Duo T6420 @ 2.14 GHz
	4 GB of RAM
	Linux kernel 2.6.23.9-tmb-server-3mdv
	Mandriva Cooker 32-bit

It could not complete because git-pack-objects ran out of memory. I 
watched it in ps: once it reaches about 1.7 GB of RSS, it stops.

So I decided to try two new approaches:

1) create a .keep file for the 2.5 GB pack that is already well-compressed 
and run again.

This apparently works. Had I not run out of disk space *again* (I had only 
5.5 GB free on the filesystem), it would probably have finished by now. 
It's now running again, and is 10% done as I write this email.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
tmacieir  7427 13.9 39.7 1973936 1650188 pts/1 D+   13:33   6:39 git
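
For reference, a .keep file is simply a marker file that sits next to the 
pack and shares its base name; packs marked that way are left alone by a 
repack. A minimal Python sketch of creating one. The helper name keep_pack 
and the pack name in the demonstration are hypothetical, not taken from 
this repository:

```python
from pathlib import Path
import tempfile

def keep_pack(pack_path, reason="already well compressed"):
    """Create the .keep marker next to a .pack file and return its path."""
    pack = Path(pack_path)
    if pack.suffix != ".pack":
        raise ValueError(f"not a pack file: {pack}")
    keep = pack.with_suffix(".keep")
    # Free-form note for humans; the marker's presence is what matters.
    keep.write_text(reason + "\n")
    return keep

if __name__ == "__main__":
    # Demonstration in a scratch directory with a stand-in pack file.
    with tempfile.TemporaryDirectory() as tmp:
        pack = Path(tmp) / "pack-0123abcd.pack"   # hypothetical name
        pack.touch()
        print(keep_pack(pack).name)
```

In a real repository the pack would live under .git/objects/pack/.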

2) Try a 64-bit machine with more RAM. Specs:
	Intel Core 2 Quad Q6600 @ 2.4 GHz
	8 GB of RAM
	Linux kernel 2.6.22.9-server-1mdv
	Mandriva 2008.0 64-bit

Instead of running git gc, I decided to repack *everything*. So I ran:
	time git repack -a -d -f --window=250 --depth=150

It's still running after almost 9 hours and is at 19% right now.

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
qt        75851 75.7 57.3 13328004 4693876 pts/2 R+  05:39 397:01 git

(I've seen the RSS value over 5.2 GB)
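
ps reports VSZ and RSS in KiB, so the two process lines quoted above are 
easier to compare once converted. A small Python sketch that parses them 
exactly as they appear in this mail:

```python
PS_LINES = [
    "tmacieir  7427 13.9 39.7 1973936 1650188 pts/1 D+   13:33   6:39 git",
    "qt        75851 75.7 57.3 13328004 4693876 pts/2 R+  05:39 397:01 git",
]

def memory_gib(ps_line):
    """Return (VSZ, RSS) in GiB from one line of `ps u` style output."""
    fields = ps_line.split()
    vsz_kib, rss_kib = int(fields[4]), int(fields[5])
    return vsz_kib / 1024**2, rss_kib / 1024**2

for line in PS_LINES:
    vsz, rss = memory_gib(line)
    user = line.split()[0]
    print(f"{user:<9} VSZ {vsz:5.2f} GiB  RSS {rss:5.2f} GiB")
```

The 32-bit process sits at roughly 1.9 GiB of virtual size, the 64-bit one 
at about 12.7 GiB.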

I'll let you know when those processes finish running.
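
For a very rough idea of how long the 64-bit repack might take: if progress 
were linear, which is only an assumption, the numbers quoted above 
extrapolate as in this Python sketch:

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation of total run time from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done

# 64-bit full repack: almost 9 hours elapsed, 19% done.
elapsed = 9
total = projected_total_hours(elapsed, 0.19)
print(f"projected total: {total:.0f} h, remaining: {total - elapsed:.0f} h")
```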
-- 
  Thiago Macieira  -  thiago (AT) macieira.info - thiago (AT) kde.org
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
-------------- next part --------------
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:    9805000
Total objects:      1812360 (   8614741 duplicates                  )
      blobs  :      1025694 (   4436005 duplicates     803763 deltas)
      trees  :       718826 (   3623222 duplicates     602518 deltas)
      commits:        67840 (    555514 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (     38740 unique    )
      atoms:         282304
Memory total:        332811 KiB
       pools:         26405 KiB
     objects:        306406 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =     308400
pack_report: pack_mmap_calls          =       2200
pack_report: pack_open_windows        =          8 /          8
pack_report: pack_mapped              =  268435456 /  268435456
---------------------------------------------------------------------
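
The per-type rows in the fast-import report can be cross-checked against 
the totals, and the delta counts give a feel for how much of the data was 
stored as deltas. A short Python sketch using the numbers as printed above:

```python
# (objects written, duplicates, deltas) per type, copied from the report.
report = {
    "blobs":   (1025694, 4436005, 803763),
    "trees":   (718826, 3623222, 602518),
    "commits": (67840, 555514, 0),
    "tags":    (0, 0, 0),
}
total_objects = 1812360
total_duplicates = 8614741

written = sum(row[0] for row in report.values())
duplicates = sum(row[1] for row in report.values())
print("totals match:",
      written == total_objects and duplicates == total_duplicates)

# Share of each object type that was written as a delta.
for kind, (count, _dups, deltas) in report.items():
    if count:
        print(f"{kind:<8} {deltas / count:.1%} stored as deltas")
```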