[Kde-scm-interest] KDE trunk imported into Git
Thiago Macieira
thiago at kde.org
Tue Dec 25 14:26:45 CET 2007
I've done a test import of KDE's trunk into Git.
Before I go into any details, let me just say that a full import of trunk
is of ABSOLUTELY NO practical use. No one checks out all applications,
all webpages, all translations and all the other junk we have in trunk,
much less keeps the entire history of that around. Mind you, once upon a
time we hosted applications like valgrind -- the history of that has been
imported too.
This is purely from a theoretical point of view: to see what could be done.
Just because I could.
My first attempt was to use git-svn. I gave up after three days of import,
when it had only reached revision 320k (the packfile was around 2.5 GB at
that point).
So I dug up Chris Lee's svn-fast-export tool from
http://repo.or.cz/w/fast-export.git. With one bugfix to stop it from
crashing on KDE revision 3129 (a Subversion commit that has no author), it
managed to complete the work in about a day.
It would have been faster if I hadn't run out of disk space twice while
running it.
Statistics:
Subversion commits parsed: 751818
Disk size of Subversion repository: 34 GB
Git commits: 623354
Number of files in the HEAD revision: 210362
Disk size of the initial Git repository: 53 GB
The fast-import statistics are attached.
Anyway, 53 GB is the size of the repository after git-fast-import
finished with it. It would have been larger if I had not repacked about
8.5 GB worth of it into ~2.5 GB during the procedure.
It's not unpacked, but it's not very well packed either. git-fast-import
generates a packfile on the fly, which means it compresses better than
loose objects, but it's way, way worse than a full repack.
Besides, every time you create a new pack, you lose some compression:
objects in one pack cannot be stored as deltas against objects in another.
And git-fast-import limits itself to 4 GB packs.
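The point about new packs costing space can be seen on a toy repository. The sketch below is an illustration under assumptions (a throwaway repository, a made-up file big.txt, a placeholder committer identity, current Git behaviour): it creates one pack per commit with an incremental "git repack -d", then compares the pack size reported by "git count-objects -v" with the size after a full "git repack -a -d -f".

```shell
#!/bin/sh
# Sketch (assumptions: current Git, throwaway repo, made-up file big.txt):
# compare many small packs against one full repack.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email demo@example.com   # placeholder identity
git config user.name Demo

# Ten revisions of a ~100 KB file that differ only in the last line.
for i in 1 2 3 4 5 6 7 8 9 10; do
    { seq 1 20000; echo "revision $i"; } > big.txt
    git add big.txt
    git commit -q -m "revision $i"
    # Pack only the new loose objects: one new pack per commit. A pack can
    # only hold deltas against objects in the same pack, so every pack
    # stores its own (zlib-compressed) full copy of big.txt.
    git repack -q -d
done

packs=$(ls .git/objects/pack/*.pack | wc -l)
before=$(git count-objects -v | sed -n 's/^size-pack: //p')   # KiB

# One pack for everything, with deltas recomputed (-f) across all objects.
git repack -q -a -d -f
after=$(git count-objects -v | sed -n 's/^size-pack: //p')     # KiB

echo "$packs packs: $before KiB; after full repack: $after KiB"
```

On a repository like this the single repacked pack should come out much smaller, since nine of the ten versions of big.txt can then be stored as small deltas against the first.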
So I decided to run "git gc" in that repository. Specs of my machine:
Intel Core 2 Duo T6420 @ 2.14 GHz
4 GB of RAM
Linux kernel 2.6.23.9-tmb-server-3mdv
Mandriva Cooker 32-bit
It could not complete because git-pack-objects ran out of memory. I
watched in ps what it was doing: when it reached about 1.7 GB of RSS, it
stopped.
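The 1.7 GB figure came from watching ps by hand. As a hedged sketch of how one might record the peak automatically on Linux, the helper below polls the VmRSS line of /proc/PID/status (reported in kB) rather than ps; the function name peak_rss and the one-second default interval are illustrative assumptions, not something from this post.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): poll a process's resident set size and
# print the largest value seen, in kB, once the process has gone away.
# Usage: peak_rss PID [INTERVAL_SECONDS]
peak_rss() {
    pid=$1
    interval=${2:-1}
    peak=0
    while [ -r "/proc/$pid/status" ]; do
        # VmRSS is reported in kB; the line is missing for zombie processes,
        # and sed fails quietly if the process vanishes between checks.
        rss=$(sed -n 's/^VmRSS:[[:space:]]*\([0-9]*\) kB$/\1/p' \
                  "/proc/$pid/status" 2>/dev/null)
        [ -n "$rss" ] || break
        if [ "$rss" -gt "$peak" ]; then
            peak=$rss
        fi
        sleep "$interval"
    done
    echo "$peak"
}

# Example: watch a short-lived background process.
sleep 2 &
pid=$!
echo "peak RSS: $(peak_rss "$pid" 1) kB"
```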
So I decided to try two new approaches:
1) Create a .keep file for the 2.5 GB pack that is already well compressed,
and run git gc again.
This apparently works. Had I not run out of disk space *again* (I had only
5.5 GB free on the filesystem), it would probably have finished by now.
It's running again now, 10% done at the time of writing.
USER       PID %CPU %MEM      VSZ     RSS TTY   STAT START   TIME COMMAND
tmacieir  7427 13.9 39.7  1973936 1650188 pts/1 D+   13:33   6:39 git
2) Try a 64-bit machine with more RAM. Specs:
Intel Core 2 Quad Q6600 @ 2.4 GHz
8 GB of RAM
Linux kernel 2.6.22.9-server-1mdv
Mandriva 2008.0 64-bit
Instead of running git gc, I decided to repack *everything*. So I ran:
time git repack -a -d -f --window=250 --depth=150
It's still running after almost 9 hours; it's at 19% right now.
USER       PID %CPU %MEM      VSZ     RSS TTY   STAT START   TIME COMMAND
qt       75851 75.7 57.3 13328004 4693876 pts/2 R+   05:39 397:01 git
(I've seen the RSS value go over 5.2 GB.)
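Approach 1) relies on Git leaving alone any pack that has a matching .keep file next to it when repacking. A minimal sketch of that mechanism on a throwaway repository follows; the file names and committer identity are placeholders, and it assumes current Git behaviour (the 2007 tools may differ in detail).

```shell
#!/bin/sh
# Sketch of approach 1): protect an already well-compressed pack with a
# .keep file so that "git repack -a -d" only repacks the other objects.
# Assumption: current Git, where "repack -a" honours .keep files by default.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email demo@example.com   # placeholder identity
git config user.name Demo

echo one > a.txt
git add a.txt
git commit -q -m first
git repack -q -a -d                      # stands in for the good 2.5 GB pack

good=$(ls .git/objects/pack/*.pack)
touch "${good%.pack}.keep"               # pack-<sha>.keep beside pack-<sha>.pack

echo two > b.txt
git add b.txt
git commit -q -m second
git repack -q -a -d                      # kept pack is neither repacked nor deleted

ls .git/objects/pack/*.pack              # the kept pack plus one new pack
```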
I'll let you know when those processes finish running.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
-------------- next part --------------
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects: 9805000
Total objects: 1812360 ( 8614741 duplicates )
blobs : 1025694 ( 4436005 duplicates 803763 deltas)
trees : 718826 ( 3623222 duplicates 602518 deltas)
commits: 67840 ( 555514 duplicates 0 deltas)
tags : 0 ( 0 duplicates 0 deltas)
Total branches: 1 ( 1 loads )
marks: 1048576 ( 38740 unique )
atoms: 282304
Memory total: 332811 KiB
pools: 26405 KiB
objects: 306406 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 33554432
pack_report: core.packedGitLimit = 268435456
pack_report: pack_used_ctr = 308400
pack_report: pack_mmap_calls = 2200
pack_report: pack_open_windows = 8 / 8
pack_report: pack_mapped = 268435456 / 268435456
---------------------------------------------------------------------