PO sync is unnaturally increasing repository sizes

Andre Heinecke aheinecke at gnupg.org
Thu Aug 10 13:00:05 BST 2023


Hi,

tl;dr: po sync is blowing up our repository sizes far more than appears to 
be necessary. We might need a force push across all repos to correct that. 
The Kleopatra repo has increased in size tenfold since po files were added 
less than a year ago.


I recently noticed that Kleopatra has gained some weight. She is an old 
lady, and since she took all her history with her when she was split off 
from the old KDEPIM repo, she was always quite chubby. But not by that much. 
(I am sloppy about Mega vs. Mebi here since it does not matter for the 
overall picture.)

So let us see:
A fresh clone of Kleopatra:
209M    kleopatra
Running:
git filter-repo --path po --invert-paths
21M     kleopatra

Let us do the same for KMail:
Before:
169M    kmail
after:
56M     kmail
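
To put those figures in proportion, here is a minimal sketch that turns a
before/after pair of du sizes into the share of the repository taken up by
po history. The helper name po_share is made up for illustration; the inputs
are the megabyte figures quoted above.

```shell
# Hypothetical helper: share of the repo size that disappears when the
# filtered path is removed.
# Usage: po_share <size_before> <size_after>   (same unit for both)
po_share() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "%.1f%%\n", (before - after) / before * 100
    }'
}

po_share 209 21   # Kleopatra
po_share 169 56   # KMail
```

With these inputs it prints 90.0% for Kleopatra and 66.9% for KMail.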

Now yes, Kleopatra has quite a few translations. Their checked-out size is 
about 29 megabytes. But there is something wrong here.

What I don't understand though is that if I look at the scripty commits in the 
git log, nothing seems unusual. 

But let us take the Low Saxon language. I hope that offends the fewest people 
here. There have been no new translations there in ~10 years.

Its checked-out size is 460KB.

In master we have:
715 translated messages, 709 fuzzy translations, 428 untranslated messages.
Going back to the first revision that added po files:
763 translated messages, 645 fuzzy translations, 391 untranslated messages.
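
The two statistics lines above appear to be in the format that
msgfmt --statistics prints (an assumption on my part). As a rough sketch,
the counts can be summed to compare the total number of messages in each
revision; po_total is a hypothetical helper name.

```shell
# Hypothetical helper: sum every integer in a statistics line read from stdin.
po_total() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) total += $i
        print total
    }'
}

# master
echo "715 translated messages, 709 fuzzy translations, 428 untranslated messages." | po_total
# first revision that added po files
echo "763 translated messages, 645 fuzzy translations, 391 untranslated messages." | po_total
```

This gives 1852 messages in master against 1799 in the first revision, so the
file has grown by only 53 messages.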

The sizes are fairly equal, with master of course being a bit larger. Yet 
this language alone, with no changes in translation, has added 10 megabytes. 
That is about half the size of the complete history of the real source code 
for Kleopatra. 

  du -hs .
  209M
  git filter-repo --path po/nds --invert-paths --force
  du -hs .
  199M

Now here is what I don't understand. If I look at the changes:
git log -p po/nds/kleopatra.po | wc -c
164774
That seems reasonable for all the automatic scripty updates, and even with 
all the context lines that is just about 165 KB uncompressed.

And this is where my git understanding runs into its limits. To understand 
why the history has gotten so large I tried some snippets from Stack 
Overflow, and from there, with:
 git rev-list --objects --all po/nds/kleopatra.po |
   git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
   sed -n 's/^blob //p' |
   sort --numeric-sort --key=2 |
   cut -c 1-12,41-
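
As a variation on the pipeline above, the per-blob listing can be reduced to
two numbers: how many blobs there are and how large they are in total. This
is only a sketch; blob_summary is a hypothetical helper, and it assumes input
lines of the form "<type> <name> <size> <rest>" as produced by the
--batch-check format used above.

```shell
# Hypothetical helper: count blobs and add up their uncompressed sizes.
# Expects "<type> <name> <size> <rest>" lines on stdin.
blob_summary() {
    awk '$1 == "blob" { n++; bytes += $3 }
         END { printf "%d blobs, %.0f bytes uncompressed\n", n, bytes }'
}

# Assumed usage inside a clone (path taken from the message above):
#   git rev-list --objects --all -- po/nds/kleopatra.po |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     blob_summary
```

The total is the uncompressed size of every stored revision of the file, not
what those revisions occupy on disk after packing.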

I think I can roughly see that each commit in the repo apparently has a 
blob associated with it that is about the same size as the file.
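
One thing that might help narrow this down: git stores every revision of a
file as a complete blob, so a full-size blob per change is expected on its
own. What decides the clone size is how well those blobs are delta-compressed
in the packfile. The sketch below compares uncompressed size with on-disk
size, using the %(objectsize:disk) format atom of git cat-file.
blob_disk_summary is a hypothetical helper, and the on-disk figure for a
single object depends on which object git happened to pick as the delta base,
so only the totals are meaningful.

```shell
# Hypothetical helper: total uncompressed vs. on-disk size of blobs.
# Expects "<type> <uncompressed> <on-disk> <rest>" lines on stdin.
blob_disk_summary() {
    awk '$1 == "blob" { raw += $2; disk += $3 }
         END { printf "raw %.0f bytes, on disk %.0f bytes\n", raw, disk }'
}

# Assumed usage inside a clone (path taken from the message above):
#   git rev-list --objects --all -- po/nds/kleopatra.po |
#     git cat-file --batch-check='%(objecttype) %(objectsize) %(objectsize:disk) %(rest)' |
#     blob_disk_summary
```

If the on-disk total comes out close to the raw total, the revisions are not
being stored as deltas against each other, which would point at packing
rather than at the content of the scripty commits.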

So can some git sleuth please investigate what is happening here? This kind 
of repo growth is unsustainable, and at least for Kleopatra I see no other 
solution than to figure this out and then remove the po history of the last 
year with a force push :-/

Don't get me wrong, I like that the po files are now also in the repo, and 
of course this will increase the repo size, but in my opinion something 
fishy is going on here.


Best Regards,
Andre

-- 
GnuPG.com - a brand of g10 Code, the GnuPG experts.

g10 Code GmbH, Erkrath/Germany, AG Wuppertal HRB14459
GF Werner Koch, USt-Id DE215605608, www.g10code.com.

GnuPG e.V., Rochusstr. 44, D-40479 Düsseldorf.  VR 11482 Düsseldorf
Vorstand: W.Koch, B.Reiter, A.Heinecke        Mail: board at gnupg.org
Finanzamt D-Altstadt, St-Nr: 103/5923/1779.   Tel: +49-211-28010702


More information about the kde-devel mailing list