PO sync is unnaturally increasing repository sizes

M G Berberich bewerbungen at m-berberich.de
Thu Aug 10 13:27:51 BST 2023


Perhaps git does not recognize the files as text, so does not do diffs.
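
A rough way to test this guess, as a sketch only: `git diff --numstat` prints "-" instead of line counts for anything git classifies as binary. The helper name `git_sees_as` below is made up for illustration and is not part of git. As far as I know, this classification only affects how diffs are shown; whether delta compression is attempted when packing is governed separately by the `delta` attribute, so that is worth checking as well.

```shell
# git_sees_as: hypothetical helper, prints "text" or "binary" for a path
# committed in HEAD, based on how `git diff --numstat` reports it.
git_sees_as() {
    # Diffing against the empty tree makes the whole file appear as an addition.
    empty_tree=$(git hash-object -t tree /dev/null)
    case $(git diff --numstat "$empty_tree" HEAD -- "$1") in
        '')  echo "not in HEAD" ;;   # path is not part of the HEAD commit
        -*)  echo "binary" ;;        # numstat shows "-  -" for binary content
        *)   echo "text" ;;          # numstat shows added/deleted line counts
    esac
}

# Example, run inside a Kleopatra clone:
#   git_sees_as po/nds/kleopatra.po
#   git check-attr text diff delta -- po/nds/kleopatra.po
```

If the first command prints "text" and `check-attr` reports nothing unusual for `delta`, the cause is probably somewhere else.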

On 10 August 2023 14:00:05 CEST, Andre Heinecke <aheinecke at gnupg.org> wrote:
>Hi,
>
>tl;dr: po sync is blowing up our repository sizes far more than appears 
>necessary. We might need a force push across all repos to correct that. 
>The Kleopatra repo has increased in size tenfold since po files were added 
>less than a year ago.
>
>
>I recently noticed that Kleopatra has gained some weight. She is an old 
>lady, and since she took all her history with her when she was split off 
>from the old KDEPIM repo, she was always quite chubby. But not by that much. 
>(I am sloppy about Mega vs. Mebi here since it does not matter for the 
>overall picture.)
>
>So let us see:
>A fresh clone of Kleopatra:
>209M    kleopatra
>Running:
>git filter-repo --path po --invert-paths
>21M     kleopatra
>
>Let us do the same for KMail:
>Before:
>169M    kmail
>after:
>56M     kmail
>
>Now yes, Kleopatra has quite a few translations. Their checked-out size is 
>about 29 megabytes. But there is something wrong here.
>
>What I don't understand though is that if I look at the scripty commits in the 
>git log, nothing seems unusual. 
>
>But let us take Low Saxon as an example; I hope that offends the fewest 
>people here. There have been no new translations there in ~10 years.
>
>Its checked-out size is 460 KB.
>
>In master we have:
>715 translated messages, 709 fuzzy translations, 428 untranslated messages.
>Going back to the first revision that added po files:
>763 translated messages, 645 fuzzy translations, 391 untranslated messages.
>
>The sizes are fairly equal, with master of course a bit larger. Yet this 
>language, with its translations unchanged, has alone added 10 megabytes. 
>That is about half the size of the complete history of the real source code 
>for Kleopatra.
>
>  du -hs .
>  209M
>  git filter-repo --path po/nds --invert-paths --force
>  du -hs .
>  199M
>
>Now here is what I don't understand. If I look at the changes:
>git log -p po/nds/kleopatra.po | wc -c
>164774
>That seems reasonable for all the automatic scripty updates, and even with 
>all the context lines that is only about 165 KB uncompressed.
>
>And this is where my git understanding runs into limits. To understand why 
>the history has gotten so large, I tried some snippets from Stack Overflow, 
>and with this one:
> git rev-list --objects --all po/nds/kleopatra.po |
>  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
>  sed -n 's/^blob //p' |
>  sort --numeric-sort --key=2 |
>  cut -c 1-12,41-
>
>I think I can roughly see that each commit in the repo apparently has a 
>blob associated with it that is the same size as the file.
>
>So can some git sleuth please investigate what is happening here? This kind 
>of repo growth is unsustainable, and at least for Kleopatra I see no possible 
>solution other than to figure this out and then remove the po history from 
>the last year with a force push :-/
>
>Don't get me wrong, I like that the po files are now also in the repo, and I 
>accept that this will of course increase the repo size, but something fishy 
>is going on here in my opinion.
>
>
>Best Regards,
>Andre
>
>-- 
>GnuPG.com - a brand of g10 Code, the GnuPG experts.
>
>g10 Code GmbH, Erkrath/Germany, AG Wuppertal HRB14459
>GF Werner Koch, USt-Id DE215605608, www.g10code.com.
>
>GnuPG e.V., Rochusstr. 44, D-40479 Düsseldorf.  VR 11482 Düsseldorf
>Vorstand: W.Koch, B.Reiter, A.Heinecke        Mail: board at gnupg.org
>Finanzamt D-Altstadt, St-Nr: 103/5923/1779.   Tel: +49-211-28010702
-- 
This message was sent from my Android device with K-9 Mail.


More information about the kde-devel mailing list