PO sync is unnaturally increasing repository sizes
M G Berberich
bewerbungen at m-berberich.de
Thu Aug 10 13:27:51 BST 2023
Perhaps git does not recognize the files as text, so does not do diffs.
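
One way to check that guess is to ask git how it classifies a file. The following is a minimal sketch, assuming it is run inside a clone with at least one commit; `git_diff_kind` is a hypothetical helper name, not a git command. It only shows how `git diff` treats the file; how objects are stored in packs is a separate question.

```shell
# Hypothetical helper: print "binary", "text" or "not found" depending on
# how git diffs the given path at HEAD.
git_diff_kind() {
    # Hash of the empty tree, computed rather than hard-coded.
    empty_tree=$(git hash-object -t tree /dev/null)
    # --numstat prints "-<TAB>-<TAB>path" for content git treats as binary
    # and "<added><TAB><deleted><TAB>path" for text.
    stat=$(git diff --numstat "$empty_tree" HEAD -- "$1")
    case "$stat" in
        '') echo "not found" ;;
        -*) echo binary ;;
        *)  echo text ;;
    esac
}
```

For example, `git_diff_kind po/nds/kleopatra.po` in a Kleopatra clone. `git check-attr text diff -- po/nds/kleopatra.po` would additionally show whether a .gitattributes rule is involved.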
On 10 August 2023 14:00:05 CEST, Andre Heinecke <aheinecke at gnupg.org> wrote:
>Hi,
>
>tl;dr: po sync is blowing up our repository sizes far more than appears to
>be necessary. We might need a force push across all repos to correct that.
>The Kleopatra repo has increased in size tenfold since po files were added
>less than a year ago.
>
>
>I recently noticed that Kleopatra has gained some weight. She is an old lady,
>and since she took all her history with her when she was split off from the
>old KDEPIM repo, she was always quite chubby. But not by that much. (I am
>sloppy about Mega vs. Mebi here since it does not matter for the overall
>picture.)
>
>So let us see:
>A fresh clone of Kleopatra:
>209M kleopatra
>Running:
>git filter-repo --path po --invert-paths
>21M kleopatra
>
>Let us do the same for KMail:
>Before:
>169M kmail
>after:
>56M kmail
>
>Now yes, Kleopatra has quite a few translations. Their checked-out size is
>about 29 megabytes. But there is something wrong here.
>
>What I don't understand though is that if I look at the scripty commits in the
>git log, nothing seems unusual.
>
>But let us take Low Saxon as an example. I hope that offends the fewest people
>here. There have been no new translations there in ~10 years.
>
>Its checked-out size is 460 KB.
>
>In master we have:
>715 translated messages, 709 fuzzy translations, 428 untranslated messages.
>Going back to the first revision that added po files:
>763 translated messages, 645 fuzzy translations, 391 untranslated messages.
>
>The sizes are fairly equal, with master of course a bit larger. Yet this
>language alone, with its translation unchanged, has added 10 megabytes. That is
>about half the size of the complete history of the real source code for
>Kleopatra.
>
> du -hs .
> 209M
> git filter-repo --path po/nds --invert-paths --force
> du -hs .
> 199M
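
`du -hs .` also counts the checked-out working tree, so it can help to look at the object store on its own. A rough sketch, assuming a local clone; `repo_pack_kib` is a hypothetical helper name, and the window/depth values in the note below are arbitrary examples, not recommendations.

```shell
# Hypothetical helper: print the total size of all packfiles in KiB,
# taken from the "size-pack" field of `git count-objects -v`.
repo_pack_kib() {
    git count-objects -v | sed -n 's/^size-pack: //p'
}
```

Comparing `repo_pack_kib` before and after `git repack -a -d -f --window=250 --depth=50` might show whether the pack is merely badly delta-compressed, since `-f` makes git recompute deltas instead of reusing existing ones.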
>
>Now here is what I don't understand. If I look at the changes:
>git log -p po/nds/kleopatra.po | wc -c
>164774
>That seems reasonable for all the automatic scripty updates and even with all
>the context lines, that is just 1,6MB uncompressed.
>
>And this is where my git understanding runs into its limits. To understand why
>the history has gotten so large, I tried some snippets from Stack Overflow and
>ended up with:
> git rev-list --objects --all po/nds/kleopatra.po |
>   git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
>   sed -n 's/^blob //p' |
>   sort --numeric-sort --key=2 |
>   cut -c 1-12,41-
>
>I think I can roughly see that apparently each commit in the repo has a
>blob associated with it that is the same size as the file.
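
The `%(objectsize)` field in the pipeline above is the uncompressed size of each blob, so one full-size blob per revision is expected: git records a complete snapshot of the file for every change, and only the packfile stores them as deltas. What matters for the repository size is the on-disk size. The following sketch, assuming a local clone, adds `%(objectsize:disk)` and `%(deltabase)` to the same pipeline; `blob_disk_sizes` is a hypothetical helper name.

```shell
# Hypothetical helper: for every blob that ever existed at a path, print
#   <object name> <uncompressed size> <on-disk size> <delta base> <path>
# sorted by on-disk size. A delta base of all zeros means the blob is
# stored whole rather than as a delta against another object.
blob_disk_sizes() {
    git rev-list --objects --all -- "$1" |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(deltabase) %(rest)' |
        sed -n 's/^blob //p' |
        sort -n -k 3
}
```

If most lines for `po/nds/kleopatra.po` show an on-disk size close to the uncompressed size, the blobs are not being stored as deltas, which would be consistent with the growth described here.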
>
>So can some git sleuth please investigate what is happening here? This kind of
>repo growth is unsustainable, and at least for Kleopatra I see no possible
>solution other than to figure this out and then remove the po history from the
>last year with a force push :-/
>
>Don't get me wrong, I like that the po files are now also in the repo, and I
>accept that this will of course increase the repo size, but something fishy is
>going on here in my opinion.
>
>
>Best Regards,
>Andre
>
>--
>GnuPG.com - a brand of g10 Code, the GnuPG experts.
>
>g10 Code GmbH, Erkrath/Germany, AG Wuppertal HRB14459
>GF Werner Koch, USt-Id DE215605608, www.g10code.com.
>
>GnuPG e.V., Rochusstr. 44, D-40479 Düsseldorf. VR 11482 Düsseldorf
>Vorstand: W.Koch, B.Reiter, A.Heinecke Mail: board at gnupg.org
>Finanzamt D-Altstadt, St-Nr: 103/5923/1779. Tel: +49-211-28010702
--
This message was sent from my Android device with K-9 Mail.