PO sync is unnaturally increasing repository sizes
Albert Astals Cid
aacid at kde.org
Thu Aug 10 19:22:11 BST 2023
On Thursday, 10 August 2023, at 14:00:05 (CEST), Andre Heinecke wrote:
> Hi,
>
> tl;dr: po sync is blowing up our repository sizes far more than appears
> to be necessary. We might need a force push across all repos to correct
> that. The Kleopatra repo has increased in size tenfold since po files were
> added less than a year ago.
>
>
> I recently noticed that Kleopatra has gained some weight. She is an old
> lady, and since she took all her history with her when she was split off
> from the old KDEPIM repo, she was always quite chubby. But not by that
> much. (I am being sloppy with Mega / Mebi here, since it is not important
> for the overall picture.)
>
> So let us see:
> A fresh clone of Kleopatra:
> 209M kleopatra
> Running:
> git filter-repo --path po --invert-paths
> 21M kleopatra
>
> Let us do the same for KMail:
> Before:
> 169M kmail
> after:
> 56M kmail
>
> Now, yes, Kleopatra has quite a few translations. Their checked-out size
> is about 29 megabytes. But there is something wrong here.
>
> What I don't understand though is that if I look at the scripty commits in
> the git log, nothing seems unusual.
>
> But let us take Low Saxon as an example. I hope that offends the fewest
> people here. There have been no new translations there in ~10 years.
>
> Its checked-out size is 460KB.
>
> In master we have:
> 715 translated messages, 709 fuzzy translations, 428 untranslated messages.
> Going back to the first revision that added po files:
> 763 translated messages, 645 fuzzy translations, 391 untranslated messages.
>
> The sizes are fairly equal, with master of course a bit larger. Yet this
> language, with its translation unchanged, has alone added 10 megabytes.
> That is about half the size of the complete history of the real source
> code for Kleopatra.
>
> du -hs .
> 209M
> git filter-repo --path po/nds --invert-paths --force
> du -hs .
> 199M
>
> Now here is what I don't understand. If I look at the changes:
> git log -p po/nds/kleopatra.po | wc -c
> 164774
> That seems reasonable for all the automatic scripty updates and even with
> all the context lines, that is just 1,6MB uncompressed.
>
> And this is where my git understanding runs into its limits. To understand
> why the history has gotten so large, I tried some snippets from Stack
> Overflow, ending up with:
> git rev-list --objects --all po/nds/kleopatra.po |
> git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
> sed -n 's/^blob //p' |
> sort --numeric-sort --key=2 |
> cut -c 1-12,41-
>
> I think I can roughly see that each commit in the repo apparently has a
> blob associated with it that is the same size as the file.
>
> So can some git sleuth please investigate what is happening here? This kind
> of repo growth is unsustainable, and at least for Kleopatra I see no
> possible solution other than to figure this out and then remove the po
> history from the last year with a force push :-/
>
> Don't get me wrong: I like that the po files are now also in the repo, and
> I accept that this will of course increase the repo size, but something
> fishy is going on here in my opinion.
As discussed on Matrix, it seems that running
git gc --aggressive
brings the Kleopatra repo size down from 223M to 55M.
Is this something worth doing on the server side?
Does anyone know of any potential downsides?
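
For anyone who wants to reproduce the effect without touching a real
repository, here is a minimal sketch on a throwaway repo. Git stores every
version of a file as a full blob, so seeing one file-sized blob per commit is
expected; how much disk that costs depends on how well the blobs are
delta-compressed into packs, which is what gc recomputes. The file name
demo.po, the commit count and the helper size_kib are made up for
illustration, and whether this fully explains the Kleopatra numbers is not
established here.

```shell
#!/bin/sh
# Sketch only: compare the on-disk size of a throwaway repository before and
# after an aggressive repack. demo.po and size_kib are hypothetical names.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config gc.auto 0   # prevent automatic repacking during the experiment

# Build a 2000-line file that loosely resembles a small .po catalogue.
seq 1 2000 | sed 's/^/msgid "string /; s/$/"/' > demo.po
git add demo.po
git commit -q -m "add demo.po"

# Make 20 commits that each change a single line, like an automated sync.
i=1
while [ "$i" -le 20 ]; do
    sed "${i}s/\$/ # rev $i/" demo.po > demo.po.new
    mv demo.po.new demo.po
    git commit -q -am "sync $i"
    i=$((i + 1))
done

# Count the distinct blobs recorded for demo.po across the whole history.
blobs=$(git rev-list --objects --all -- demo.po | grep -c ' demo.po$')

# Sum of loose-object size and pack size, in KiB, as reported by git.
size_kib() {
    git count-objects -v | awk '/^size(-pack)?:/ { s += $2 } END { print s }'
}

before=$(size_kib)
git gc --aggressive --prune=now --quiet
after=$(size_kib)

echo "blobs for demo.po: $blobs"
echo "KiB before gc: $before, after gc: $after"
```

On a real clone, running git count-objects -vH before and after the gc gives
the same comparison in human-readable units.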
Cheers,
Albert
>
>
> Best Regards,
> Andre