Reducing working memory footprint by string deduplication of geodata
Friedrich W. H. Kossebau
kossebau at kde.org
Wed Aug 31 19:05:58 UTC 2016
Hi,
I would like some input on what would be a good design to allow string
de-duplication in all geodata in the working memory.
MOTIVATION
One major reason for the amount of memory used by Marble is that data loaded
from files contains lots of repeating strings, which are also stored as
strings in the working memory. Yet each string gets its own separate,
complete copy. *1
FIRST EXPERIMENT
I did a quick hack to CacheRunner::parseFile() to try de-duplication at least
per cache file loaded, and could see for themes using big cache files
like bluemarble or openstreetmap a reduction of > 3 MB :) without noticeable
price on load time. *2
That seemed good enough as an intermediate improvement to go in as a commit
already, so I pushed it as 690fcf380985c5fefbb8531fbeb54a1432b49044
(https://quickgit.kde.org/?p=marble.git&a=commitdiff&h=690fcf380985c5fe)
In that commit you can also see some similar changes to OSMParser (which
accidentally slipped in). They show that per-file string de-duplication
with lots of smaller files yields only a small improvement (< 1 MB
seen with hello-world example and vectorosm theme on plain app start), so it
is not the silver bullet.
CHALLENGE
Perfect solution: there would not be any duplicated string among all the
geodata objects (there might be even more candidates: QDateTime is also
implicitly shared, at least in Qt5, and might have lots of duplicates).
Which means:
* on adding new geodata to the working memory, e.g. on loading from disk or
the network, any string should be checked for an existing duplicate, and
if there is one, the original should be used.
* on changing some string of some geodata, the new string should be checked
against existing strings and an existing one be used instead, if present;
also the old string should be checked for whether anyone else is still using
it, and be removed if not.
* on removing some string of some geodata, the old string should be checked
for whether anyone else is still using it, and be removed if not.
APPROACH?
Has there ever been any discussion about that? Any previous approaches?
The part of removing no longer used strings from the working memory would be
automatically solved by QString. But what to do about finding existing
instances of a given QString?
Some kind of lookup facility would be needed, like the QSet<QString> used in
the commit for CacheRunner.
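
A minimal sketch of such a per-load lookup table, not the code from the
commit. It assumes std::shared_ptr<const std::string> as a stand-in for the
implicitly shared QString so that it builds without Qt; SharedString,
makeString, StringPool and deduplicate are hypothetical names used only for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Stand-in for an implicitly shared string such as QString: copying the
// handle shares the character data instead of duplicating it.
using SharedString = std::shared_ptr<const std::string>;

inline SharedString makeString(const std::string &text) {
    return std::make_shared<std::string>(text);
}

// Hash and compare by string content, not by pointer value.
struct ContentHash {
    std::size_t operator()(const SharedString &s) const {
        return std::hash<std::string>()(*s);
    }
};
struct ContentEqual {
    bool operator()(const SharedString &a, const SharedString &b) const {
        return *a == *b;
    }
};

// Hypothetical lookup table, meant to live only while one file is loaded.
class StringPool {
public:
    // Returns the first instance seen with this content; inserts s if new.
    SharedString intern(const SharedString &s) {
        assert(s);
        return *m_pool.insert(s).first;
    }
    std::size_t size() const { return m_pool.size(); }

private:
    std::unordered_set<SharedString, ContentHash, ContentEqual> m_pool;
};

// Replaces every string in values by its pooled instance, so that equal
// strings end up sharing a single buffer.
inline void deduplicate(std::vector<SharedString> &values, StringPool &pool) {
    for (SharedString &v : values)
        v = pool.intern(v);
}
```

Once the pool goes out of scope at the end of the load, only the geodata
objects keep the strings alive, which is why a per-file table does not block
later cleanup but also cannot find duplicates across files.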
Keeping such a global QSet<QString> around all the time has 2 disadvantages:
* each QString object in the set would block the automatic removal of no
longer used strings, as the entry still references it (and there seem to be
no hooks in QString to help with this)
* steady memory footprint of the table (cost unknown)
Creating the table on the fly on loading a new file or on a change has the
disadvantage of a runtime price (cost unknown).
There could also be something like a garbage^Wde-duplication collection which
is run some time after a file has been loaded or some changes have been
made.
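
A rough sketch of what such a collection pass could look like with a
long-lived table, under the same assumption of std::shared_ptr<const
std::string> standing in for QString. SweepingPool and its methods are
hypothetical, and the use_count() check is only reliable as long as nothing
touches the strings from another thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>

using SharedString = std::shared_ptr<const std::string>;

inline SharedString makeString(const std::string &text) {
    return std::make_shared<std::string>(text);
}

struct ContentHash {
    std::size_t operator()(const SharedString &s) const {
        return std::hash<std::string>()(*s);
    }
};
struct ContentEqual {
    bool operator()(const SharedString &a, const SharedString &b) const {
        return *a == *b;
    }
};

// Hypothetical global-style table with an explicit cleanup pass.
class SweepingPool {
public:
    // Returns the pooled instance with this content; inserts s if new.
    SharedString intern(const SharedString &s) {
        assert(s);
        return *m_pool.insert(s).first;
    }

    // Drops every entry that only the pool itself still references and
    // returns how many entries were removed.
    std::size_t sweep() {
        std::size_t removed = 0;
        for (auto it = m_pool.begin(); it != m_pool.end();) {
            if (it->use_count() == 1) {
                it = m_pool.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return m_pool.size(); }

private:
    std::unordered_set<SharedString, ContentHash, ContentEqual> m_pool;
};
```

The sweep costs one pass over the whole table, so how often to run it would
be a trade-off between that runtime price and the memory held by stale
entries.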
Or a new class MString could be created which has that de-duplication somehow
built in?
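
To make that idea a bit more concrete, here is one possible shape of such a
class, purely as a sketch: nothing like it exists in Marble, it uses plain
standard-library types instead of QString, and all member names are made up.
The table stores non-owning pointers only, and the last MString releasing a
string removes its entry, so the table never keeps unused strings alive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>

// Hypothetical string handle with built-in de-duplication.
// Single-threaded only; instances must not outlive the function-local
// static table (so no MString objects with static storage duration).
class MString {
public:
    explicit MString(const std::string &text) : m_node(acquire(text)) {}

    const std::string &text() const { return m_node->text; }
    bool sharesBufferWith(const MString &other) const {
        return m_node == other.m_node;
    }
    // Number of distinct strings currently alive.
    static std::size_t uniqueCount() { return table().size(); }

private:
    struct Node : std::enable_shared_from_this<Node> {
        explicit Node(const std::string &t) : text(t) {}
        std::string text;
    };
    // The table hashes and compares nodes by their string content.
    struct Hash {
        std::size_t operator()(const Node *n) const {
            return std::hash<std::string>()(n->text);
        }
    };
    struct Equal {
        bool operator()(const Node *a, const Node *b) const {
            return a->text == b->text;
        }
    };
    using Table = std::unordered_set<const Node *, Hash, Equal>;

    static Table &table() {
        static Table t;
        return t;
    }

    static std::shared_ptr<const Node> acquire(const std::string &text) {
        const Node probe(text);  // temporary key, used only for the lookup
        const auto it = table().find(&probe);
        if (it != table().end())
            return (*it)->shared_from_this();  // reuse the existing string
        // The custom deleter unregisters the node when its last user is gone.
        std::shared_ptr<const Node> fresh(new Node(text), [](const Node *n) {
            table().erase(n);
            delete n;
        });
        const bool inserted = table().insert(fresh.get()).second;
        assert(inserted);
        (void)inserted;
        return fresh;
    }

    std::shared_ptr<const Node> m_node;
};
```

The price in this sketch is a table lookup (plus a temporary copy of the
text) on every construction; whether that is acceptable on load time would
need measuring.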
YOUR INPUT!
De-duplicating data so far seems a worthwhile thing to investigate further,
to help with the unfortunate memory usage of Marble. Being a newbie with
Marble(-like) data structures, I am curious what the veterans and everyone
else have to say. Please do :)
*1)
A similar problem also exists with QString objects created on the fly from C++
raw strings, e.g. in a loop or repeated function call. That is why I recently
started to add QStringLiteral wrappers to more and more such raw strings,
which turns the raw strings at compile time into a format that is then
shared by the created QString objects. Or QLatin1String in other
places, where a QString instance can be avoided by method overloads in the
used API.
*2)
Seen e.g. with
valgrind --tool=massif build/marble/examples/cpp/hello-marble/hello-marble
where the memory allocated by QArrayData::allocate(...) is reduced.
Cheers
Friedrich