Proposal to introduce a KDE::stringCache

Thu Feb 26 18:17:01 CET 2004

Hello List,
I am looking for discussion - but I am not (yet) a subscriber of this list. If 
you find that it is architectural stuff I could re-send it to kde-core-devel.

## The story:

  This morning kde-core-devel had some discussion about memory consumption by 
kfm when reading very large folders (there was actually an out-of-memory 
problem). The thread-starter said that each file entry uses 600 bytes.
   How can this be? Answer: an entry uses lots of QStrings and each QString 
has 2 (sometimes 3) blocks of memory allocated. Such strings are: FileName, 
Group, Owner (maybe more). Each of the mentioned strings uses maybe 50 bytes, 
so the three together take about 150 bytes per entry. Here we can help.

## Similar things:

  When a KDE app starts it is likely that a kdDebug or kdWarning causes the 
kdebugrc file to be loaded. That file often contains tens of entries like:

  [7102]
  InfoOutput=4

Each "InfoOutput" ends up as an allocated QString and is kept loaded as long 
as the app runs. This can easily allocate a couple of kBytes.

## Proposal:

In the above examples each QString was created separately - in other words 
even if they contain the same data, such as "root" or "InfoOutput", this data 
is not shared. Each QString has its own copy. The information is redundant, we 
increased the entropy without need.

For such situations we should create a little cache that tracks maybe the last 
64 QStrings that were created. If we try to create a new QString having data 
that is still known by this cache, the cached QString data should be 
referenced (QString works internally with references). This saves one memory 
block for each match. Example:

      atom.m_str = KDE::stringCache(source_unicode);

or even better:

      atom.m_str = KDE::stringCache(source_latin, length);
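
To make the idea concrete, here is a minimal sketch of such a cache in plain C++. It is not KDE or Qt code: `std::shared_ptr<const std::string>` stands in for QString's shared data block, and the names `StringCache`, `SharedString` and `get` are made up for this illustration. Instead of tracking exactly the last 64 strings it uses 64 hash-indexed slots, which is a simpler approximation of the same idea.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

// Stand-in for QString's shared data block (assumption: a real version
// would hand out QStrings that reference the same internal data).
using SharedString = std::shared_ptr<const std::string>;

// Hypothetical cache with 64 slots, indexed by hash. A string that lands
// in an occupied slot with different content replaces the old entry.
class StringCache {
public:
    SharedString get(const std::string& text) {
        const std::size_t slot = std::hash<std::string>{}(text) % kSlots;
        SharedString& entry = slots_[slot];
        if (entry && *entry == text) {
            return entry;  // hit: reference the existing block, no new copy
        }
        entry = std::make_shared<const std::string>(text);  // miss: store it
        return entry;
    }

private:
    static constexpr std::size_t kSlots = 64;
    std::array<SharedString, kSlots> slots_{};
};
```

Two calls such as `cache.get("root")` in a row return pointers to the same block, so a thousand directory entries owned by "root" would share one copy. Strings handed out earlier stay valid after their slot is overwritten, because the callers hold their own references.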

The implementation could use a fast hash algorithm to locate the cached string 
data. Often QStrings are created from latin1 (most KIO-Slaves do this) - the 
hash should be made such that it can be calculated from latin1 or from 
Unicode (use the lower 7 bits only). The cache might also store the latin1 
data (implement carefully - don't call latin1 of the QString). As a result 
the cache could not only save memory but also speed up string creation (no 
need to convert latin1 -> Unicode, one malloc less).
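
A rough sketch of such a hash, again in plain C++ and not taken from any KDE source: the function names `cacheHash` and `sameText` are invented here, `char` data stands in for latin1 and `char16_t` data for QString's Unicode. The FNV-1a mixing constants are just one possible choice of a fast hash. Only the lower 7 bits of each character are fed in, so both encodings of the same text give the same value.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a style hash over the lower 7 bits of each character. Works on
// latin1 bytes (Char = char) and on UTF-16 units (Char = char16_t).
template <typename Char>
std::uint32_t cacheHash(const Char* data, std::size_t length) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < length; ++i) {
        h ^= static_cast<std::uint32_t>(data[i]) & 0x7Fu;
        h *= 16777619u;  // FNV prime
    }
    return h;
}

// Compare latin1 bytes against UTF-16 units without converting either side.
bool sameText(const char* latin1, std::size_t length,
              const std::u16string& utf16) {
    if (utf16.size() != length) {
        return false;
    }
    for (std::size_t i = 0; i < length; ++i) {
        if (static_cast<unsigned char>(latin1[i]) != utf16[i]) {
            return false;
        }
    }
    return true;
}
```

Because the hash ignores the upper bits, characters that differ only there (for example 'i' and 'é') collide, so a lookup still needs the full comparison in `sameText` after the slot has been found. On a hit from latin1 input, neither a conversion nor a new allocation is needed.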

In situations that use the same local data repeatedly, like parsing 
configuration files or directories (KIO-Slaves), such a cache could not only 
save lots of memory but would also give a speed improvement.

Ok, who can now prove that I am an idiot?
Yours Jürgen