[Kde-pim] Review Request 116692: Lower memory usage of akonadi_baloo_indexer with frequent commits

Jos Poortvliet jospoortvliet at gmail.com
Fri Mar 14 22:00:34 GMT 2014


On Monday 10 March 2014 10:08:29 Aaron J. Seigo wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/116692/
> -----------------------------------------------------------
> 
> Review request for Akonadi and Baloo.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Baloo is using Xapian for storing processed results from data fed to it by
> akonadi; in doing so it processes all the data it is sent to index and only
> once this is complete is the data committed to the Xapian database. From
> http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#acbea2163142de795024880a7123bc693
> we see: "For efficiency reasons, when
> performing multiple updates to a database it is best (indeed, almost
> essential) to make as many modifications as memory will permit in a single
> pass through the database. To ensure this, Xapian batches up
> modifications." This means that *all* the data to be stored in the Xapian
> database first ends up in RAM. When indexing large mailboxes (or any other
> large chunk of data) this results in a very large amount of memory
> allocation. On one test of 100k mails in a maildir folder this resulted in
> 1.5GB of RAM used. In normal daily usage with maildir I find that it easily
> balloons to several hundred megabytes within days. This makes the Baloo
> indexer unusable on systems with smaller amounts of memory (e.g. mobile
> devices, which typically have only 512MB-2GB of RAM).
> 
> Making this even worse is that the indexer is both long-lived *and* the
> default glibc allocator is unable to return the used memory back to the OS
> (probably due to memory fragmentation, though I have not confirmed this).
> Use of other allocators shows the same temporary ballooning of memory during
> processing, but once that is done the memory is released back to the OS. As
> such, this is not a memory leak, but it behaves like one on systems with the
> default glibc allocator, with akonadi_baloo_indexer taking increasingly
> large amounts of memory that never get returned to the OS. (This is actually
> how I noticed the problem in the first place.)
> 
> The approach used to address this problem is to periodically commit data to
> the Xapian database. This happens uniformly and transparently to the
> AbstractIndexer subclasses. The exact behavior is controlled by the
> s_maxUncommittedItems constant which is set arbitrarily to 100: after an
> indexer hits 100 uncommitted changes, the results are committed
> immediately. Caveats:
> 
> * This is not a guaranteed fix for the memory fragmentation issue
> experienced with glibc: it is still possible for memory to grow slowly over
> time, as each smaller commit leaves some percentage of un-releasable memory
> due to fragmentation. It has helped with day-to-day usage here, but in the
> "100k mails in a maildir structure" test, memory did still balloon upwards.
> 
> * It makes indexing non-atomic from akonadi's perspective: data fed to
> akonadi_baloo_indexer to be indexed may show up in chunks and even, in the
> case of a crash of the indexer, be only partially added to the database.
> 
> Alternative approaches (not necessarily mutually exclusive to this patch or
> each other):
> 
> * send smaller data sets from akonadi to akonadi_baloo_indexer for
> processing. This would allow akonadi_baloo_indexer to retain the atomic
> commit approach while avoiding the worst of the Xapian memory usage; it
> would not address the issue of memory fragmentation
> 
> * restart the akonadi_baloo_indexer process from time to time; this would
> resolve the fragmentation-over-time issue but not the massive memory usage
> due to atomically indexing large datasets
> 
> * improve Xapian's chert backend (to become default in 1.4) to not fragment
> memory so much; this would not address the issue of massive memory usage
> due to atomically indexing large datasets
> 
> * use an allocator other than glibc's; this would not address the issue of
> massive memory usage due to atomically indexing large datasets
> 
> 
> Diffs
> -----
> 
>   src/pim/agent/abstractindexer.h 8ae6f5c
>   src/pim/agent/abstractindexer.cpp fa9e96f
>   src/pim/agent/akonotesindexer.h 83f36b7
>   src/pim/agent/akonotesindexer.cpp ac3e66c
>   src/pim/agent/contactindexer.h 49dfdeb
>   src/pim/agent/contactindexer.cpp a5a6865
>   src/pim/agent/emailindexer.h 9a5e5cf
>   src/pim/agent/emailindexer.cpp 05f80cf
> 
> Diff: https://git.reviewboard.kde.org/r/116692/diff/
> 
> 
> Testing
> -------
> 
> I have been running with the patch for a couple of days, and one other
> person on IRC has tested an earlier (but functionally equivalent) version.
> Rather than reaching the common 250MB+ during regular usage, it now idles at
> ~20MB (up from ~7MB when first started; so some fragmentation remains, as
> noted in the description, but with far better long-term results).

Just curious - did this get into Beta 2? I just had to restart Akonadi with an
akonadi-baloo-feeder process using 2.2 GB...

(I just upgraded from KDE PIM 4.12 to 4.13 beta 2)


> 
> Thanks,
> 
> Aaron J. Seigo
> 
> _______________________________________________
> KDE PIM mailing list kde-pim at kde.org
> https://mail.kde.org/mailman/listinfo/kde-pim
> KDE PIM home page at http://pim.kde.org/



