[Kde-pim] Review Request 116692: Lower memory usage of akonadi_baloo_indexer with frequent commits

Vishesh Handa me at vhanda.in
Mon Jul 7 11:00:41 BST 2014



> On March 21, 2014, 10:58 a.m., Christian Mollekopf wrote:
> > It turned out that most of the memory was used by the ItemFetchJob loading all items into memory. We've now optimized this, and for me the indexer never goes beyond ~250MB (initial indexing), and during normal usage stays around 10MB. I ran some experiments with notmuch mail (which also uses Xapian), and it also stayed around 200MB. This could probably be tweaked further by adjusting XAPIAN_FLUSH_THRESHOLD to lower the number of changes that are held in memory before a commit, but IMO 250MB for the initial indexing is a sane default.
> > 
> > The only optimization that I think would be viable is releasing the memory again using malloc_trim() or similar (as we used to do in the Nepomuk indexer).
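> > 
> > For illustration, here is a minimal, self-contained sketch of those two knobs: lowering the flush threshold and handing freed heap pages back to the OS. It assumes Xapian reads XAPIAN_FLUSH_THRESHOLD from the environment and that glibc's malloc_trim() is available; the path and the threshold value are made up.
> > 
> >     #include <malloc.h>   // malloc_trim() (glibc)
> >     #include <cstdlib>    // setenv()
> >     #include <xapian.h>
> > 
> >     int main()
> >     {
> >         // Ask Xapian to flush after fewer pending document changes,
> >         // trading some indexing throughput for a smaller in-memory batch.
> >         setenv("XAPIAN_FLUSH_THRESHOLD", "1000", 1);
> > 
> >         Xapian::WritableDatabase db("/tmp/example-index",
> >                                     Xapian::DB_CREATE_OR_OPEN);
> >         // ... add or replace documents here ...
> >         db.commit();
> > 
> >         // Hand freed heap pages back to the OS after the commit.
> >         malloc_trim(0);
> >         return 0;
> >     }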
> > 
> > So, have the recent fixes also resolved the memory consumption for you, or do you still think this patch should go in?

Ping. If there are no objections, I'll discard this review request.


- Vishesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/116692/#review53637
-----------------------------------------------------------


On March 10, 2014, 11:12 a.m., Aaron J. Seigo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/116692/
> -----------------------------------------------------------
> 
> (Updated March 10, 2014, 11:12 a.m.)
> 
> 
> Review request for Akonadi and Baloo.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Baloo uses Xapian to store processed results from data fed to it by Akonadi; it processes all the data it is sent to index, and only once this is complete is the data committed to the Xapian database. From http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#acbea2163142de795024880a7123bc693 we see: "For efficiency reasons, when performing multiple updates to a database it is best (indeed, almost essential) to make as many modifications as memory will permit in a single pass through the database. To ensure this, Xapian batches up modifications." This means that *all* the data to be stored in the Xapian database first ends up in RAM. When indexing large mailboxes (or any other large chunk of data) this results in a very large amount of memory allocation: one test of 100k mails in a maildir folder used 1.5GB of RAM. In normal daily usage with maildir I find that it easily balloons to several hundred megabytes within days. This makes the Baloo indexer unusable on systems with smaller amounts of memory (e.g. mobile devices, which typically have only 512MB-2GB of RAM).
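> 
> As an illustration of that pattern (this is not code from the repository; the path and loop are made up), every document added before the single commit() at the end is batched in memory:
> 
>     #include <string>
>     #include <xapian.h>
> 
>     int main()
>     {
>         Xapian::WritableDatabase db("/tmp/example-index",
>                                     Xapian::DB_CREATE_OR_OPEN);
>         for (int i = 0; i < 100000; ++i) {       // e.g. 100k mails
>             Xapian::Document doc;
>             doc.set_data("mail body " + std::to_string(i));
>             db.add_document(doc);                // buffered in RAM ...
>         }
>         db.commit();                             // ... until this single commit
>         return 0;
>     }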
> 
> Making this even worse is that the indexer is both long-lived *and* the default glibc allocator is unable to return the used memory to the OS (probably due to memory fragmentation, though I have not confirmed this). Use of other allocators shows the same temporary ballooning of memory during processing, but once that is done the memory is released and returned to the OS. As such, this is not a memory leak, but it behaves like one on systems with the default glibc allocator, with akonadi_baloo_indexer taking up increasingly large amounts of memory that never get returned to the OS. (This is actually how I noticed the problem in the first place.)
> 
> The approach used to address this problem is to periodically commit data to the Xapian database. This happens uniformly and transparently to the AbstractIndexer subclasses. The exact behavior is controlled by the s_maxUncommittedItems constant, which is arbitrarily set to 100: once an indexer accumulates 100 uncommitted changes, the results are committed immediately (see the sketch after the caveats below). Caveats:
> 
> * This is not a guaranteed fix for the memory fragmentation issue experienced with glibc: it is still possible for the memory to grow slowly over time, as each smaller commit leaves some percentage of unreleasable memory behind due to fragmentation. It has helped with day-to-day usage here, but in the "100k mails in a maildir structure" test memory still ballooned upwards.
> 
> * It makes indexing non-atomic from Akonadi's perspective: data fed to akonadi_baloo_indexer for indexing may show up in chunks and, in the case of a crash of the indexer, may be only partially added to the database.
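> 
> The core of the batching idea is sketched below. Only the s_maxUncommittedItems constant comes from this patch; the class and member names here are placeholders rather than the actual AbstractIndexer API.
> 
>     #include <string>
>     #include <xapian.h>
> 
>     class BatchingIndexer
>     {
>     public:
>         explicit BatchingIndexer(const std::string &path)
>             : m_db(path, Xapian::DB_CREATE_OR_OPEN) {}
> 
>         void index(const Xapian::Document &doc)
>         {
>             m_db.add_document(doc);
>             // Commit as soon as the threshold is hit instead of waiting for
>             // the whole batch, so Xapian's in-memory changeset stays small.
>             if (++m_uncommitted >= s_maxUncommittedItems)
>                 commit();
>         }
> 
>         void commit()
>         {
>             m_db.commit();
>             m_uncommitted = 0;
>         }
> 
>     private:
>         static const int s_maxUncommittedItems = 100;
>         Xapian::WritableDatabase m_db;
>         int m_uncommitted = 0;
>     };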
> 
> Alternative approaches (not necessarily mutually exclusive with this patch or with each other):
> 
> * send smaller data sets from Akonadi to akonadi_baloo_indexer for processing. This would allow akonadi_baloo_indexer to retain the atomic commit approach while avoiding the worst of the Xapian memory usage; it would not address the issue of memory fragmentation
> * restart the akonadi_baloo_indexer process from time to time; this would resolve the fragmentation-over-time issue but not the massive memory usage due to atomically indexing large datasets
> * improve Xapian's chert backend (to become default in 1.4) to not fragment memory so much; this would not address the issue of massive memory usage due to atomically indexing large datasets
> * use an allocator other than glibc's; this would not address the issue of massive memory usage due to atomically indexing large datasets
> 
> 
> Diffs
> -----
> 
>   src/pim/agent/emailindexer.cpp 05f80cf 
>   src/pim/agent/abstractindexer.h 8ae6f5c 
>   src/pim/agent/abstractindexer.cpp fa9e96f 
>   src/pim/agent/akonotesindexer.h 83f36b7 
>   src/pim/agent/akonotesindexer.cpp ac3e66c 
>   src/pim/agent/contactindexer.h 49dfdeb 
>   src/pim/agent/contactindexer.cpp a5a6865 
>   src/pim/agent/emailindexer.h 9a5e5cf 
> 
> Diff: https://git.reviewboard.kde.org/r/116692/diff/
> 
> 
> Testing
> -------
> 
> I have been running with the patch for a couple of days, and one other person on IRC has tested an earlier (but functionally equivalent) version. Rather than reaching the common 250MB+ during regular usage, it now idles at ~20MB (up from ~7MB when first started; so some fragmentation remains, as noted in the description, but with far better long-term results).
> 
> 
> Thanks,
> 
> Aaron J. Seigo
> 
>

_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/


