Resource consumption for indexing user data, especially files and mails

Martin Steigerwald martin at lichtvoll.de
Sat Jun 26 12:03:29 BST 2021


Dear KDE community.

Thank you for the energy efficiency goal. I no longer fully buy into
the climate catastrophe narrative (by now there appear to be quite a
few scientists who doubt it), yet I think there are still many reasons
to make things energy efficient: longer battery life for laptops, less
user frustration, more stable power distribution networks and lower
energy bills.


I am mostly a user who helps with bug reporting and triaging from time
to time and has made only one or two commits so far (regarding a
performance issue in Akonadi's maildir resource). I would like to bring
up a topic that, from quickly skimming your talk, Cornelius, I think
was not mentioned explicitly so far. As I only skimmed it instead of
watching it fully, I could have missed it.

I think it is good and important to measure the energy efficiency of 
applications like Okular, KMail and Krita.

However, there is an aspect that I think would be important to include
in the measurement, or to measure separately: the background services
required to keep applications working, and general background services
like desktop search, especially where indexing and searching are
concerned. I do not have global numbers, but I would not be surprised
if the energy used for indexing the internet were enormous, so every
efficiency gain there would have a huge effect.

The same applies to the desktop. Users often have lots of mails and
lots of files. The work to index them is done on a huge number of user
systems, so there is quite a multiplication factor.

Here is an example from my new ThinkPad T14 Gen 1 AMD with an AMD
Ryzen 7 PRO 4750U, 32 GiB of RAM and a Samsung 980 Pro 2 TB NVMe SSD:

% du -sh .local/share/akonadi/search_db .local/share/local-mail \
    .local/share/akonadi/file_db_data .local/share/akonadi/db_data
8,8G    .local/share/akonadi/search_db
17G     .local/share/local-mail
6,2G    .local/share/akonadi/file_db_data
3,3G    .local/share/akonadi/db_data

(All mails are stored locally, except for several IMAP accounts, which
are usually empty because I move mails to local folders after reading
them, and one IMAP account, which holds two weeks of mails for
smartphone access and which I configured in KMail in order to create
Sieve rules.)

The Akonadi Indexing Agent has still not finished indexing them. In
order to index the mails, Akonadi moves all mail files through the
Akonadi cache, i.e. the PostgreSQL database and/or the file-based
storage. This has been eating up 2-4 CPU cores for hours, 2-3 of them
for PostgreSQL-related processes. In addition, there is considerable
disk I/O. According to atop, it keeps the fast NVMe SSD 80-100% busy
most of the time, writing about 300-400 MiB in about 10000-20000 write
operations every 10 seconds. Granted, the SSD runs BTRFS on top of LVM
on top of LUKS, so the encryption requires additional processing. I did
not look at atopsar reports, so the load may not be completely
constant, but whenever I looked at atop, I saw this kind of workload.
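
To put the atop figures into perspective, here is a small Python sketch
that only does arithmetic on the ranges quoted above (300-400 MiB and
10000-20000 write operations per 10-second interval). It measures
nothing by itself. The average size per write is simply bytes divided
by operations, not a value atop showed, and the per-hour figure assumes
the load stayed constant, which I cannot confirm.

```python
# Arithmetic on the atop figures quoted in the mail; nothing is measured.
MIB = 1024 * 1024
INTERVAL_S = 10  # atop sampling interval mentioned in the mail


def rates(mib_per_interval: float, writes_per_interval: int) -> dict:
    """Derive throughput, write rate and average write size for one sample."""
    total_bytes = mib_per_interval * MIB
    mib_per_s = mib_per_interval / INTERVAL_S
    return {
        "mib_per_s": mib_per_s,
        "writes_per_s": writes_per_interval / INTERVAL_S,
        # Simple average: bytes divided by write operations.
        "avg_write_kib": total_bytes / writes_per_interval / 1024,
        # Only meaningful if the load stayed constant for a whole hour.
        "gib_per_hour": mib_per_s * 3600 / 1024,
    }


if __name__ == "__main__":
    # Pair the bounds so the average write size spans its full range.
    samples = {"smallest writes": (300, 20000), "largest writes": (400, 10000)}
    for label, (mib, writes) in samples.items():
        r = rates(mib, writes)
        print(f"{label}: {r['mib_per_s']:.0f} MiB/s, "
              f"{r['writes_per_s']:.0f} writes/s, "
              f"~{r['avg_write_kib']:.1f} KiB per write, "
              f"~{r['gib_per_hour']:.0f} GiB/h if sustained")
```

For the quoted ranges this works out to 30-40 MiB/s, 1000-2000 writes
per second, roughly 15-41 KiB per write, and on the order of 105-141 GiB
per hour if the load were sustained.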

Daniel Vrátil previously worked on fixing this with his work to make
indexing great again¹. However, as I understand it, he eventually more
or less burned out.


Another example is Baloo:

% LANG=en balooctl status
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 1,191,389
Files waiting for content indexing: 278,274
Files failed to index: 0
Current size of index is 8.12 GiB

In this case, Baloo appears to be indexing the same files a *third*
time by now. It relies on device numbers being stable across reboots,
which the Linux kernel does not guarantee². I suspended it in the hope
that Akonadi's indexing of my mostly locally stored mails would
eventually finish. The machine would have more than enough CPU power to
do the file indexing in parallel, but I would like to ease the load on
/home for the time being. In the past I usually just removed the execute
permission from /usr/bin/akonadi_indexing, but this time I hope that
this new laptop is powerful enough for Akonadi to eventually complete
the task.
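
The following Python sketch illustrates how unstable device numbers can
lead to repeated indexing. It is a hypothetical toy model, not Baloo's
code: it assumes an index that identifies a file by the pair (device
number, inode number), i.e. st_dev and st_ino from os.stat(). The
device numbers 2049-2051 and the paths are made up.

```python
import os
from typing import Dict, Tuple

# Hypothetical toy model, NOT Baloo's implementation: a file is
# identified by (device number, inode number) as reported by os.stat().
FileId = Tuple[int, int]


def file_id(path: str) -> FileId:
    """Return the (st_dev, st_ino) pair for a real path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def index_pass(index: Dict[FileId, str], files: Dict[str, FileId]) -> int:
    """Add every file whose id is unknown; return how many were indexed."""
    indexed = 0
    for path, fid in files.items():
        if fid not in index:
            index[fid] = path  # a real indexer would extract content here
            indexed += 1
    return indexed


if __name__ == "__main__":
    print("id of the current directory:", file_id("."))
    index: Dict[FileId, str] = {}
    # Same two files with the same inodes on each boot; only the
    # (made-up) device number of the file system changes.
    for dev in (2049, 2050, 2051):
        boot = {"/home/u/a.txt": (dev, 11), "/home/u/b.txt": (dev, 12)}
        print(f"device {dev}:", index_pass(index, boot), "files indexed")
    print(len(index), "index entries for 2 real files")
```

Run as a script, each of the three simulated boots indexes both files
again, leaving six entries for two real files. In this toy model, keying
on something that survives reboots, such as a file system UUID instead
of st_dev, would avoid the duplicates.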


I have not yet fully understood how you monitor application energy
consumption or how you determine which processes to monitor, as I only
skimmed your presentation. But I kindly ask you to consider the
considerable background activity involved in indexing user data and to
monitor its resource consumption as well.
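
As a rough idea of what including indexing could look like, here is a
Linux-only Python sketch that samples the write_bytes counter from
/proc/<pid>/io for a list of processes and reports the combined write
rate. The process names are my assumption of what is relevant here (the
Akonadi indexing agent, Baloo's content extractor and PostgreSQL), not
anything taken from the KDE Eco measurement setup. Reading another
user's /proc/<pid>/io needs appropriate permissions, and write_bytes
requires a kernel with task I/O accounting.

```python
import os
import time
from typing import Dict

# Assumed names of indexing-related processes (adjust as needed).
WATCHED = ("akonadi_indexing_agent", "baloo_file_extractor", "postgres")
# The kernel truncates /proc/<pid>/comm to 15 characters, so e.g.
# "akonadi_indexing_agent" appears there as "akonadi_indexin".
COMM_LEN = 15


def parse_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io file."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = int(value)
    return fields


def watched_write_bytes() -> Dict[int, int]:
    """Map pid -> cumulative write_bytes for readable watched processes."""
    wanted = {name[:COMM_LEN] for name in WATCHED}
    totals: Dict[int, int] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() not in wanted:
                    continue
            with open(f"/proc/{entry}/io") as f:
                totals[int(entry)] = parse_io(f.read())["write_bytes"]
        except (FileNotFoundError, ProcessLookupError, PermissionError, KeyError):
            # Process exited, belongs to someone else, or no I/O accounting.
            continue
    return totals


if __name__ == "__main__":
    interval_s = 2  # short for a quick demo; atop above used 10 s
    before = watched_write_bytes()
    time.sleep(interval_s)
    after = watched_write_bytes()
    # Only count processes that existed in both samples.
    written = sum(after[pid] - before[pid] for pid in after if pid in before)
    print(f"{written / interval_s / 2**20:.1f} MiB/s written by "
          f"{len(after)} watched processes")
```

write_bytes counts the bytes a process caused to be sent to the storage
layer, so it complements, but does not replace, CPU-time or energy
measurements.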

I understand that making indexing efficient can be a hard problem to
solve. I do not have exact numbers, yet I think that indexing user data
is one of the major contributors to energy and other resource
consumption in KDE. The laptop's fan has been running at 3600 to 4000
rpm almost constantly since some time yesterday, when the laptop
started indexing mails. This may be overly cautious fan control in the
default firmware, but often enough, when the laptop is really idle, the
fan does not spin at all.

I know users have brought up this topic over and over again, but I
think the main reason for that is that the issue is still not fixed. I
bet it is not low-hanging fruit, and I hope that eventually someone with
enough in-depth knowledge of the technologies involved *and* enough time
will work on fixing these issues. I would be willing to spend some money
on that work, similar to a Krita fundraiser.

[1] Make Indexing Great Again

https://phabricator.kde.org/T7014

[2] Bug 438434 - Baloo appears to be indexing twice the number of files 
than are actually in my home directory

https://bugs.kde.org/show_bug.cgi?id=438434

Best,
-- 
Martin





More information about the Kde-eco-discuss mailing list