Resource consumption for indexing user data, especially files and mails
Martin Steigerwald
martin at lichtvoll.de
Sat Jun 26 12:03:29 BST 2021
Dear KDE community,
Thank you for the energy efficiency goal. I do not fully buy into
the climate catastrophe narrative anymore (there appear to be quite
a few scientists who doubt it by now), yet I think there still are
many reasons to make things energy efficient, such as prolonged battery
life for laptops, reduced user frustration, more stable energy
distribution networks and cheaper energy bills.
As mostly a user who helps with bug reporting and triaging from time to
time, yet has only made one or two commits so far (regarding a
performance issue in Akonadi's maildir resource), I would like to bring
up a topic that, as far as I could tell, Cornelius, was not mentioned
explicitly in your talk. I only skimmed through it instead of watching
it fully, so I could have missed it.
I think it is good and important to measure the energy efficiency of
applications like Okular, KMail and Krita.
But there is an aspect that I think would be important to include within
the measurement or to measure separately: the background services
required to keep applications working, and general background services
like desktop search. This matters especially for indexing and
searching. I do not have global numbers, but I wouldn't be surprised if
the energy used for indexing the internet were enormous, so every
efficiency gain there would have a huge effect.
The same applies to the desktop. Users often have lots of mails and lots
of files. The work to index those is done on a very large number of user
systems, so there is quite a multiplication factor.
Just an example from my new ThinkPad T14 Gen 1 AMD with AMD Ryzen 7 PRO
4750U, 32 GiB of RAM and a Samsung 980 Pro 2 TB NVMe SSD:
% du -sh .local/share/akonadi/search_db .local/share/local-mail .local/share/akonadi/file_db_data .local/share/akonadi/db_data
8,8G .local/share/akonadi/search_db
17G .local/share/local-mail
6,2G .local/share/akonadi/file_db_data
3,3G .local/share/akonadi/db_data
(Apart from several usually empty IMAP accounts, empty because I move
mails to local folders after reading, and one IMAP account that holds
2 weeks of mails for smartphone access, which I configured in KMail in
order to create Sieve rules, all mails are stored locally.)
The Akonadi Indexing Agent is still not finished with indexing it.
Akonadi moves all files through the Akonadi cache, i.e. the PostgreSQL
database and/or the file-based storage, in order to index them. This has
been eating up 2-4 CPU cores for hours, 2-3 of them for PostgreSQL
related processes. In addition, there is considerable disk usage, which
keeps that fast NVMe SSD at 80-100% busy most of the time, writing about
300-400 MiB every 10 seconds in about 10000-20000 write operations every
10 seconds according to atop. Granted, the SSD runs BTRFS on top of LVM
on top of LUKS, so additional processing for encryption is required. I
did not look at atopsar reports, so the load may not be completely
constant, but whenever I looked at atop, I saw this kind of workload
going on.
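To put the atop figures above into per-second terms, here is a small Python sketch of the arithmetic. It only uses the numbers quoted in this mail; pairing the lowest volume with the highest operation count (and vice versa) to bound the average write size is my own assumption, not something atop reports.

```python
# Figures quoted from atop in this mail: totals per 10-second sample interval.
INTERVAL_S = 10
MIB_WRITTEN = (300, 400)      # MiB written per interval (low, high)
WRITE_OPS = (10_000, 20_000)  # write operations per interval (low, high)


def per_second(total, interval_s=INTERVAL_S):
    """Convert a per-interval total into a per-second rate."""
    return total / interval_s


def avg_write_kib(mib, ops):
    """Average size of one write in KiB, given MiB written and the op count."""
    return mib * 1024 / ops


if __name__ == "__main__":
    print("throughput in MiB/s:", [per_second(m) for m in MIB_WRITTEN])
    print("write operations/s:", [per_second(o) for o in WRITE_OPS])
    # Assumption: pair the extremes to get a lower and an upper bound.
    smallest = avg_write_kib(MIB_WRITTEN[0], WRITE_OPS[1])
    largest = avg_write_kib(MIB_WRITTEN[1], WRITE_OPS[0])
    print(f"average write size: {smallest:.1f} to {largest:.1f} KiB")
```

Under these assumptions the load works out to 30-40 MiB/s in 1000-2000 writes per second, so an average write of roughly 15 to 41 KiB: many small writes rather than a few large ones.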
Daniel Vrátil previously worked on fixing that with his work to make
indexing great again¹. However, as I understand it, he eventually kind
of burned out.
Another example is Baloo:
% LANG=en balooctl status
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 1,191,389
Files waiting for content indexing: 278,274
Files failed to index: 0
Current size of index is 8.12 GiB
In this case Baloo appears to be indexing the same files a *third* time
by now. It relies on device numbers being stable across reboots, which
is not guaranteed by the Linux kernel². I suspended it in the hope that
the Akonadi mail indexing of mostly locally stored mails would
eventually finish. The machine would have more than enough CPU power to
do the file indexing in parallel, but I would like to ease the load on
/home for the time being. Usually I just removed execute permissions
from /usr/bin/akonadi_indexing, but this time I hope this new laptop is
powerful enough that Akonadi will eventually complete the task.
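To illustrate why unstable device numbers lead to repeated indexing, here is a minimal Python sketch. It assumes an index that identifies each file by an ID derived from the device number and the inode number. The 32/32 bit layout, the function names and the example numbers are hypothetical and not taken from Baloo's source; only the dependence on the device number comes from the bug report.

```python
import os


def file_id(dev: int, ino: int) -> int:
    """Hypothetical 64-bit ID: device number in the high half, inode in the low."""
    return ((dev & 0xFFFFFFFF) << 32) | (ino & 0xFFFFFFFF)


def id_for_path(path: str) -> int:
    """Derive the ID of an existing file from what stat() currently reports."""
    st = os.stat(path)
    return file_id(st.st_dev, st.st_ino)


def needs_indexing(index: set, dev: int, ino: int) -> bool:
    """A file counts as new if its ID is not in the index yet."""
    return file_id(dev, ino) not in index


if __name__ == "__main__":
    index = set()
    inode = 1234  # the file itself never changes

    # First boot: the filesystem shows up with device number 43 (made up).
    index.add(file_id(43, inode))
    print("same boot, needs indexing:", needs_indexing(index, 43, inode))

    # After a reboot the same filesystem gets device number 45 (made up).
    print("after reboot, needs indexing:", needs_indexing(index, 45, inode))
```

In this model the unchanged file looks like a brand-new one after the reboot, and the old entry stays behind in the index, which would match both the repeated indexing and the growing index size.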
I have not yet fully learned how you monitor application energy
consumption or how you determine which processes to monitor, as I only
skimmed through your presentation instead of watching it fully. But I
kindly ask you to consider the considerable amount of background
activity related to indexing user data and to monitor the resource
consumption of that as well.
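As a rough idea of what including background services could look like, here is a Python sketch that sums the CPU time of indexing-related processes from /proc on Linux. It is not how the KDE Eco measurements work (I do not know that yet), and the process name prefixes are my assumption about what would be worth watching.

```python
import os
from pathlib import Path

# Assumed name prefixes of the background services discussed in this mail.
WATCHED_PREFIXES = ("akonadi", "baloo", "postgres")


def parse_stat(stat_text):
    """Return (comm, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    fields = stat_text[end + 1:].split()
    # fields[0] is field 3 (state), so utime (14) and stime (15) are 11 and 12.
    return comm, int(fields[11]) + int(fields[12])


def cpu_seconds_by_name(proc_root="/proc"):
    """Sum user + system CPU seconds per watched process name."""
    root = Path(proc_root)
    if not root.is_dir():
        return {}
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    totals = {}
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            comm, ticks = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile or entry is unreadable
        if comm.startswith(WATCHED_PREFIXES):
            totals[comm] = totals.get(comm, 0.0) + ticks / ticks_per_second
    return totals


if __name__ == "__main__":
    for name, seconds in sorted(cpu_seconds_by_name().items()):
        print(f"{name}: {seconds:.1f} s CPU")
```

Taking two such snapshots some time apart and subtracting them gives the CPU time spent in between. The kernel truncates process names in /proc/<pid>/stat to 15 characters, which is why the sketch matches on prefixes.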
I understand that making indexing work efficiently can be a hard task to
solve. I do not have exact numbers, yet I think that indexing user data
is one of the major contributors to energy and other resource
consumption in KDE. The laptop has been running its fan at 3600 to
4000 rpm almost constantly since some time yesterday, when it started
indexing mails. This may be overcautious fan control by the default
firmware; however, when the machine is really idle, the fan is often not
active at all.
I know this topic has been brought up by users over and over again. But
I think the main reason for that is that the issue is still not fixed. I
bet it is no low-hanging fruit, and I hope that eventually someone with
enough in-depth knowledge of the involved technologies *and* enough time
will work on fixing these issues. I'd be willing to spend some money on
that work, similar to a Krita fundraiser.
[1] Make Indexing Great Again
https://phabricator.kde.org/T7014
[2] Bug 438434 - Baloo appears to be indexing twice the number of files
than are actually in my home directory
https://bugs.kde.org/show_bug.cgi?id=438434
Best,
--
Martin