[Akonadi] [Bug 338402] File system cache is inefficient: too many files per directory

Martin Steigerwald ms at teamix.de
Wed Jan 21 09:07:53 GMT 2015


https://bugs.kde.org/show_bug.cgi?id=338402

Martin Steigerwald <ms at teamix.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ms at teamix.de

--- Comment #4 from Martin Steigerwald <ms at teamix.de> ---
Daniel, no. The argument that filesystems can handle it may hold, well, with a
BTRFS Dual RAID 1 on two SSDs, maybe. But the main issue is that it caches that
much at all. My bet is:

The usual user has no more than several thousand mails in short-term
reference. Even a photo reader wouldn't have much more in reference at the same
time, I bet.

So I fail to find the benefit of caching hundreds of thousands of mails in
there. Are there any cache hit/miss statistics? I bet the statistics would be
abysmal.
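
Lacking real hit/miss numbers, access times can give a rough hint of how much
of the cache is ever read again. A minimal sketch, assuming the filesystem
records atime (relatime is fine for a threshold of several days, noatime is
not); `stale_report` is a hypothetical helper name, not an Akonadi tool:

```shell
# Hypothetical helper: print "<stale> <total>" for a cache directory, where
# "stale" means not accessed for more than N days (default 30).
# Assumption: atime is recorded and file names contain no newlines.
stale_report() {
    dir=$1
    days=${2:-30}
    # $(( ... )) normalizes any padding that wc may add on some systems.
    total=$(( $(find "$dir" -type f | wc -l) ))
    stale=$(( $(find "$dir" -type f -atime +"$days" | wc -l) ))
    echo "$stale $total"
}
```

Called as `stale_report ~/.local/share/akonadi/file_db_data 30`, it prints the
number of files untouched for more than 30 days next to the total. It is only
a proxy for a miss rate, not a real cache statistic.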

So why does it cache that much *at all*?

This is a huge "inefficient" staring you in the face. And no argument
whatsoever will make it go away. No amount of denying that this is just insane
will make it less so. That's at least my opinion on it.

The following is from the work setup with the huge IMAP account, taken on the
laptop. I am in home office, so I can't check the workstation. But it was
exactly that workstation where I moved the Akonadi data to a local filesystem,
because the default limit of a NetApp Filer storage appliance for the maximum
number of files in a directory was exceeded. I was the one reporting that bug.

Before akonadictl fsck I had this:

ms@merkaba:~/.local/share/akonadi> ls -ld file_db_data
drwxr-xr-x 1 ms teamix 109247040 Jan 21 09:23 file_db_data

ms@merkaba:~/.local/share/akonadi> find file_db_data | wc -l
650280

After it I have:

ms@merkaba:~/.local/share/akonadi> ls -ld file_db_data
drwxr-xr-x 1 ms teamix 109247040 Jan 21 09:32 file_db_data

ms@merkaba:~/.local/share/akonadi#130> find file_db_data | wc -l
524030

So at least the number of files went down a bit. Even with my POP3 setup I had
lots of files in there, and even after fsck 4600 mails remained. I have
local maildirs for a reason, I'd say, and they are fast to access.
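
To put the two counts above in proportion, a bit of shell arithmetic is
enough. This is only a sketch: `percent_drop` is a made-up helper name, and
`find file_db_data | wc -l` also counts the directory entry itself, so the
real file counts are slightly lower than the numbers used here.

```shell
# Hypothetical helper: integer percentage by which a count dropped.
percent_drop() {
    before=$1
    after=$2
    echo $(( (before - after) * 100 / before ))
}

# Counts taken from the find output before and after akonadictl fsck.
percent_drop 650280 524030   # prints 19
```

So the fsck removed 126250 entries, roughly a fifth, and left more than half a
million in place.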

But for the work case with 524030 cached mails: Dan, can you explain to me the
benefit of that? Do you really think I care about 500000+ mails at once? It's an
archive of mailing lists. Yeah, I like full text search there, but Baloo ideally
needs to see each mail just once, so why cache those at all? Cache the recent
mails of the few folders the user accesses most often and be done with it, I'd
say.

Even despite this insane amount of caching I still get situations where KMail
does not respond at all anymore. Granted, this is with a huge account on
an Exchange cluster, and their IMAP implementation is abysmal. But still, I
have 500000+ mails cached locally, *without* setting the IMAP account to
disconnected, so why am I not even able to see these then? And why does it get
into a situation where I have to restart Akonadi and/or KMail to be able to
actually *use* the mail account again and have KMail do something useful? Those
are real issues with Akonadi, and I am not the only one who has them.

So for me it's just a *huge* waste of resources without *any* obvious benefit.
The issue with Akonadi is not a lack of caching. The issues users still have
with Akonadi are elsewhere.

And caching that much can be an issue for BTRFS, even on said Dual SSD RAID 1:

ms@merkaba:~/.local/share/akonadi#130> /usr/bin/time -v du -sh file_db_data
7,0G    file_db_data
        Command being timed: "du -sh file_db_data"
        User time (seconds): 2.17
        System time (seconds): 99.07
        Percent of CPU this job got: 31%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:17.95
                                                     ^^^^^^^

This is no joke: the du -sh took more than 5 (in words: five) minutes on a
BTRFS Dual SSD RAID 1!

Granted, the find was way faster (some seconds), and I bet Akonadi doesn't
add up the space used in the directory... but still, there is some overhead
involved.

        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 33240
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 8076
        Voluntary context switches: 663116

That's more than 600000 context switches.

        Involuntary context switches: 17704
        Swaps: 0
        File system inputs: 31424208

That's more than 31 million file system input operations!

        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
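
A side note on the "File system inputs" figure above: on Linux this value
comes from the ru_inblock field of getrusage, which the kernel appears to
count in 512-byte units rather than in individual requests. Under that
assumption (it is an assumption, not something the output states), the count
converts to a data volume as sketched below. `inblocks_to_mib` is a
hypothetical helper name.

```shell
# Hypothetical helper: convert a "File system inputs" count to MiB,
# assuming each unit is 512 bytes (2048 units per MiB).
inblocks_to_mib() {
    echo $(( $1 / 2048 ))
}

inblocks_to_mib 31424208   # prints 15343
```

That would be roughly 15 GiB read from disk just to add up a directory that
du reports as 7,0G.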

Boy, 7 GiB! It cached almost my complete IMAP account here. Without me asking
to do it.

Well heck, with that I am inclined to actually make it an offline IMAP
account, because honestly, where is the difference? And maybe it would then
work with that Exchange IMAP, which is lacking in both performance and feature
implementation (not only with IMAP, also with Trojita). I think I'll try this,
because, well, it already downloaded it almost completely. I think I will just
set this flag now, because if it has downloaded 7 GiB from it, I don't care
about the remaining few GiB it may not have downloaded. I wonder how this all
can work for Munich, by the way.

Please, have a way to *limit* the caching to a *sane* value. Or limit it by
default.
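
What such a limit could look like, as a dry-run sketch only: list the files
that fall outside the N most recently accessed ones. `eviction_candidates` is
a hypothetical name, it relies on GNU find's -printf and on access times being
recorded, and it only prints candidates. It is not meant as a suggestion to
delete files behind Akonadi's back; a real limit would have to live inside
Akonadi, which also tracks these files in its database.

```shell
# Hypothetical helper: print the files in a cache directory that are NOT
# among the <keep> most recently accessed ones. Prints only, deletes nothing.
# Assumptions: GNU find (-printf), recorded atime, no newlines in file names.
eviction_candidates() {
    dir=$1
    keep=$2
    # %A@ is the access time in seconds since the epoch, %p the path.
    find "$dir" -type f -printf '%A@ %p\n' \
        | LC_ALL=C sort -rn \
        | tail -n +"$((keep + 1))" \
        | cut -d' ' -f2-
}
```

For example, `eviction_candidates ~/.local/share/akonadi/file_db_data 5000 | wc -l`
would show how many cached files lie beyond the 5000 most recently read ones.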

That's not sane. Prove me wrong!

I'd say:

1) On regular and fast IMAP just cache several thousand mails.

2) On Exchange do that as well and let users switch to disconnected IMAP if
Exchange just can't keep up. Heck, maybe all this caching contributes to making
Exchange slow, because KMail at some point downloaded all these mails. But yeah,
downloading them once I understand, for the useful desktop search, but then why
bother caching all of them? Can the downloads for Baloo just be excluded from
caching altogether? I'd only cache user requests. There, latency is important.
Ideally Baloo should see each mail once, so why cache?

Compared to that file cache the size of the database seems pretty low, but
still:

ms@merkaba:~/.local/share/akonadi/db_data/akonadi> du -sh * | sort -rh | head -10
2,6G    parttable.ibd
261M    pimitemtable.ibd
13M     pimitemflagrelation.ibd
264K    collectionattributetable.ibd
200K    collectiontable.ibd
136K    tagtable.ibd
120K    tagtypetable.ibd
120K    tagremoteidresourcerelationtable.ibd
120K    tagattributetable.ibd
120K    resourcetable.ibd

I think I will vacuum it as well.

Akonadi didn't even respond to the first vacuum request. I restarted Akonadi
and issued the request again to make it actually do it.
