Gross inefficiency in kmail

George Staikos staikos at kde.org
Sat Mar 20 07:01:05 CET 2004


   After fixing the huge memory leaks in kmail (dimap/imap related), I still 
found that it was far too slow.  It's using 100% cpu and takes a very long 
time to check mail on a fast machine with a fast connection.  On  a dual 
Athlon 2400 connected to the mail server with 100bTX and ~10-12 thousand 
messages in the account, a single dimap sync (no cache rebuild, a sync with 
at most 5-10 new messages) under calltree takes 1.5 hours to complete.  It 
takes many seconds (over a minute or two?) to complete in real time even.

   So as you have now guessed, I did a bit of profiling and I can see where 
the inefficiencies are.  A single check of my dimap account results in 800 
million (800000000) memcpy(), 22 million (22000000) calls to KMMsgInfo::UID() 
(and getLongPart()), and typically 32000 messages from the slave.  My concern 
right now is the memcpy.  It illustrates what I'm interpreting as this 
behaviour:

- Remote folder index (RFI)
- Local folder index (LFI)

IMAP find messages to sync remotely from RFI, gets message over imap, does a 
lookup locally in LFI.  It starts at message index 0 and works up through the 
folder until it finds it, returns.  This is n^2 behaviour since the folder on 
the remote side typically is close to identical to the local folder, and at 
least non-constant in size.  It would be worse if the local and remote 
folders are disjoint and large.

I'm sure there are many techniques to solve this problem.  Some of them might 
including:

   - building a cache of the folder index in memory for these sorts of 
operations.  I think this would require up to 1MB of RAM for a folder of 
35000 messages.  It's not too expensive in the scope of KDE PIM applications, 
and a good allocator would go a long way

  - being more creative with the index pointer.  If it is likely that 
operations will be sequential (as it is in dimap), if we start where we left 
off last, it is likely we will do far less iteration

   - use a better data structure so that finding UIDs is log n time or better

   - pay careful attention to detailed micro optimizations in these inner 
loops.  800000000 calls is rather rediculous for this operation, but it is 
nonetheless an expensive operation.  Every cycle helps here.

   If anyone wants to see the cachegrind log of this behaviour I can send a 
copy via private email.  It's quite large of course.  Now the big question, 
does anyone want to tackle fixing this?  My memory of the internals of KMail 
differ quite a bit from what I see there now.  I think this is a task for 
someone who knows this stuff well.  One suggestion of mine is to create a 
method in KMFolderIndex to find a message quickly by UID.


-- 
George Staikos
KDE Developer				http://www.kde.org/
Staikos Computing Services Inc.		http://www.staikos.net/


More information about the Kde-optimize mailing list