[kmail2] [Bug 301470] Threading very slow

Sat Sep 12 00:19:41 BST 2015

https://bugs.kde.org/show_bug.cgi?id=301470

--- Comment #4 from Bernd Paysan <bernd.paysan at gmx.de> ---
Am Freitag, 11. September 2015, 21:36:26 schrieb Martin Steigerwald:
> https://bugs.kde.org/show_bug.cgi?id=301470
> 
> --- Comment #3 from Martin Steigerwald <Martin at Lichtvoll.de> ---
> Note that it will take till December for 15.12 to be released. Current git
> master, buildable with kdesrc-build has quite some improvements already. But
> of course you can also wait for 15.12 to appear in your distro.

Due to the slow speed, I've split up my folders; the largest one has 3500 
deeply threaded messages in it; at the moment (with a 3GHz Core i7, Kmail 
4.14.9), it takes 6 seconds to completely thread in interactive mode, and 
4.something in batch mode - you can expect that a slow machine with an Atom or 
so will take a minute or more for that task, which is certainly not tolerable.  
That's the only folder I've left which is somewhat slow; it used to be worse 
than that.  If you want to get some better feedback, I suggest you add a stop 
timer while threading and display the final timing after "Ready", bottom left.

I've some mailing lists which accumulate deeply threaded messages relatively 
quickly and will let them grow until I get 15.12.  If you want to get a folder 
full with lots of deeply threaded messages, just subscribe to the Linux kernel 
mailing list ;-).

I suspect the current threading code in 4.14.9 is at least O(n²) or worse, 
because all folders somewhat below 2000 messages are threaded fast enough.  If 
you need help with an O(n) algorithm for threading messages, I propose the 
following - requires only two linear walk through the list of messages:

First walk: Build a hash message id->message, so you can access each message 
by its id.  That walk you need to do only once when loading the messages from 
the backend.

Second walk: Go through the list of referenced message ids, and if you find a 
message, append it in the "nested messages" array/vector of that message 
object.  If there's no reference found, append it to the top-level 
array/vector.  Choose a data structure where append is an O(1) function.

Next: Sort the arrays from bottom (1st level messages) with the selected sort 
order, and walk the threading tree downwards to sort the sub-lists.  This is 
O(n log(n)).  To make it easier for displaying, in the return path of the tree 
walk, calculate the number of messages in each sub-thread, and annotate them 
with their absolute position in in the message list and their nesting depth 
(that information is easily available in a tree walk).

The last walk is then ready to display.

IMHO it should at most take some hundred microseconds to thread a few thousand 
messages that way, where the headers are all in memory, and selection of the 
right data structures.

-- 
You are receiving this mail because:
You are the assignee for the bug.