[Bug 301470] New: Threading very slow

Bernd Paysan bernd.paysan at gmx.de
Fri Jun 8 20:53:48 BST 2012


https://bugs.kde.org/show_bug.cgi?id=301470

            Bug ID: 301470
          Severity: normal
           Version: 4.8.3
          Priority: NOR
          Assignee: kdepim-bugs at kde.org
           Summary: Threading very slow
    Classification: Unclassified
                OS: Linux
          Reporter: bernd.paysan at gmx.de
          Hardware: openSUSE RPMs
            Status: UNCONFIRMED
         Component: message list
           Product: kmail2

I've a lot of E-mails.  I don't know how much you communicate, but it seems to
be that the Kmail2 algorithm for threading the message list is not only
O(n^(something>1)), but also other fundamental mistakes have been made.  I want
to make a constructive comment on this:

Ok, first of all, threading is an O(n) algorithm.  You have n messages to walk
through.  They are identified by the Message-Id header field.  You use this as
hash index to store the message list in a hash table.  Now you thread the
messages, which converts the list into a tree.  You use the References header
field.  This field can have more than one reference, but the first hit is the
"parent node".  So you insert the message as child into the first parent node
you find, and be done with it.  After having walked through all messages (n
items), you have a structured tree.  While inserting, you can go up each
message thread to the top, and update a min/max date field (needed for sorting)
there.

After you have created the tree, sort it by the criteria the user has selected.
 These are things like "by first message in thread" or "by last message in
thread", or "into buckets of days/weeks", which will tear the trees apart (as
only the intra-day threading is taken into account).  This sorting is O(n log
n), by nature of good sorting algorithms.

So, if you are done with sorting, *then* you start to actually create widgets
for the messages.  No earlier.  And you do that *only* for the elements which
are currently visible in the message list pane.  Not for all the other 10k
messages, which are invisible.

At the moment, using Kmail2 is no fun.  A few years ago, Kmail used to be fun,
because it was fast and reliable - and that was on hardware which was much
slower than today's.  Handling mail directories with 10k entries was no
problem.  The problems with threading started before it became Kmail2, but it
got worse over time.  And now I'm fed up enough to report this as a bug.  It is
"normal", because it does not actually crash, but it is no feature request. 
This is a must for a user interface - not be sluggish and unresponsive.

To reproduce, subscribe to the Linux kernel mailing list or any other
high-volume mailing list.  Wait a few days, until you hit a few thousand
messages.  Then try to use Kmail2 on this mailing list.  If it takes more than
a millisecond to thread and sort this amount of data on a contemporary
processor, the program is broken.  Wait another few days to have the number of
messages doubled.  If the time it takes know to thread and sort is
significantly more than two times larger, the program is broken, as well.

A similar issue exists with Dolphin viewing folders with that many items.  I've
reports of people who tried with a 30k entry folder, and it took a whole week
for Dolphin until it showed the files.  Please, this is first semester computer
science stuff - don't program O(n²) or worse, if there is an O(n) algorithm.

-- 
You are receiving this mail because:
You are the assignee for the bug.


More information about the Kdepim-bugs mailing list