[Kde-pim] PIM indexing and search - taking stock

Ingo Klöcker kloecker at kde.org
Wed Mar 14 20:28:32 GMT 2012


On Wednesday 14 March 2012, Will Stephenson wrote:
> Hi all
> 
> I'd like to share this list of the issues we face and have dealt with
> in indexing and searching PIM data (mostly mails as their volume
> creates the most difficulty).  You'll see that we have made quite a
> lot of progress since the meeting.

That's great news!


> Notice that I've written this
> list from a high-level, product-usefulness viewpoint: working
> indexing is useless if the index cannot be used, and there is no
> point in indexing data that will never be searched.
> 
> My priority is to get kdepim 4.8 indexing usable, or at least not a
> liability, so I'll be backporting any fixes below that are not yet
> in the branch.
> 
> I'll put this on the wiki, but please let me know here if you spot
> any inconsistencies or omissions. I'll also make bug reports for the
> searching problems I've identified.
> 
> Will
> 
> 1 Faults in indexing
> 1.1 Performance faults while indexing
> 1.1.1 FIXED Excessive work per item
> * Excessive queries per item kde#289932#c58 [1],
> kde#289932#c87 (754275eda610dce1160286a76339353097d8764c in
> kde-runtime/4.8)
> * Attachments fetched but not effectively indexed
> (Volker WIP?)
> * Setting the same icons on mails, attachments and their
> tags while indexing, is this necessary? (No, commented in other
> feeders - CM) Can they be de-duplicated before storing the mail
> resource?

I do not understand what you are trying to say.


> 1.1.2 Repeated indexing per item
> 1.1.2.1 Failures to index items
> * FIXED Cardinality fault on messageHeader
> http://oscaf.git.sourceforge.net/git/gitweb.cgi?p=oscaf/shared-desktop-ontologies;a=commitdiff;h=4697389c39b7112aaf0f6ac1a36b216e78ab5e14
> * FIXED Cardinality fault on PIMO:Persons' properties
> d732592b in kde-runtime/master
> 1.1.2.2 FIXED Redundant reindexing
> * kde#289932#c58?
> 1.1.3 Repeated indexing per collection
> * FIXED Attempted indexing of collections we cannot index
> ec4f19eb781514ce0dfc09fe4e9ea4591ecc31e9 in kdepim-runtime/4.8
> * FIXED Mark each collection on completion with indexing level
> 2729771b765d0bd6e0e03d0a5b055e36bc48944c in kdepim-runtime/master
> (does this prevent discovery of items changed while feeder was not
> running?)
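The open question about 2729771b (whether the completion marker hides items changed while the feeder was not running) can be illustrated with a toy model. Everything in it (IndexingLevel, Collection, items_to_index and the per-item modification counter) is hypothetical and not the kdepim-runtime code, and it assumes nothing else, such as replayed change notifications, tells the feeder about those changes.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class IndexingLevel(IntEnum):
    # Hypothetical levels, loosely following the "all except full text" idea in 2.3.
    NONE = 0
    HEADERS = 1
    FULL_TEXT = 2


@dataclass
class Collection:
    name: str
    # item id -> modification counter (hypothetical stand-in for a per-item revision)
    items: Dict[int, int] = field(default_factory=dict)
    # marker written once a complete indexing pass has finished
    completed_level: IndexingLevel = IndexingLevel.NONE


def items_to_index(collection: Collection, wanted: IndexingLevel,
                   indexed: Dict[int, int]) -> List[int]:
    """Return the ids the feeder would (re)index in this collection.

    `indexed` maps item id -> modification counter seen when it was last indexed.
    """
    if collection.completed_level >= wanted:
        # Shortcut: the whole collection is skipped, so per-item checks never run.
        return []
    return [i for i, rev in collection.items.items() if indexed.get(i) != rev]


# First pass: both items are indexed and the collection is marked complete.
inbox = Collection("inbox", items={1: 0, 2: 0})
indexed: Dict[int, int] = {}
for item_id in items_to_index(inbox, IndexingLevel.HEADERS, indexed):
    indexed[item_id] = inbox.items[item_id]
inbox.completed_level = IndexingLevel.HEADERS

# While the feeder is not running, item 2 changes and item 3 arrives.
inbox.items[2] = 1
inbox.items[3] = 0
print(items_to_index(inbox, IndexingLevel.HEADERS, indexed))  # [] : both changes are missed
```

Without the marker shortcut, the per-item comparison would return [2, 3], at the cost of walking every item in the collection on each run.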
> 1.1.4 Indexing interferes with other work
> * FIXED Hide indexing until user is idle kde#289932#c58
> 1.1.5 Low nominal performance
> * E.g. 5700 (42MB mbox) kde-core-devel mails in 20 minutes (4.8
> items/sec) on Core i7-2620M (4x2.7GHz, HT), idle detection disabled.
> It is not clear what the bottleneck is. Virtuoso uses 80-90% of one
> core during this.

Sounds like Virtuoso is doing stuff it probably shouldn't (need to) do. 
Do you have any idea what it is doing? Anything in the logs?

At work we did have serious problems with Postgres. In the end it turned 
out that it was our own fault. We were using random UUIDs instead of 
sequential UUIDs. Obviously, there is an index on the UUIDs. Since the
UUIDs were random, inserting lots of items made Postgres split and
rewrite pages all over the index B-tree all of the time, instead of
just appending at its end. A colleague finally found the
root cause of our problems by simulating our usage of the database.

Of course, the (non-)sequential UUIDs are just a shot in the dark
because Virtuoso is a completely different beast from Postgres. But
unless we can get useful logging/debugging information out of Virtuoso,
we should try to track down the actual problem with simulations.
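A simulation of the kind suggested above could start as small as this. It is a toy model, not Postgres or Virtuoso internals: a sorted Python list stands in for a B-tree index, and the helper names (tail_insert_ratio, sequential_uuids, random_uuids) as well as the counter-based "sequential" UUID scheme are hypothetical. It measures how many inserts land at the right-hand end of the index, which is where a B-tree absorbs sequential keys cheaply.

```python
import bisect
import random
import uuid


def tail_insert_ratio(keys):
    """Insert keys one at a time into a sorted list (a crude stand-in for a
    B-tree index) and return the fraction of inserts that land at the end."""
    index = []
    at_end = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            at_end += 1
        index.insert(pos, key)
    return at_end / len(keys)


def sequential_uuids(n, start=0):
    # Hypothetical sequential scheme: a monotonically increasing counter
    # used directly as the UUID's 128-bit value.
    return [uuid.UUID(int=start + i) for i in range(n)]


def random_uuids(n, seed=0):
    # Version-4 style random UUIDs from a seeded generator, so runs are repeatable.
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]


if __name__ == "__main__":
    n = 5000
    print("sequential:", tail_insert_ratio(sequential_uuids(n)))
    print("random:    ", tail_insert_ratio(random_uuids(n)))
```

Sequential keys report 1.0 (every insert appends), while random keys report a value close to zero, i.e. almost every insert lands somewhere in the middle of the index. Whether Virtuoso's indexes suffer from this under the feeder's workload is exactly what a fuller simulation would have to establish.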


> * Pushing all mail through Akonadi -> feeder -> D-Bus ->
> nepomukstorage -> Virtuoso negates the performance advantage of the
> fast Akonadi protocol
> 
> 2 Ability to utilise indexing work (working search)
> 2.1 Search features that fully use indexed data
> * Indexed: Date, Subject, From, Sender, To, Cc, Bcc, List-Id,
> Organization, some X-headers, Status flags, Tags, Important, Todo,
> Watched, Plain text body
> Searchable: Age(days), Subject, From, To,
> Cc, Reply-To, List-Id, Organization, some X-headers, Status flags,
> Tags, all headers (can this work?), message body
> * No way to search by the actual PIMO Persons/Contacts created by
> indexing; the user must input part of a name.
> * No way to search attachments or whether something has an attachment
> 
> 2.2 Faults in search
> 2.2.1 Server side
> * FIXED Truncated query strings cause broken search folders
> 2.2.2 Client side
> * Dialog allows modifying existing search folder by name but fails
> (modifies remote id)
> * Possible to create search in search folders; doesn't work
> 2.2.3 Viewing search results changes search results
> * When searching on unread message status, messages disappear from
> the search as the message preview marks them read
> * Just viewing search results causes some messages to disappear from
> search collection (according to akonadiconsole db browser,
> itemChanged + reindex at fault?)
> 
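The first case in 2.2.3 follows directly if a search collection re-evaluates its query whenever an item changes. A minimal sketch assuming exactly that behaviour; Mail and LiveSearchFolder are hypothetical names, not Akonadi classes:

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set


@dataclass
class Mail:
    item_id: int
    unread: bool = True


@dataclass
class LiveSearchFolder:
    """Hypothetical search collection whose membership is re-evaluated
    on every item change notification."""
    predicate: Callable[[Mail], bool]
    members: Set[int] = field(default_factory=set)

    def populate(self, mails: Iterable[Mail]) -> None:
        # Initial run of the search.
        self.members = {m.item_id for m in mails if self.predicate(m)}

    def item_changed(self, mail: Mail) -> None:
        # Re-checking the query on each change is what lets results "move".
        if self.predicate(mail):
            self.members.add(mail.item_id)
        else:
            self.members.discard(mail.item_id)


mails = [Mail(1), Mail(2, unread=False), Mail(3)]
unread = LiveSearchFolder(predicate=lambda m: m.unread)
unread.populate(mails)
print(sorted(unread.members))  # [1, 3]

# Previewing mail 1 marks it as read; the resulting change notification drops it.
mails[0].unread = False
unread.item_changed(mails[0])
print(sorted(unread.members))  # [3]
```

A snapshot-style folder that evaluates the query only when the search is (re)run would keep mail 1 visible until the next refresh; which behaviour users expect is a product decision that this sketch does not settle.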
> 2.3 Minimising indexing work (assuming there is no/low demand for
> search, do less expensive indexing)
> * Change default set of indexed folders
> * Make it easy to change per folder indexing attribute
> * Show indexing status, allow attr change directly in folder selector
> in search dialog.
> * Indexing all except full text a useful compromise?

My day-to-day experience with Thunderbird shows that I'm mostly
searching by
- Subject
- Sender
- Recipient
- Date

A couple of times I used full text search when I was unable to guess 
Subject/Sender/Recipient of the message I was looking for.

I should note that by far most of the time I'm locating messages by
looking into the folder where I think I put the message and then using
the quick filter. So, I'm probably not representative of Joe Average,
but I am probably representative of Jim "I have 200 folders and sort my
100+ K mails (mostly) by hand" Poweruser.


Regards,
Ingo
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/

