[Kde-pim] PIM indexing and search - taking stock

Will Stephenson wstephenson at kde.org
Wed Mar 14 17:05:05 GMT 2012


Hi all

I'd like to share this list of the issues we face and have dealt with in 
indexing and searching PIM data (mostly mails as their volume creates the most 
difficulty).  You'll see that we have made quite a lot of progress since the 
meeting.  Note that I've written this list from a high-level viewpoint of how 
useful the product is: working indexing is useless if the index cannot be 
used, and there is no point indexing data that will never be searched.

My priority is to get kdepim 4.8 indexing usable, or at least not a liability, 
so I'll be backporting any fixes below that are not yet in the branch.

I'll put this on the wiki, but please let me know here if you spot any 
inconsistencies or omissions. I'll also make bug reports for the searching 
problems I've identified.

Will

1 Faults in indexing
1.1 Performance faults while indexing
1.1.1 FIXED Excessive work per item
* Excessive queries per item kde#289932#c58 [1], 
kde#289932#c87 (754275eda610dce1160286a76339353097d8764c in kde-runtime/4.8)
* Attachments fetched but not effectively indexed (Volker WIP?)
* Setting the same icons on mails, attachments and their tags while indexing: 
is this necessary? (No, commented in other feeders - CM) Can they be 
de-duplicated before storing the mail resource?
1.1.2 Repeated indexing per item
1.1.2.1 Failures to index items
* FIXED Cardinality fault on messageHeader 
http://oscaf.git.sourceforge.net/git/gitweb.cgi?p=oscaf/shared-desktop-ontologies;a=commitdiff;h=4697389c39b7112aaf0f6ac1a36b216e78ab5e14
* FIXED Cardinality fault on PIMO:Persons' properties
d732592b in kde-runtime/master
1.1.2.2 FIXED Redundant reindexing 
* kde#289932#c58? 
1.1.3 Repeated indexing per collection
* FIXED Attempted indexing of collections we cannot index 
ec4f19eb781514ce0dfc09fe4e9ea4591ecc31e9 in kdepim-runtime/4.8
* FIXED Mark each collection on completion with indexing level 
2729771b765d0bd6e0e03d0a5b055e36bc48944c in kdepim-runtime/master
(does this prevent discovery of items changed while feeder was not running?)
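
A minimal Python sketch of the per-collection marking idea above, under stated assumptions: the level constant, the dict mapping collection names to stored levels, and the function name are all hypothetical and are not taken from the kdepim-runtime commit. It also shows why the question in parentheses arises: a collection already marked at the current level is skipped without looking at its items.

```python
# Toy model only; none of these names come from the real feeder code.
CURRENT_LEVEL = 2  # hypothetical "indexing level" written on completion


def collections_to_index(stored_levels, current_level=CURRENT_LEVEL):
    """Return the collections whose stored level is missing or outdated."""
    return [
        name
        for name, level in stored_levels.items()
        if level is None or level < current_level
    ]


# None means the collection was never marked as completely indexed.
stored_levels = {"inbox": 2, "kde-core-devel": None, "archive": 1}
pending = collections_to_index(stored_levels)
# "inbox" is skipped even if one of its items changed while the feeder
# was not running, because only the collection-level mark is consulted.
```

In this model, catching offline changes would need some extra per-item check; whether the real feeder has one is exactly the open question.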
1.1.4 Indexing interferes with other work
* FIXED Hide indexing until user is idle kde#289932#c58
1.1.5 Low nominal performance
* E.g. 5700 kde-core-devel mails (42MB mbox) in 20 minutes (4.8 items/sec) on 
Core i7-2620M (4x2.7GHz, HT), idle detection disabled. The bottleneck is not 
clear.  Virtuoso uses 80-90% of one core during this.
* Passing all mail through Akonadi->feeder->dbus->nepomukstorage->virtuoso 
negates the performance advantage of the fast Akonadi protocol
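
As a quick check of the throughput figure above, a small Python sketch of the arithmetic. The inputs are the numbers quoted in 1.1.5; the variable names are illustrative, and reading "42MB" as 42 MiB is an assumption.

```python
# Figures quoted in 1.1.5; names are illustrative, not from any KDE code.
mails = 5700
mbox_bytes = 42 * 1024 * 1024   # assumption: "42MB" means 42 MiB
elapsed_s = 20 * 60             # 20 minutes

items_per_sec = mails / elapsed_s             # 4.75, quoted as 4.8
avg_mail_kib = mbox_bytes / mails / 1024      # roughly 7.5 KiB per mail
kib_per_sec = mbox_bytes / 1024 / elapsed_s   # roughly 36 KiB/s of raw mbox

print(f"{items_per_sec:.2f} items/s, {avg_mail_kib:.1f} KiB/mail, "
      f"{kib_per_sec:.1f} KiB/s")
```

Seen as raw data rate, this is only a few tens of KiB per second, which suggests the cost is per item rather than per byte.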

2 Ability to utilise indexing work (working search)
2.1 Search features that fully use indexed data 
* Indexed: Date, Subject, From, Sender, To, Cc, Bcc, List-Id, Organization, 
some X-headers, Status flags, Tags, Important, Todo, Watched, Plain text body
* Searchable: Age(days), Subject, From, To, Cc, Reply-To, List-Id, 
Organization, some X-headers, Status flags, Tags, all headers (can this 
work?), message body
* No way to search by the actual PIMO Persons/Contacts created by indexing, 
user must input part of name.
* No way to search attachments or whether something has an attachment

2.2 Faults in search
2.2.1 Server side
* FIXED Truncated query strings cause broken search folders
2.2.2 Client side
* Dialog allows modifying existing search folder by name but fails (modifies 
remote id)
* Possible to create search in search folders; doesn't work
2.2.3 Viewing search results changes search results 
* Searching on unread message status: messages disappear from the search as 
the message preview marks them read
* Just viewing search results causes some messages to disappear from search 
collection (according to akonadiconsole db browser, itemChanged + reindex at 
fault?)
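
The first case above can be illustrated with a toy Python model of a search that is re-evaluated against live state. This is only a sketch: the `Mail` class and `unread_search` function are hypothetical and do not correspond to Akonadi or Nepomuk APIs.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    """Hypothetical stand-in for a mail item with a 'seen' status flag."""
    id: int
    seen: bool = False


def unread_search(mails):
    """Re-evaluate the 'unread' query against the current item state."""
    return [m.id for m in mails if not m.seen]


mails = [Mail(1), Mail(2), Mail(3)]
before = unread_search(mails)

# The message preview marks the selected message as read...
mails[1].seen = True

# ...so the next evaluation of the live query no longer contains it.
after = unread_search(mails)
```

Here `before` contains all three ids and `after` has lost id 2, purely because the result set is recomputed after the preview changed the flag.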

2.3 Minimising indexing work (assuming there is no/low demand for search, do 
less expensive indexing)
* Change default set of indexed folders
* Make it easy to change per folder indexing attribute
* Show indexing status, allow attr change directly in folder selector in 
search dialog.
* Is indexing everything except full text a useful compromise?

[1] https://bugs.kde.org/show_bug.cgi?id=289932


