[Kde-pim] PIM indexing and search - taking stock
Will Stephenson
wstephenson at kde.org
Wed Mar 14 17:05:05 GMT 2012
Hi all
I'd like to share this list of the issues we face and have dealt with in
indexing and searching PIM data (mostly mails as their volume creates the most
difficulty). You'll see that we have made quite a lot of progress since the
meeting. Note that I've written this list from a high-level, product-usefulness
viewpoint: working indexing is useless if the index cannot be used, and there
is no point indexing data that will never be searched.
My priority is to get kdepim 4.8 indexing usable, or at least not a liability,
so I'll be backporting any fixes below that are not yet in the branch.
I'll put this on the wiki, but please let me know here if you spot any
inconsistencies or omissions. I'll also make bug reports for the searching
problems I've identified.
Will
1 Faults in indexing
1.1 Performance faults while indexing
1.1.1 FIXED Excessive work per item
* Excessive queries per item kde#289932#c58 [1],
kde#289932#c87 (754275eda610dce1160286a76339353097d8764c in kde-runtime/4.8)
* Attachments fetched but not effectively indexed (Volker WIP?)
* Setting the same icons on mails, attachments and their tags while indexing:
is this necessary? (No, commented in other feeders - CM) Can they be
de-duplicated before storing the mail resource?
1.1.2 Repeated indexing per item
1.1.2.1 Failures to index items
* FIXED Cardinality fault on messageHeader
http://oscaf.git.sourceforge.net/git/gitweb.cgi?p=oscaf/shared-desktop-ontologies;a=commitdiff;h=4697389c39b7112aaf0f6ac1a36b216e78ab5e14
* FIXED Cardinality fault on PIMO:Persons' properties
d732592b in kde-runtime/master
1.1.2.2 FIXED Redundant reindexing
* kde#289932#c58?
1.1.3 Repeated indexing per collection
* FIXED Attempted indexing of collections we cannot index
ec4f19eb781514ce0dfc09fe4e9ea4591ecc31e9 in kdepim-runtime/4.8
* FIXED Mark each collection on completion with indexing level
2729771b765d0bd6e0e03d0a5b055e36bc48944c in kdepim-runtime/master
(does this prevent discovery of items changed while feeder was not running?)
1.1.4 Indexing interferes with other work
* FIXED Hide indexing until user is idle kde#289932#c58
1.1.5 Low nominal performance
* E.g. 5700 kde-core-devel mails (42MB mbox) indexed in 20 minutes (about 4.8
items/sec) on a Core i7-2620M (4x2.7GHz, HT), with idle detection disabled.
The bottleneck is not clear; Virtuoso uses 80-90% of one core during this.
* Passing all mail through Akonadi->feeder->dbus->nepomukstorage->virtuoso
negates the performance advantage of the fast Akonadi protocol
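To put the kde-core-devel figure above in per-item terms, here is a small back-of-the-envelope calculation. It uses only the numbers quoted in this mail, apart from the 100,000-mail store, which is a made-up size for illustration, and it assumes 1 MB = 1024 KB and that throughput stays constant.

```python
# Figures quoted in this mail.
mails = 5700          # kde-core-devel mails indexed
minutes = 20          # wall-clock time taken
mbox_mb = 42          # size of the mbox in MB

seconds = minutes * 60

items_per_sec = mails / seconds           # throughput in items per second
ms_per_item = 1000 * seconds / mails      # average cost of one item in ms
kb_per_sec = mbox_mb * 1024 / seconds     # assumes 1 MB = 1024 KB

# Hypothetical store size, NOT from the mail: how long would a first full
# index take at the same rate?
hypothetical_store = 100_000
hours_for_store = hypothetical_store / items_per_sec / 3600

print(f"{items_per_sec:.2f} items/sec")
print(f"{ms_per_item:.0f} ms per item")
print(f"{kb_per_sec:.1f} KB/sec of mbox data")
print(f"{hours_for_store:.1f} hours for {hypothetical_store} mails")
```

At roughly 210 ms per mail and about 36 KB/sec of raw mbox data, the rate is far below what disk or network I/O alone would impose, which is consistent with the per-item processing chain being the cost.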
2 Ability to utilise indexing work (working search)
2.1 Search features that fully use indexed data
* Indexed: Date, Subject, From, Sender, To, Cc, Bcc, List-Id, Organization,
some X-headers, Status flags, Tags, Important, Todo, Watched, Plain text body
* Searchable: Age (days), Subject, From, To, Cc, Reply-To, List-Id,
Organization, some X-headers, Status flags, Tags, all headers (can this
work?), message body
* No way to search by the actual PIMO Persons/Contacts created by indexing,
user must input part of name.
* No way to search attachments or whether something has an attachment
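The mismatch between the indexed and searchable lists above can be made explicit with a set comparison. This is a naive comparison by field name only. It assumes that "Age (days)" is served by the indexed Date and that "message body" corresponds to the indexed plain text body; both mappings are my assumptions, not confirmed behaviour. Important, Todo and Watched may in practice be reachable through the status flag search.

```python
# Field names copied from the two lists in section 2.1.
indexed = {
    "Date", "Subject", "From", "Sender", "To", "Cc", "Bcc", "List-Id",
    "Organization", "X-headers", "Status flags", "Tags", "Important",
    "Todo", "Watched", "Plain text body",
}
searchable = {
    "Age (days)", "Subject", "From", "To", "Cc", "Reply-To", "List-Id",
    "Organization", "X-headers", "Status flags", "Tags", "All headers",
    "Message body",
}

# Assumed equivalences between a search criterion and the indexed field
# that would back it (an assumption for this sketch only).
backed_by = {"Age (days)": "Date", "Message body": "Plain text body"}

searchable_as_indexed = {backed_by.get(name, name) for name in searchable}

# Indexed, but with no matching search criterion: indexing work that
# search cannot currently use.
indexed_only = sorted(indexed - searchable_as_indexed)

# Offered in search, but with no matching indexed field.
searchable_only = sorted(searchable_as_indexed - indexed)

print("Indexed but not searchable:", indexed_only)
print("Searchable but not indexed:", searchable_only)
```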
2.2 Faults in search
2.2.1 Server side
* FIXED Truncated query strings cause broken search folders
2.2.2 Client side
* Dialog allows modifying existing search folder by name but fails (modifies
remote id)
* Possible to create search in search folders; doesn't work
2.2.3 Viewing search results changes search results
* Searching on unread message status: messages disappear from the search
results as the message preview marks them read
* Just viewing search results causes some messages to disappear from search
collection (according to akonadiconsole db browser, itemChanged + reindex at
fault?)
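The first case in 2.2.3 follows directly from a search folder being a live query. The toy model below shows the effect; it is not Akonadi code, and the `Mail` class, `unread_search` and `preview` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    subject: str
    read: bool = False


def unread_search(mails):
    """Evaluate the 'status is unread' query against the current state."""
    return [m for m in mails if not m.read]


def preview(mail):
    """Hypothetical preview action: showing a mail marks it as read."""
    mail.read = True


mails = [Mail("a"), Mail("b"), Mail("c")]

results = unread_search(mails)
before = [m.subject for m in results]

# The user clicks the first result; the preview pane marks it read.
preview(results[0])

# The live search folder is re-evaluated and the mail just viewed is gone.
after = [m.subject for m in unread_search(mails)]

print(before, "->", after)
```

The second case, where merely viewing results drops messages, would not be explained by this model, since no flag the query depends on changes there.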
2.3 Minimising indexing work (assuming there is no/low demand for search, do
less expensive indexing)
* Change default set of indexed folders
* Make it easy to change per folder indexing attribute
* Show indexing status, allow attr change directly in folder selector in
search dialog.
* Indexing all except full text a useful compromise?
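As a rough sketch of what a per-folder indexing attribute with an "everything except full text" level might look like: the level names, folder names and the default below are all hypothetical and do not describe the existing feeder.

```python
from enum import IntEnum


class IndexingLevel(IntEnum):
    """Hypothetical per-folder indexing levels, cheapest first."""
    NONE = 0        # folder is not indexed at all
    METADATA = 1    # headers, flags and tags, but no body text
    FULL_TEXT = 2   # metadata plus the plain text body


# Assumed default for folders without an explicit attribute.
DEFAULT_LEVEL = IndexingLevel.METADATA

# Example per-folder attributes (folder names are made up).
folder_levels = {
    "inbox": IndexingLevel.FULL_TEXT,
    "lists/kde-core-devel": IndexingLevel.METADATA,
    "trash": IndexingLevel.NONE,
}


def parts_to_index(folder):
    """Return which parts of a mail the feeder would index for a folder."""
    level = folder_levels.get(folder, DEFAULT_LEVEL)
    if level == IndexingLevel.NONE:
        return []
    parts = ["headers", "flags", "tags"]
    if level >= IndexingLevel.FULL_TEXT:
        parts.append("body")
    return parts


for name in ("inbox", "lists/kde-core-devel", "trash", "sent"):
    print(name, parts_to_index(name))
```

Storing the level as an ordered value would also let a feeder compare the level recorded for a collection with the level now requested, and reindex only when the requested level is higher.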
[1] https://bugs.kde.org/show_bug.cgi?id=289932