[Nepomuk] [Kde-pim] Support for efficient searching in akonadi (e.g. by UID)

Volker Krause vkrause at kde.org
Sat May 18 09:01:00 UTC 2013


Hi,

On Thursday 16 May 2013 16:40:15 Christian Mollekopf wrote:
> The two primarily used standards for PIM data, vcard and ical, both assign a
> globally unique UID to each object, and naturally make references based on
> them.

Message-Id of email/news is similar.

> Searching an item by UID, e.g. to resolve such a reference, currently
> involves parsing the payload of each item to extract the UID, which is very
> inefficient.
> 
> I think we should support searching through UID's with the help of the db,
> by moving it into a dedicated column/table.

yes, this has been discussed before ("GID FETCH", using the term "global id" 
since UID already means something different in the Akonadi context).

> This would for example be useful for todo/subtodo trees or contact groups
> using references.

as well as invitation handling, and could also help with message threading and 
"mark copies/cross-posts as read" (KNode has that already).

> To support this we'd need a UID-itemId mapping somewhere in the db.
> This mapping should be optional to avoid having to generate a UID for items
> where it's not useful (it's a cache for a part of the payload really).

right, it needs to be optional (i.e. empty is allowed), and it also does not 
need to be unique (you can "see" the same item multiple times).

> I think there are two possible options where we could add this to the db:
> * a dedicated table
> * a special part

There's a third: an extra (indexed) column for the item, and methods for 
setting/getting this in Akonadi::Item.
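
As a rough illustration of that third option, here is a minimal sketch using 
Python's sqlite3 module. The table and column names (items, remote_id, gid) 
and the helper item_ids_for_gid are made up for this example and are not 
Akonadi's actual schema or API. It only shows the properties discussed here: 
the column is nullable, the index is not unique, and the lookup goes through 
the index instead of the payload.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Akonadi's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " remote_id TEXT,"
    " gid TEXT"  # nullable on purpose: an empty GID is allowed
    ")"
)
# Plain (non-unique) index: the same GID may be present on several items.
conn.execute("CREATE INDEX items_gid_idx ON items (gid)")

conn.executemany(
    "INSERT INTO items (id, remote_id, gid) VALUES (?, ?, ?)",
    [
        (1, "cal/a.ics", "uid-1@example.org"),
        (2, "search/a.ics", "uid-1@example.org"),  # same object seen twice
        (3, "mail/42", None),                      # item without a GID
    ],
)


def item_ids_for_gid(connection, gid):
    """Return the ids of all items carrying the given GID (possibly none)."""
    rows = connection.execute(
        "SELECT id FROM items WHERE gid = ? ORDER BY id", (gid,)
    )
    return [row[0] for row in rows]
```

Items without a GID simply keep NULL in that column and never match a 
lookup.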

> The latter would be less efficient, because an index would also contain all
> the full payloads, and there are of course many more parts in the db than
> there are items, but would add less complexity to the design.

we can't index parts without adding a real full-text index, and that would be 
largely pointless since most parts are encoded in formats that aren't 
necessarily human readable.

> To ensure that the UID is always up to date if it is being used, I think the
> right place to update this cache would be the ItemSerializerPlugin, either
> by using a part or by extending the serializer interface.

The serializer is indeed a good place to handle this, also for the third 
option mentioned above.
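
What the serializer would have to do is small. A sketch in Python, assuming 
a text payload in iCalendar or vCard syntax; extract_gid is a hypothetical 
helper and not part of the ItemSerializerPlugin interface. It deliberately 
ignores line folding and takes the first UID property it finds, which a real 
implementation would leave to the proper parser.

```python
def extract_gid(payload):
    """Return the value of the first UID property in the payload, or None.

    Simplified: no unfolding of continuation lines, no component nesting.
    """
    for line in payload.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Ignore property parameters, e.g. "UID;X-PARAM=foo:value".
        if name.split(";", 1)[0].upper() == "UID":
            return value.strip() or None
    return None
```

Returning None for payloads without a UID matches the requirement that the 
GID is optional.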

This would also include a new version of the FETCH command ("GID", next to the 
existing UID, RID and HRID ones). It can return zero, one or multiple results, 
which would be up to the user to handle.
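
On the caller's side, handling the three outcomes could look roughly like 
the following Python sketch. fetch_by_gid is a hypothetical stand-in for 
whatever a GID-based fetch would return (here a list of item ids), and 
picking the lowest id for duplicates is an arbitrary policy chosen only for 
this example.

```python
def resolve_reference(gid, fetch_by_gid):
    """Resolve a GID reference via a caller-supplied lookup function.

    fetch_by_gid(gid) must return a list of item ids, possibly empty.
    """
    matches = fetch_by_gid(gid)
    if not matches:
        return ("missing", None)
    if len(matches) == 1:
        return ("unique", matches[0])
    # Several items share this GID; this sketch simply prefers the lowest id.
    return ("ambiguous", min(matches))


# Toy in-memory lookup standing in for the server.
index = {"uid-1@example.org": [7, 3], "uid-2@example.org": [9]}


def lookup(gid):
    return index.get(gid, [])
```

A subtodo resolving its parent, for instance, would call resolve_reference 
with the parent's UID and decide per application what "ambiguous" means.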

All this shouldn't even be too much work to add. And it's generic enough to be 
useful for all kinds of types. As mentioned above, we discussed this before 
and agreed it's a good idea to have this, just needs a volunteer to do the 
work ;)

> Note that while I talked only about UID's so far we may want to cache other
> parts of the payload in the future in order to be searchable. For instance,
> to be able to load only calendar objects which occur within a certain
> timeframe, we could cache start and end date (I know in this specific case
> that we have a performance problem for large calendars, and this would be
> one way to reduce this). So the UID table could also be a CACHE table
> containing an additional TYPE column (same if cached in special parts).

Here I have to disagree. This is getting way too close to Nepomuk. Object 
identification and id mapping is in scope for Akonadi, understanding content/ 
payload semantics is not.

The calendar performance use-case is very valid of course, but also not 
entirely new (there should be code for an unfinished calendar search agent 
somewhere). The rationale back then was that the ical time range query problem 
is actually complex enough (and also cannot easily be covered by Nepomuk) to 
warrant a specialized solution (consider timezones, recurrences, etc). Also, 
lack of indexing leads "only" to poor performance, not to complete lack of a 
feature (as is the case with the above listed use-cases for GID FETCH).
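
To make the recurrence point concrete, a small Python sketch. Both 
functions are hypothetical and heavily simplified (dates only, no 
timezones, an open-ended weekly recurrence without exceptions). It shows 
that comparing a cached start/end pair against the query window misses an 
event whose first occurrence lies months before the window.

```python
from datetime import date, timedelta


def naive_overlaps(start, end, window_start, window_end):
    """Range check on cached start/end dates of the first occurrence only."""
    return start <= window_end and end >= window_start


def weekly_occurs_in(first, window_start, window_end):
    """True if an open-ended weekly recurrence has an occurrence in the window."""
    if window_end < first:
        return False
    # Find the first occurrence on or after window_start.
    days = max(0, (window_start - first).days)
    weeks = -(-days // 7)  # ceiling division
    occurrence = first + timedelta(weeks=weeks)
    return occurrence <= window_end


first = date(2013, 1, 7)            # first occurrence of a weekly event
window = (date(2013, 5, 13), date(2013, 5, 19))

missed_by_cache = not naive_overlaps(first, first, *window)
found_by_expansion = weekly_occurs_in(first, *window)
```

Here the cached-date check does not find the event, while expanding the 
recurrence does, and this is still the easy case.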

regards,
Volker