[Nepomuk] Support for efficient searching in akonadi (e.g. by UID)

Christian Mollekopf chrigi_1 at fastmail.fm
Thu May 16 14:40:15 UTC 2013


Hey,

The two primarily used standards for PIM data, vcard and ical, both assign a 
globally unique UID to each object, and naturally make references based on 
them.

Searching for an item by UID, e.g. to resolve such a reference, currently involves 
parsing the payload of each item to extract the UID, which is very inefficient.

I think we should support searching by UID with the help of the db, by 
moving the UID into a dedicated column/table.

This would for example be useful for todo/subtodo trees or contact groups 
using references.

To support this we'd need a UID-itemId mapping somewhere in the db.
This mapping should be optional to avoid having to generate a UID for items 
where it's not useful (it's a cache for a part of the payload really).
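
A minimal sketch of what such an optional mapping could look like, using 
SQLite from Python purely for illustration. The table and column names 
(items, item_uids, item_id, uid) and the helper functions are hypothetical 
and not akonadi's actual schema or API. The index is deliberately not unique, 
on the assumption that the same UID can show up on more than one item (e.g. 
copies in different collections).

```python
import sqlite3

# Hypothetical schema for illustration only, not akonadi's real layout.
uid_db = sqlite3.connect(":memory:")
uid_db.executescript("""
    CREATE TABLE items (
        id      INTEGER PRIMARY KEY,
        payload BLOB NOT NULL
    );
    -- Optional mapping: an item without a useful UID simply has no row here.
    CREATE TABLE item_uids (
        item_id INTEGER PRIMARY KEY REFERENCES items(id),
        uid     TEXT NOT NULL
    );
    -- Non-unique index, so a UID lookup does not have to scan all items.
    CREATE INDEX item_uids_uid_idx ON item_uids(uid);
""")


def add_item(payload, uid=None):
    """Store an item and, only if a UID is given, its UID mapping."""
    cur = uid_db.execute("INSERT INTO items (payload) VALUES (?)", (payload,))
    item_id = cur.lastrowid
    if uid is not None:
        uid_db.execute(
            "INSERT INTO item_uids (item_id, uid) VALUES (?, ?)",
            (item_id, uid),
        )
    return item_id


def find_items_by_uid(uid):
    """Return the ids of all items carrying the given UID."""
    rows = uid_db.execute(
        "SELECT item_id FROM item_uids WHERE uid = ? ORDER BY item_id",
        (uid,),
    )
    return [row[0] for row in rows]
```

With this, resolving a reference becomes a single indexed query instead of 
parsing every payload.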

I think there are two possible options where we could add this to the db:
* a dedicated table
* a special part

The latter would be less efficient, because an index on the parts would also 
cover all the full payloads, and there are of course many more parts in the db 
than there are items, but it would add less complexity to the design.

To ensure that the UID is always up to date if it is being used, I think the 
right place to update this cache would be the ItemSerializerPlugin, either by 
using a part or by extending the serializer interface.
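
To illustrate the idea of keeping the cache up to date from the serializer, 
here is a rough sketch in Python. CachingSerializer and extract_uid are 
hypothetical names and do not correspond to the real ItemSerializerPlugin 
interface. The UID extraction is deliberately simplistic: it unfolds 
continuation lines and returns the first UID property it finds, which would 
not be enough for payloads containing several components.

```python
import re


def extract_uid(payload):
    """Return the first UID property value of a vcard/ical payload, or None."""
    # Unfold continuation lines: a line break followed by a space or tab.
    unfolded = re.sub(r"\r?\n[ \t]", "", payload)
    for line in unfolded.splitlines():
        name, sep, value = line.partition(":")
        # Ignore property parameters such as "UID;X-FOO=bar".
        if sep and name.split(";")[0].upper() == "UID":
            return value.strip()
    return None


class CachingSerializer:
    """Hypothetical stand-in for a serializer plugin with a cache hook."""

    def __init__(self):
        # item id -> UID; stands in for the dedicated table or special part.
        self.uid_cache = {}

    def serialize(self, item_id, payload):
        """Serialize the payload and refresh the cached UID in the same step."""
        uid = extract_uid(payload)
        if uid is None:
            # No UID in the payload: make sure no stale entry survives.
            self.uid_cache.pop(item_id, None)
        else:
            self.uid_cache[item_id] = uid
        return payload.encode("utf-8")
```

Because the cache is refreshed whenever the payload is serialized, the two 
cannot drift apart as long as all writes go through the serializer.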

Note that while I have talked only about UIDs so far, we may want to cache other 
parts of the payload in the future to make them searchable. For instance, to 
be able to load only calendar objects which occur within a certain timeframe, 
we could cache start and end date (I know in this specific case that we have a 
performance problem for large calendars, and this would be one way to reduce 
this). So the UID table could also be a CACHE table containing an additional 
TYPE column (same if cached in special parts).

Implementing this would involve:
* potentially adjusting the db to cache the property (if not using parts)
* supporting the cached properties in the akonadi protocol (if not using parts)
* adding a search job that allows searching by any cached property
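
As a rough sketch of the generic variant, here is a CACHE table with a TYPE 
column plus two lookups, again using SQLite from Python for illustration only. 
All names (cache, set_cached, search, items_in_timeframe, and the 'uid', 
'dtstart' and 'dtend' type values) are assumptions, not existing akonadi API. 
Dates are assumed to be stored as UTC ISO 8601 strings so that plain string 
comparison orders them correctly, and recurring events are ignored.

```python
import sqlite3

# Hypothetical generic cache table, for illustration only.
cache_db = sqlite3.connect(":memory:")
cache_db.executescript("""
    CREATE TABLE cache (
        item_id INTEGER NOT NULL,
        type    TEXT    NOT NULL,   -- e.g. 'uid', 'dtstart', 'dtend'
        value   TEXT    NOT NULL,
        PRIMARY KEY (item_id, type)
    );
    CREATE INDEX cache_type_value_idx ON cache(type, value);
""")


def set_cached(item_id, prop_type, value):
    """Insert or update one cached property of an item."""
    cache_db.execute(
        "INSERT OR REPLACE INTO cache (item_id, type, value) VALUES (?, ?, ?)",
        (item_id, prop_type, value),
    )


def search(prop_type, value):
    """Return the ids of all items whose cached property equals value."""
    rows = cache_db.execute(
        "SELECT item_id FROM cache WHERE type = ? AND value = ? ORDER BY item_id",
        (prop_type, value),
    )
    return [row[0] for row in rows]


def items_in_timeframe(start, end):
    """Return the ids of items whose [dtstart, dtend] overlaps [start, end]."""
    rows = cache_db.execute(
        """
        SELECT s.item_id
        FROM cache AS s
        JOIN cache AS e ON e.item_id = s.item_id
        WHERE s.type = 'dtstart' AND e.type = 'dtend'
          AND s.value <= ? AND e.value >= ?
        ORDER BY s.item_id
        """,
        (end, start),
    )
    return [row[0] for row in rows]
```

A search job could then map directly onto search() or items_in_timeframe() 
without ever touching the payloads.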

I also don't think that this conflicts with the goals of nepomuk.
The search support proposed here is purely a performance improvement and doesn't 
provide any of nepomuk's primary features:
* extensive relations between objects
* a hopefully eventually powerful fulltext search engine ;-)

All I want from akonadi is that it supports what we need for the basic 
features, such as displaying content (within a reasonable amount of time, even 
with a shitload of data on its hands).

Objections, Ideas, Comments?

Cheers,
Christian
