[Nepomuk] Search Problems cause of annoying ontologies

Thu Jan 3 14:37:33 UTC 2013

On Tuesday 18 December 2012 17.15:32 Vishesh Handa wrote:
> Hey everyone
> 
> In Akonadi, they have a very common problem where they need to do a full
> text search across a number of properties and find the associated contact.
> 
> The properties are -
> * nco:fullname
> * nco:nameGiven
> * nco:nameFamily
> * nco:emailAddress
> 
> The problem is obviously that a nco:PersonContact unfortunately cannot have
> a nco:emailAddress. The EmailAddress must be a resource which then has the
> property nco:emailAddress which contains the email.
> 
> Theoretically this makes a lot of sense because an EmailAddress is a
> nco:ContactMedium. So one could write a query to iterate over all the
> possible ways to contact a person, and one would get the email id.
> 
> Practically, this sucks, because the query requires an extra join + union
> and gets slowed down significantly.
> 
> select distinct ?r where {
> {
>    ?r ?p ?o .
>    FILTER( ?p in (nco:nameGiven, nco:fullname, nco:nameFamily)  ) .
>    ?o bif:contains "whatever" .
> }
> UNION
> {
>    ?r nco:hasEmailAddress ?e .
>    ?e nco:emailAddress ?email .
>    ?email bif:contains "whatever" .
> }
> }

What happens if you:

select distinct ?r where {
 {
    ?r ?p ?o .
    FILTER( ?p in (nco:nameGiven, nco:fullname, nco:nameFamily,
                   nco:hasEmailAddress) ) .
    ?o bif:contains "whatever" .
 }
}

and then check the type of the result?

> 
> This is a general problem all across Nepomuk where the ontologies (like a
> db schema) are fully normalized, and hence require one extra traversal to
> go to that object and get its property. In Virtuoso this amounts to an
> extra join.
> 
> Another example is searching for a song given its album, name, and artist's
> name. The query is horrible and takes over 18 seconds on my system (yeah,
> we are horrible at our main job - searching). Unfortunately, in this case
> we have a proper reason for splitting the data. In the Akonadi case there
> isn't much of a reason.
> 
> My suggestion to fix the Akonadi problem is either relaxing the condition
> for nco:emailAddress or double typing the nco:PersonContact as an
> nco:EmailAddress. Both of which are very ugly.
> 
> Does anyone else have a good solution?

Nope, but you could add a "cache" property. That would duplicate the data, but 
you could handle that duplication internally (it wouldn't be exposed to the 
users of Nepomuk), and it would give you the performance improvement while 
keeping the db clean.
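
To make the cache idea a bit more concrete, here is a minimal in-memory sketch
in Python. It is not Nepomuk or Virtuoso code: triples are plain tuples, the
substring match only stands in for bif:contains, and the property name
"ex:cachedEmail" as well as both function names are made up for illustration.

```python
# Sketch of the "cache" property idea on plain (subject, predicate, object)
# tuples. Nothing here is Nepomuk API; "ex:cachedEmail" is a hypothetical
# internal property that is not part of NCO.

NAME_PROPS = {"nco:fullname", "nco:nameGiven", "nco:nameFamily"}
CACHE_PROP = "ex:cachedEmail"  # hypothetical, maintained internally


def refresh_email_cache(triples):
    """Return a new triple set with each contact's addresses copied onto it."""
    # Map each EmailAddress resource to its literal address(es).
    literals = {}
    for s, p, o in triples:
        if p == "nco:emailAddress":
            literals.setdefault(s, set()).add(o)
    # Drop stale cache entries, then rebuild them from the normalized data.
    fresh = {t for t in triples if t[1] != CACHE_PROP}
    for s, p, o in triples:
        if p == "nco:hasEmailAddress":
            for address in literals.get(o, ()):
                fresh.add((s, CACHE_PROP, address))
    return fresh


def search_contacts(triples, term):
    """Single-hop search over literal properties attached to the contact."""
    props = NAME_PROPS | {CACHE_PROP}
    term = term.lower()
    return {s for s, p, o in triples if p in props and term in o.lower()}


store = {
    ("contact:1", "rdf:type", "nco:PersonContact"),
    ("contact:1", "nco:fullname", "Jane Doe"),
    ("contact:1", "nco:hasEmailAddress", "email:1"),
    ("email:1", "rdf:type", "nco:EmailAddress"),
    ("email:1", "nco:emailAddress", "jane@example.org"),
}
store = refresh_email_cache(store)
matches = search_contacts(store, "example.org")
```

The normalized triples stay untouched, so the nco:ContactMedium information is
still there. The search itself only looks at properties directly on the
contact, which is the part that would save the extra join. The price is that
the cache has to be refreshed whenever an address changes.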

Otherwise I think double typing is the better hack. At least that preserves 
all the information (the nco:ContactMedium). And since I can't think of a 
situation where the 1:n relation from nco:PersonContact to the email address 
wouldn't apply, that should work. We'd have to relax the cardinality of 
nco:emailAddress to n though (otherwise there would only be one email address 
per nco:PersonContact, no?)
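
A rough Python sketch of what double typing would mean for the data, again on
plain tuples rather than any real Nepomuk API. It assumes that
nco:emailAddress is currently limited to one value per resource; the helper
names are made up for illustration.

```python
from collections import Counter

# Sketch of double typing: the contact is also typed nco:EmailAddress and
# carries the address literals itself. Plain tuples, not Nepomuk API.

SEARCH_PROPS = {
    "nco:fullname",
    "nco:nameGiven",
    "nco:nameFamily",
    "nco:emailAddress",
}

store = [
    ("contact:1", "rdf:type", "nco:PersonContact"),
    ("contact:1", "rdf:type", "nco:EmailAddress"),  # the double typing
    ("contact:1", "nco:fullname", "Jane Doe"),
    # Two values: only allowed once the cardinality restriction is relaxed.
    ("contact:1", "nco:emailAddress", "jane@example.org"),
    ("contact:1", "nco:emailAddress", "jdoe@example.com"),
]


def search(triples, term):
    """One pattern, no union: every searched property sits on the contact."""
    term = term.lower()
    return {s for s, p, o in triples if p in SEARCH_PROPS and term in o.lower()}


def subjects_over_max_one(triples, prop):
    """Subjects that would violate a max cardinality of 1 for `prop`."""
    counts = Counter(s for s, p, _ in triples if p == prop)
    return {s for s, n in counts.items() if n > 1}


found = search(store, "example.com")
too_many = subjects_over_max_one(store, "nco:emailAddress")
```

Here `too_many` contains the contact, which is exactly the cardinality problem:
with a limit of one value, the second address could not be stored.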

I think I'd prefer the cache property though (or a better query of course ;-)

Cheers,
Christian