[Nepomuk] About queries, as usual

Ignacio Serantes kde at aynoa.net
Sun Mar 4 13:20:24 UTC 2012


Hi,

For me, having two query engines is fantastic because it gives an easy way to
compare results and performance and, sometimes, to discover that different
approaches give totally different results that can both be right. Please note
that, although I'm using Nepoogle, I can call Query::QueryParser() from it, so
the results are the same ones you would get with the same query in KRunner, or
in Dolphin if it's working on your system.

As an example, I will try to locate all the videoclips from Miryo, so my first
thought is:

*miryo and videoclip*

This is Nepoogle's output using its own SPARQL engine:

*ignacio at misaki:~> nepoogle miryo and videoclip 2>/dev/null*
*Querying Nepomuk*

*.../Miryo/[MV] Miryo - 사랑해 사랑해 (Feat 써니_소녀시대) (2012).mkv, [MV] Miryo - 사랑해 사랑해 (Feat 써니_소녀시대) (2012), Video*
*.../Miryo/[MV] Miryo - 사랑해 사랑해 (Feat 써니_소녀시대) (2012).webm, Video*
*.../Miryo/[MV] Miryo - Dirty (2012).mkv, [MV] Miryo - Dirty (2012), Video*
*.../Miryo/[MV] Oh Won Bin - I Love You And I Love You (Ft. Miryo).mkv, [MV] Oh Won Bin - I Love You And I Love You (Ft. Miryo), Video*
*.../Brown Eyed Girls/Sign/[DVDRip] BEG - Sign (Japanese version)(2011).mkv, Video*

*5 records found in 1.74870721 seconds.*
*--*
*Powered by nepoogle v0.9git (2012-xx-xx)*

Note: I removed the full paths to the files and added "..." manually.

As you can see, Nepoogle found 5 results, and this is right. Well, I cheated a
little bit because I knew the answer before running this query ;).

So, for fun, let's run the same query using Query::QueryParser(). This is easy
because in Nepoogle I only need to add the e0 prefix to the query:

*ignacio at misaki:~> nepoogle e0 miryo and videoclip 2>/dev/null*
*Querying Nepomuk*
*0 records found in 0.022747 seconds.*
*--*
*Powered by nepoogle v0.9git (2012-xx-xx)*

What? 0 records? How is Query::QueryParser() failing to locate files with
such an easy query? This must be a bug.

*But no*, there is no bug at all :). This is just the natural behavior that
follows from the logic used to build the queries, and the explanation is
really simple.

If RT1 is the result set obtained by filtering using term1 and RT2 is the
result set obtained by filtering using term2, Nepoogle combines both result
sets and keeps the resources that appear in both of them. In set theory the
operation is: RT1 ∩ RT2.

On the other side, Query::QueryParser() filters a single set using term1 and
term2 together. If RT is the result set obtained that way, it is already the
final result. In set theory the operation is just: RT.

As for our 5 results, Nepoogle locates them because it searches in two
different sets, in this case the tags "miryo" and "videoclip", while
Query::QueryParser() locates nothing because there is no single set where
"miryo" and "videoclip" appear together. Don't be fooled by the fact that both
are tags: at the relational algebra level they are different sets.
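
A toy model in Python may make the difference easier to see. Nothing here is
Nepoogle or Nepomuk code: the resources dictionary, the file names and the
matching() helper are made up for illustration. Each tag is modelled as its
own literal, and the per-term results are combined so that only resources
present in both sets remain, which is how I read the 5 results above.

```python
# Toy data: each resource maps to a list of separate literals (one per tag).
# All names are hypothetical and only illustrate the idea.
resources = {
    "dirty.mkv": ["miryo", "videoclip"],
    "sign.mkv": ["brown eyed girls", "miryo", "videoclip"],
    "interview.mkv": ["miryo"],
    "other_mv.mkv": ["videoclip"],
}

def matching(term):
    """Resources with at least one literal containing the term."""
    return {name for name, literals in resources.items()
            if any(term in literal for literal in literals)}

# Per-term strategy: filter once per term, then combine the result sets.
rt1 = matching("miryo")
rt2 = matching("videoclip")
per_term = rt1 & rt2

# Single-set strategy: both terms must be found in the same literal.
single_set = {name for name, literals in resources.items()
              if any("miryo" in lit and "videoclip" in lit for lit in literals)}

print(sorted(per_term))    # ['dirty.mkv', 'sign.mkv']
print(sorted(single_set))  # []
```

The single-set strategy finds nothing because no single literal contains both
words, even though two resources carry both tags.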

Of course, this only happens with raw text search: if I use the query
*hastag:miryo and hastag:videoclip* with Query::QueryParser(), I obtain the
5 expected results :).

Nepoogle has the opposite problem. If I'm looking only for the words
"miryo" and "videoclip" in the content of a text file, Query::QueryParser()
will return the right results, but Nepoogle will also return files whose
content has only one of the words, as long as the other word is found in some
other set that is not necessarily the content of the text file.
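
This opposite case can be sketched in Python too. Again, this is a
hypothetical model and not the real engines: the "content" and "tags" fields
and the file names are invented, and per_term() / same_field() only
approximate the two behaviours described above.

```python
# Hypothetical text files; "content" and "tags" stand for different sets.
files = {
    "review.txt": {"content": "miryo released a new videoclip", "tags": ""},
    "notes.txt": {"content": "miryo interview transcript", "tags": "videoclip"},
}
terms = ["miryo", "videoclip"]

def per_term(fields):
    """Every term matches somewhere, possibly in different fields."""
    return all(any(t in text for text in fields.values()) for t in terms)

def same_field(fields):
    """All terms match inside one and the same field."""
    return any(all(t in text for t in terms) for text in fields.values())

print([n for n, f in files.items() if per_term(f)])    # ['review.txt', 'notes.txt']
print([n for n, f in files.items() if same_field(f)])  # ['review.txt']
```

Here notes.txt is the extra result: its content mentions only "miryo", and
"videoclip" comes from a different field.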


In brief, and this is the main reason for writing this mail: don't expect
Nepoogle and the Nepomuk API to always offer the same result sets and, when
they differ, this is not necessarily a bug. Keep in mind that this happens
every day with web search engines :).

-- 
Best wishes,
Ignacio