[Nepomuk] Re: Presentation, queries and other stuff

Tue Mar 1 04:20:25 CET 2011

Hi

> > Yes but the aide system is missing. I filed a bug about this on
> > kde-apps.org: when I write a new text queries don't work at all,
> > Dolphin does a file name search, and logical "and" seems totally
> > broken. This behavior is not really useful to me :).
>
> kde-apps.org is not really a place to file bugs. They will never get
> noticed. Please use bugs.kde.org.
>

Oh, sorry, I made a mistake when I wrote this. Naturally, I filed the bug at
bugs.kde.org :).

https://bugs.kde.org/show_bug.cgi?id=266307

> As for the Dolphin search: I am not happy with it either. It will
> hopefully be better in 4.7.
>

Fortunately this is an easy task: any change to the current system will be
an improvement.

In my case, I was totally happy with the KDE 4.5 behavior: a simple but
powerful input box with auto-completion (the "aide system"). It was marvelous
when you pressed "a" and could see all the things that begin with "a". It's a
powerful system that doesn't bother people who don't understand it, and it
could be disabled with a simple configuration checkbox if it really bothers
someone.

> In theory it already is a bit multi-language. We only need translations
> for the ontologies. Sadly every attempt at providing those translations
> failed for different reasons.
>

I hope you are referring to the parser and not the real ontology in the DB
:). It seems an easy task because you only need to translate a few words, so
obviously, if your past attempts failed, I'm missing something.

> > About the aide system: I'm really interested in implementing something
> > similar, so where can I find this code? I barely read C/C++ but I will
> > do my best.
>
> What do you mean by "aide system" anyway? Auto-completion?
>

Yes, I mean auto-completion, sorry for the confusion. This aide was marvelous;
KDE 4.5 was a quantum leap for Nepomuk and the main reason I began to use it
on a daily basis. In other KDE versions I tried very hard, but I ended up
frustrated and deactivated it. For me, a database without queries is
garbage. Well, external disk management is another big problem, but I wrote
my own solution, so it doesn't actually bother me so much, and I have accepted
that I will lose all my data from time to time.

> > At least one thing is clear to me: an alternative search system written
> > in Python, ready to use when the official query systems are broken,
> > could be a great addition.
>
> Why do you generate SPARQL directly instead of using the query API[1]?
> That would be much simpler for you and make for better readable code.
>

[1] http://api.kde.org/4.x-api/kdelibs-apidocs/nepomuk/html/namespaceNepomuk_1_1Query.html

Well, this is an interesting question. In my experience the Nepomuk query
system is not reliable, and when I finally understood some of this stuff and
did my first tests, I hit insurmountable barriers. I think I tried this
example, or something similar:

Nepomuk::Query::LiteralTerm nepomukTerm("nepomuk");
Nepomuk::Query::Query query( nepomukTerm );

and I spent several hours reading before I found that I needed to call
Soprano.LiteralValue(). Now it is pretty clear to me what the problem was,
but a couple of weeks ago it was a little bit annoying :).

When I finally had a working version I tried the following test:

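# Assumes the PyKDE4 bindings (Python 2).
from PyKDE4.nepomuk import Nepomuk
from PyKDE4.soprano import Soprano
term = Nepomuk.Query.LiteralTerm(Soprano.LiteralValue("música"))
query = Nepomuk.Query.Query(term)
sparqlQuery = query.toSparqlQuery()  # SPARQL text generated by the query API
model = Nepomuk.ResourceManager.instance().mainModel()
data = model.executeQuery(sparqlQuery, Soprano.Query.QueryLanguageSparql)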

Et voilà, I got no results for "música". The code worked with ASCII strings
but not with non-ASCII strings. On the other hand, using the API I only got
one column, the URI, and that was the end of it.

But today, because of your mail and because I have learned a lot in the last
weeks, I tried the API again and found that both problems can be solved
easily:

term = Nepomuk.Query.LiteralTerm(
    Soprano.LiteralValue(unicode("música", 'UTF-8')))
query = Nepomuk.Query.Query(term)
sparqlQuery = query.toSparqlQuery().replace(
    'distinct ?r',
    'distinct ?r ?r*>nie:url AS ?url ?r*>nao:prefLabel AS ?prefLabel')

The Unicode problem was solved by converting strings with the unicode()
function, and the column problem was solved with a simple replace; in the
previous example I added the columns ?url and ?prefLabel. I'm not sure whether
I missed something and there is another method to do this.
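
The replace step can be tried without a running Nepomuk, since it is a plain
string operation. What follows is only a minimal sketch: the helper name
add_columns and the shortened sample query are hypothetical (the real text
returned by toSparqlQuery() is longer and may differ), and it is written for
Python 3 rather than the Python 2 used above.

```python
def add_columns(sparql, columns, anchor="distinct ?r"):
    """Insert extra select expressions right after the first `anchor`.

    `sparql` stands for the text returned by query.toSparqlQuery();
    `columns` is a list of (column name, expression) pairs.
    """
    if anchor not in sparql:
        raise ValueError("anchor %r not found in query" % anchor)
    extra = " ".join("%s AS ?%s" % (expr, name) for name, expr in columns)
    # Only the first occurrence is replaced.
    return sparql.replace(anchor, anchor + " " + extra, 1)


# Hypothetical, shortened stand-in for an API-generated query.
sample = "select distinct ?r where { ?r ?p ?o . }"
extended = add_columns(sample, [("url", "?r*>nie:url"),
                                ("prefLabel", "?r*>nao:prefLabel")])
print(extended)
```

Unlike the one-line replace above, this sketch touches only the first match
and fails loudly when the anchor is missing, which would show up quickly if
the API ever generated a different select clause.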

So, with a new API available, I ran a couple of tests with a simple search to
compare, and here are the results (time = creation + execution +
visualization):

search: "película"
105 results found in 9.35717892647 seconds (API with columns ?r ?r*>nie:url
AS ?url)
105 results found in 8.65342307091 seconds (API with columns ?r)
108 results found in 1.45403599739 seconds (nsSparqlBuilder with columns ?r
?url ?prefLabel ?title)

Both results are accurate because I search in the URL field and I have three
entries with the word "película". There are 108 folders in total.

search: "música"
896 results found in 8.5596601963 seconds (API with columns ?r ?r*>nie:url
AS ?url ?r*>nao:prefLabel AS ?prefLabel)
896 results found in 9.21140098572 seconds (API with column ?r)
868 results found in 1.5461499691 seconds (nsSparqlBuilder with columns ?r
?url ?prefLabel ?title)

Both results are accurate because I have 29 tags related to "música"
(nsSparqlBuilder doesn't show these entries) and one folder with the word
música (the API doesn't show this entry). There are 867 files and 30 tags
related in total.

An important fact is that the first API query, the one with more columns, is
faster than the second one, and this is probably because the Virtuoso query
optimizer works better with the first query. I'm only speculating, of
course. To be sure, I repeated both queries several times :).
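
For repeated measurements like these, a small harness keeps the three phases
(creation, execution, visualization) inside one timed span and records every
run. This is only a sketch under assumptions: time_search and the three
placeholder callables are hypothetical names, the fake rows stand in for real
query results, and it targets Python 3.

```python
import time


def time_search(build, execute, show, repeats=5):
    """Time build + execute + show together, `repeats` times.

    Returns (number of results, list of elapsed seconds per run).
    """
    timings = []
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        query = build()        # creation
        rows = execute(query)  # execution
        count = show(rows)     # visualization; returns the rows shown
        timings.append(time.perf_counter() - start)
    return count, timings


# Placeholder phases; a real run would call the Nepomuk API or the builder.
fake_rows = ["file:///a", "file:///b", "file:///c"]
count, timings = time_search(
    build=lambda: "select distinct ?r where { ?r ?p ?o . }",
    execute=lambda query: list(fake_rows),
    show=lambda rows: len(rows),
)
print("%d results, best of %d runs: %.6f seconds"
      % (count, len(timings), min(timings)))
```

Keeping the per-run list rather than a single number makes it easier to see
whether a difference between two queries is stable or just noise from caching.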

As you can see, the time difference is large, and I spend some milliseconds
testing whether each file exists, but I need more than two tests to draw valid
conclusions. Obviously you are trying to extract more data than I am, and this
always has a cost. On the other hand, I haven't tried the FileQuery class
yet, and it will probably be fast.
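
The file-existence test mentioned above could look roughly like this. It is a
hedged sketch, not the code from my builder: existing_files is a hypothetical
helper, the sample URLs are made up, and it assumes Python 3 with results that
arrive as file:// URLs.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import unquote, urlparse


def existing_files(urls):
    """Keep only file:// URLs whose local path exists on disk."""
    kept = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme != "file":
            continue  # tags and other resources have no local path
        path = unquote(parts.path)  # undo percent-encoding ("m%C3%BAsica")
        if os.path.exists(path):
            kept.append(url)
    return kept


# Usage: one real temporary file, one missing file, one non-file resource.
with tempfile.NamedTemporaryFile() as handle:
    real = Path(handle.name).as_uri()
    urls = [real, "file:///no/such/file", "nepomuk:/res/example"]
    kept = existing_files(urls)
print(kept == [real])
```

Each check is one stat call per result, which fits the "some milliseconds"
observed for result sets of a few hundred entries.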

I was planning to improve my builder next weekend, but I changed my mind and
I will add support for this API, because it is an easy task and it will be a
terrific method to build test cases.

Thank you for your support.

-- 
Cheers,
Ignacio