[Nepomuk] RFC: The grammar of the new Nepomuk query parser

Vishesh Handa me at vhanda.in
Mon Jun 3 15:46:01 UTC 2013


On Sun, Jun 2, 2013 at 1:18 PM, Denis Steckelmacher <steckdenis at yahoo.fr> wrote:

> After having read the comments here on the mailing list and suggestions
> from Vishesh Handa, I have thought of some modifications of the grammar.
>
> The first goal of the parser is to be as human-friendly as possible.
> That means that users must not be forced to learn a complex grammar
> before being able to use the parser. Ideally, the grammar should be able to
> understand natural language, even if its understanding is incomplete or
> inexact.
>
> The second goal is to have a grammar formal enough to be able to implement
> syntax highlighting and auto-completion.
>
> When dealing with natural language, one possible parsing algorithm is
> simply to dig into the query and extract as much information as
> possible. For "gsoc proposal, tagged as Nepomuk", an informal
> parser could recognize the "tagged as X" pattern, then treat "gsoc proposal",
> which doesn't match anything, as a plain text search.
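
A minimal sketch of such an information-digging parser, to make the idea concrete. Everything here is an assumption for illustration: the `PATTERNS` table, the `dig` function and the regular expression are hypothetical, and only the "tagged as X" pattern comes from the example above.

```python
import re

# Hypothetical pattern table; only "tagged as X" comes from the example query.
PATTERNS = [
    ("tag", re.compile(r"\btagged as (\w+)", re.IGNORECASE)),
]

def dig(query):
    """Pull out every recognized pattern; whatever is left is plain text."""
    found = []
    rest = query
    for name, pattern in PATTERNS:
        for match in pattern.finditer(query):
            found.append((name, match.group(1)))
        # Blank out the matched spans so they are not searched as text.
        rest = pattern.sub(" ", rest)
    # Leftover words (minus stray punctuation) become the full-text term.
    text = " ".join(re.findall(r"\w+", rest))
    return found, text

print(dig("gsoc proposal, tagged as Nepomuk"))
# ([('tag', 'Nepomuk')], 'gsoc proposal')
```

Note that `dig` only reacts to text that is already complete, so it has nothing to say about what the user might type next.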
>
> The problem with this solution is that it is impossible to offer
> auto-completion with it. It is possible to syntax-highlight the input (each
> recognized pattern is highlighted in a different color), but the parser is
> unable to predict any input. It is a completely passive one. Furthermore,
> the parser can easily become a bit hackish.
>
> The new Nepomuk parser needs to meet these two goals. The first one
> requires a simple human-friendly grammar, the second one requires that this
> grammar is formal and can be disambiguated. Currently, my proposed grammar
> is formal, but not user-friendly.
>
> An example of a query I would like to be able to parse using my parser is
> "mails sent by Bill last week".
>
> Using an informal parser that digs for information, "sent by X" can be
> recognized, then "last week" that is a date, and possibly also "mails" that
> is recognized as a document type (the query needs to list e-mails).
>
> Using a more formal approach, three things can be considered:
>
> * If the lexer allows property names to contain spaces, it can lex "sent
> by" as a property name (the lexer has a list of valid property names).
> * In my previous proposal, a property name had to be followed by an
> operator (=, >=, <, etc). The operator was in fact used to detect that what
> comes before is a property name. Here, if the lexer has a list of property
> names, it doesn't need operators to detect them. A default operator can
> therefore be used for each property. For "sent by", the default operator is
> "=".
> * "Bill" comes right after a property and its default operator, so it is a
> value. As the lexer doesn't know if the user wanted a value with spaces or
> not, the value of "sent by" is either "Bill" or "Bill last week". In the
> ambiguous branch where only "Bill" is the value, "last week" remains to be
> parsed. No property name matches this, so it is parsed as a value, "last
> week", that is detected to be a date-time. The default properties of a
> date-time are "created", "received", etc.
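
The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PROPERTIES`, `starts_property` and `parse` are hypothetical names, the table holds just the "sent by" property with its default "=" operator, and date-time detection for values like "last week" is left out.

```python
# Hypothetical table: known property names mapped to their default operator.
PROPERTIES = {"sent by": "="}

def starts_property(words):
    """Length in words of the longest known property name at the start, or 0."""
    for n in range(len(words), 0, -1):
        if " ".join(words[:n]).lower() in PROPERTIES:
            return n
    return 0

def parse(words):
    """Return every reading of `words`.

    A reading is a list of (property, operator, value) triples;
    property and operator are None for a bare value.
    """
    if not words:
        return [[]]
    n = starts_property(words)
    if n:
        name = " ".join(words[:n]).lower()
        after = words[n:]
        readings = []
        # The value may or may not contain spaces, so keep every split.
        for k in range(1, len(after) + 1):
            value = " ".join(after[:k])
            for tail in parse(after[k:]):
                readings.append([(name, PROPERTIES[name], value)] + tail)
        return readings
    # No property here: take words up to the next property as a bare value.
    end = 1
    while end < len(words) and not starts_property(words[end:]):
        end += 1
    bare = (None, None, " ".join(words[:end]))
    return [[bare] + tail for tail in parse(words[end:])]

for reading in parse("mails sent by Bill last week".split()):
    print(reading)
```

Because the sketch keeps every possible split, it also yields a reading where the value is "Bill last"; a real parser would need some way to rank or prune such branches, for instance by checking which leftovers are valid date-times.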
>
> With this simple change of allowing spaces in properties and having a list
> of known properties (possibly with several translations each in the user's
> language), the parser is now able to parse queries without any special
> character, which is more human-friendly.
>
> I think the different operators and the complete grammar need to be kept,
> as only the full grammar is unambiguous. Power users may want to be able to
> use complex and exact queries, and even non-technical users may like to be
> able to use "date<=last week", if someone tells them that it is possible and
> that it would greatly enhance the results returned by the parser.
>
> What do you think about this update of the proposed grammar? Do you think
> I am on the right track?


Yes. This is awesome.

If you are free sometime this week, perhaps we can do a video chat (google
hangouts?) and discuss this some more? I think having an actual voice
conversation might be slightly better than emails.

I'm free most of the week. What time would suit you?

Everyone else is of course welcome to join.


>
> Denis Steckelmacher.
>
>
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk
>



-- 
Vishesh Handa