Anything about Tenor?

Wed Aug 10 02:28:54 CEST 2005

On Tuesday 09 August 2005 23:01, Roberto Cappuccio wrote:
> > Well, at the moment the problem is that because Kat is more conceptually
> > limited in scope the database layout reflects that.  Essentially for
> > something like Tenor you need the ability to store and work with graphs
> > [1], which is something that it looks like Kat wasn't designed for.
>
> I understand. I have studied the directed graphs as they are used by Google
> for its ranking system. I suppose you will consider a file as a node in the
> graph and keep track of its interactions with other files.
> You are right. Kat isn't meant to store data about graphs, but having a
> common layer for data storage would be a good point for both Kat and Tenor.
> A common query mechanism could then be built upon that layer so that the
> users would access the search environment through a single interface/GUI.

> Yes, what I'm proposing is to enhance/polish Kat instead of rewriting
> everything from scratch. It could become the common layer on which both Kat
> and Tenor could grow up.

I think there might be a little confusion on what exactly Tenor aims to be -- 
so a little background might help.

"Tenor", or at least as I see it, mostly is the lowest level.  i.e. it's the 
graph and an API to work with the graph.

Originally we assumed that there would need to be some helper applications to 
go along with that, like a daemon for watching file changes and such, but 
there's no reason to duplicate that from Kat.  The main thing that we'd need 
in addition to what Kat provides is a "dispatcher" in the daemon so that data 
can be interpreted by the application itself (because we'll often be working 
with information that isn't a "file", but a more abstract "resource" -- i.e. 
a "To:" field in a mail that we'd need KMail to interpret).  But that's 
pretty simple actually.

We'd need for that to go both directions.  The structure that I've thought of 
for that is a virtual method in KApplication that's something like:

virtual void handleLink(const Tenor::Link &link) {}

There would then be a DCOP/DBUS listener in the default KApplication DCOP/DBUS 
interface that would receive an incoming call from the daemon and call the 
above function.  Applications could then provide a means of interpreting 
those incoming calls.

And then one to activate links:

void activateLink(const Tenor::Link &link);

That would translate the API call into a DCOP/DBUS message to go to the daemon 
that would then go back out via the dispatcher.

The daemon itself would just need to have the logic to be able to tell where 
the link should go from reading the link properties.
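
As a rough illustration of that round trip, here is a minimal sketch with the
DCOP/DBUS transport replaced by direct function calls. Everything in it is an
assumption made for illustration: the Link fields, the Dispatcher class, its
registerApplication() method and the MailApp class are hypothetical names,
not part of Tenor, Kat or KApplication.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Tenor::Link. The real link properties are not
// specified in this thread; two string fields are assumed here.
struct Link {
    std::string application;  // which application should interpret the link
    std::string resource;     // e.g. the "To:" field of a mail
};

// Stand-in for the dispatcher in the daemon. In the proposal the calls would
// travel over DCOP/DBUS; here they are plain function calls.
class Dispatcher {
public:
    using Handler = std::function<void(const Link &)>;

    // An application registers its handleLink() equivalent under its name.
    void registerApplication(const std::string &name, Handler handler) {
        assert(handler != nullptr);  // an empty handler could never be called
        handlers_[name] = std::move(handler);
    }

    // Equivalent of activateLink(): read the link properties, work out where
    // the link should go and forward it. Returns false if no registered
    // application can interpret the link.
    bool activateLink(const Link &link) const {
        auto it = handlers_.find(link.application);
        if (it == handlers_.end())
            return false;
        it->second(link);
        return true;
    }

private:
    std::map<std::string, Handler> handlers_;
};

// Stand-in for an application that overrides the virtual handleLink().
class MailApp {
public:
    void handleLink(const Link &link) { opened.push_back(link.resource); }
    std::vector<std::string> opened;  // resources this app was asked to handle
};
```

The only logic the daemon side needs in this sketch is the lookup in
activateLink(); interpreting the resource stays entirely in the application.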

Going back a step -- when I said that I think Kat could sit on top of Tenor at 
some point that really shouldn't mean too much for Kat.  If you're already 
working on making the storage layer fairly abstract, providing a Tenor 
backend shouldn't be a problem.

That's kind of what I meant by the efforts being mostly complementary; it seems 
that the strengths of Kat at this point are that it has a daemon, has some 
text extraction stuff and is starting to work on some interface ideas.  None 
of those are at the core of what I call "Tenor".  Originally we would have 
had to write those things ourselves, but there's no reason not to share those 
with Kat.

> I'm surely interested in hearing your thoughts on that.
> After completing a first iteration, we are now reviewing the architecture
> in order to make it more elastic, for example allowing different types of
> storage (different dbms, file system...).
> I admit that the API is not well written (and even less documented), so I
> would like to hear your ideas on it (and possibly use your experience).
> This could be the right moment for sitting on a (virtual) table to try to
> draw a sketch for a common architecture.
> If we modularize it enough, Kat could become the module for content search
> and tenor could become the module for contextual search.
> Other modules, like the one for computational linguistic, could then be
> added in different moments as they become available.
> Our search environment would then be really complete.

I think the main overlaps in the API would be along the lines of the basic 
search stuff.

There are a few things that I see quickly looking at the Kat search API that 
are weaknesses.

For instance, "search" returns a list.  That assumes that you'll be able to 
quickly do an exhaustive search, which often won't be the case.  Since most 
database APIs work with something similar to iterators, that's a much better 
way of approaching such a thing.

There also would need to be a more modular search criteria -- something that 
would be extensible beyond text searches.

Just doing an API sketch here -- I'd thought of something like this:

namespace Tenor {

class Search
{
public:

  class Criteria
  {
  public:
    virtual float matchStrength(const Link &link) const = 0;  
  };

  class Iterator
  {
  public:
    Iterator &operator++();
    const Link &operator*() const;
    bool operator==(const Iterator &other) const;
    bool operator!=(const Iterator &other) const;
  };

  Search(const Criteria &criteria);
  Iterator begin() const;
  Iterator end() const;
};

}

And then a concrete case for text search:

class WordCriteria : public Search::Criteria
{
public:
  WordCriteria(const QStringList &words);
  virtual float matchStrength(const Link &link) const;
};

Where matchStrength() would do some evaluation of how well the structure 
matched the words provided.

So you could do a search with something like:

Search s(WordCriteria("foo"));
for(Search::Iterator it = s.begin(); it != s.end(); ++it) {
  // do stuff with the matching items
}

That all could be mapped back to Kat's API by just generating a list of 
results, but I'd expect that an iterator based API would be useful for Kat as 
well.
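
To make the iterator idea concrete, here is a self-contained version of the 
sketch using only the standard library. It is a sketch under assumptions, not 
a proposed implementation: Link is reduced to a blob of text, the "store" is 
an in-memory vector rather than a graph or database, QStringList is swapped 
for a vector of strings, and the scoring rule in WordCriteria and the 
searchAll() helper are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace Tenor {

// Hypothetical stand-in for a link/resource: just a blob of text.
struct Link {
    std::string text;
};

class Search {
public:
    class Criteria {
    public:
        virtual ~Criteria() = default;
        // Assumed convention: 0.0 means no match, larger means stronger.
        virtual float matchStrength(const Link &link) const = 0;
    };

    class Iterator {
    public:
        Iterator(const std::vector<Link> *links, const Criteria *criteria,
                 std::size_t pos)
            : links_(links), criteria_(criteria), pos_(pos) { skipNonMatching(); }

        Iterator &operator++() { ++pos_; skipNonMatching(); return *this; }
        const Link &operator*() const { return (*links_)[pos_]; }
        bool operator==(const Iterator &other) const { return pos_ == other.pos_; }
        bool operator!=(const Iterator &other) const { return !(*this == other); }

    private:
        // Candidates are evaluated one at a time, only up to the next match,
        // so the caller never waits for an exhaustive search.
        void skipNonMatching() {
            while (pos_ < links_->size() &&
                   criteria_->matchStrength((*links_)[pos_]) <= 0.0f)
                ++pos_;
        }
        const std::vector<Link> *links_;
        const Criteria *criteria_;
        std::size_t pos_;
    };

    // The caller must keep both the store and the criteria alive for as long
    // as the Search and its iterators are in use.
    Search(const std::vector<Link> &store, const Criteria &criteria)
        : store_(&store), criteria_(&criteria) {}

    Iterator begin() const { return Iterator(store_, criteria_, 0); }
    Iterator end() const { return Iterator(store_, criteria_, store_->size()); }

private:
    const std::vector<Link> *store_;
    const Criteria *criteria_;
};

// Text criteria: the fraction of the given words found in the link's text.
class WordCriteria : public Search::Criteria {
public:
    explicit WordCriteria(std::vector<std::string> words)
        : words_(std::move(words)) {}

    float matchStrength(const Link &link) const override {
        if (words_.empty())
            return 0.0f;
        std::size_t hits = 0;
        for (const std::string &word : words_)
            if (link.text.find(word) != std::string::npos)
                ++hits;
        return static_cast<float>(hits) / static_cast<float>(words_.size());
    }

private:
    std::vector<std::string> words_;
};

// Mapping back to a list-returning API: simply drain the iterator.
inline std::vector<Link> searchAll(const Search &search) {
    std::vector<Link> results;
    for (Search::Iterator it = search.begin(); it != search.end(); ++it)
        results.push_back(*it);
    return results;
}

}  // namespace Tenor
```

Note that the criteria object is created before the Search here and passed by 
reference; a Search that stores a reference to a temporary criteria would be 
left dangling once the constructor returns.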

For now I wouldn't worry about unifying the ideas completely -- it's probably 
best just to bounce ideas off one another and keep the general directions 
compatible.

> Yes, I think so. Getting the fulltext or the words list of a file is a
> matter of a call to a function. The API hides the kioslave/kfile plugin
> mechanism to the clients. You receive the text in UTF-8 encoding.

> Yes, if you don't use the fulltext, the metadata or the thumbnail of the
> files, you could really ignore Kat.
> What is important, IMHO, is to present the users with a single
> interface/GUI for search. And if we are able to store the data
> (content/context) in the same repository, it could be really easy to build
> such interface.

Well, it's not that they're not used; it's just that in Tenor they're only one 
type of data.  And there's no reason not to use the things that Kat provides 
for extracting those.

> You can find me and the other Kat devs on IRC channel #kat on
> irc.freenode.net. (Please consider the difference in time zone :-) )

I think we're in the same time zone.  :-)

One thing to kick around a little bit (though this is probably good for 
another thread -- too many thoughts in this one might get confusing) would be 
to decide what's "core" to Kat and Tenor.  I think the "core" ideas of each 
don't overlap very much and that might give us an idea of where we might need 
to unify things a bit vs. where development is largely independent.

Cheers,

-Scott

-- 
Three words: you have no clue
-Slashdot