Anything about Tenor?

Tue Aug 9 23:01:58 CEST 2005

> Well, at the moment the problem is that because Kat is more conceptually
> limited in scope the database layout reflects that.  Essentially for
> something like Tenor you need the ability to store and work with graphs
> [1], which is something that it looks like Kat wasn't designed for.
I understand. I have studied directed graphs as Google uses them in its 
ranking system. I suppose you will treat each file as a node in the graph and 
keep track of its interactions with other files.
You are right: Kat isn't meant to store graph data, but a common data storage 
layer would benefit both Kat and Tenor.
A common query mechanism could then be built on top of that layer, so that 
users would access the search environment through a single interface/GUI.
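To make the "file as a node" idea concrete, here is a minimal C++ sketch. It is hypothetical, not Tenor's or Kat's code. It assumes files are nodes, interactions are directed edges with positive weights, and ranking uses a PageRank-style power iteration. The class name FileGraph, its methods and the damping factor 0.85 are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: each file is a node; an "interaction" between two
// files (opened together, attached to the same mail, ...) is a directed,
// weighted edge. Names and semantics are illustrative, not Tenor's design.
class FileGraph {
public:
    // Record that `from` interacted with `to`; weights are assumed positive.
    void addInteraction(const std::string& from, const std::string& to,
                        double weight = 1.0) {
        const int a = nodeId(from);
        const int b = nodeId(to);
        out_[a][b] += weight;
    }

    // PageRank-style power iteration: a file scores high when highly scored
    // files point to it. The returned scores sum to 1.
    std::map<std::string, double> rank(double damping = 0.85,
                                       int iterations = 50) const {
        const std::size_t n = names_.size();
        if (n == 0) return {};
        std::vector<double> score(n, 1.0 / n), next(n);
        for (int it = 0; it < iterations; ++it) {
            // Files with no outgoing edges spread their score uniformly.
            double dangling = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                if (out_[i].empty()) dangling += score[i];
            std::fill(next.begin(), next.end(),
                      (1.0 - damping) / n + damping * dangling / n);
            // Every other file splits its score along its edges by weight.
            for (std::size_t i = 0; i < n; ++i) {
                double total = 0.0;
                for (const auto& edge : out_[i]) total += edge.second;
                for (const auto& [j, w] : out_[i])
                    next[j] += damping * score[i] * w / total;
            }
            score.swap(next);
        }
        std::map<std::string, double> result;
        for (std::size_t i = 0; i < n; ++i) result[names_[i]] = score[i];
        return result;
    }

private:
    int nodeId(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        out_.emplace_back();
        return id;
    }

    std::map<std::string, int> ids_;          // file name -> node index
    std::vector<std::string> names_;          // node index -> file name
    std::vector<std::map<int, double>> out_;  // node index -> {target: weight}
};
```

For example, with interactions a.txt -> b.txt, c.txt -> b.txt and b.txt -> a.txt, b.txt ends up with the highest score and c.txt, which nothing points to, with the lowest. Keeping the graph as plain adjacency maps means it could later be persisted through whatever common storage layer we agree on.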

> To a limited extent that may be problematic in Kat, but Kat seems to have a
> few specific classes of information (thumbnails, fulltext and metadata)
> that it's interested in working with and the database reflects that. 
> Again, Tenor is more open ended, so it needs something more flexible.
> That's not to say that something like that couldn't be built into Kat --
> it's just that if you started reworking Kat to function that way you'd have
> to do almost all of the same steps as building Tenor.  
Yes, what I'm proposing is to enhance/polish Kat instead of rewriting 
everything from scratch. It could become the common layer on which both Kat 
and Tenor grow.

> So I think where the storage ends up happening and how the merging might 
look is something that can be worked on later. 
OK.

> Ideally there will only be about 4-5 classes in 
> Tenor that touch the storage layer so those could either be ported to Kat's
> storage mechanism or vice versa.  I am interested in hearing your thoughts
> on some of the more technical aspects of such.
I'm certainly interested in hearing your thoughts on that too.
After completing a first iteration, we are now reviewing the architecture in 
order to make it more flexible, for example by allowing different types of 
storage (different DBMSs, the file system...).
I admit that the API is not well written (and even less well documented), so 
I would like to hear your ideas on it (and possibly draw on your experience).
This could be the right moment to sit down at a (virtual) table and sketch a 
common architecture.
If we modularize it enough, Kat could become the module for content search and 
Tenor the module for contextual search.
Other modules, like one for computational linguistics, could then be added 
later as they become available.
Our search environment would then be truly complete.

> From the text extraction and all of that, I haven't gotten really familiar
> with the Kat API for that yet, but I would guess that most of that could be
> shared with minor tweaks.
Yes, I think so. Getting the fulltext or the word list of a file is a matter 
of a single function call. The API hides the kioslave/kfile plugin mechanism 
from clients, and the text is returned in UTF-8 encoding.

> This is kind of what I meant with the two systems being complementary --
> most of the graph stuff in Tenor would have to be built the same way with
> or without Kat.
Yes, if you don't use the fulltext, metadata or thumbnails of the files, you 
could really ignore Kat.
What is important, IMHO, is to present users with a single interface/GUI for 
search. And if we are able to store the data (content/context) in the same 
repository, building such an interface could be really easy.
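As an illustration of "content and context in the same repository", here is a hypothetical C++ sketch. It is not Kat's current API. It assumes both kinds of data are stored as (subject, predicate, object) triples behind one abstract interface, so a DBMS or file-system backend could be swapped in. The names TripleStore, MemoryTripleStore, search() and the predicates "contains-word" and "linked-to" are all made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical: content facts (Kat-like) and context facts (Tenor-like) are
// both stored as (subject, predicate, object) triples in one repository.
struct Triple {
    std::string subject, predicate, object;
};

// Backend-neutral interface; a DBMS or file-system backend would implement it too.
class TripleStore {
public:
    virtual ~TripleStore() = default;
    virtual void add(const Triple& t) = 0;
    // All subjects that have the given predicate/object pair.
    virtual std::vector<std::string> subjects(const std::string& predicate,
                                              const std::string& object) const = 0;
};

// Simplest possible backend: a vector scanned linearly.
class MemoryTripleStore : public TripleStore {
public:
    void add(const Triple& t) override { triples_.push_back(t); }
    std::vector<std::string> subjects(const std::string& predicate,
                                      const std::string& object) const override {
        std::vector<std::string> out;
        for (const auto& t : triples_)
            if (t.predicate == predicate && t.object == object)
                out.push_back(t.subject);
        return out;
    }

private:
    std::vector<Triple> triples_;
};

// One search entry point: files containing `word` (content) that are also
// linked to `related` (context). Predicate names are invented for the example.
std::vector<std::string> search(const TripleStore& store, const std::string& word,
                                const std::string& related) {
    auto byContent = store.subjects("contains-word", word);
    auto byContext = store.subjects("linked-to", related);
    std::sort(byContent.begin(), byContent.end());
    std::sort(byContext.begin(), byContext.end());
    std::vector<std::string> hits;
    std::set_intersection(byContent.begin(), byContent.end(),
                          byContext.begin(), byContext.end(),
                          std::back_inserter(hits));
    return hits;
}
```

Because the GUI would only talk to search() and the TripleStore interface, the content module and the context module could evolve (or change backend) independently. A triple layout is just one possible choice; it is shown here because it holds both flat metadata and graph edges without schema changes.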

>
> Cheers,
>
> -Scott

Bye

Roberto

You can find me and the other Kat devs in the IRC channel #kat on 
irc.freenode.net. (Please keep the time zone difference in mind :-) )

>
> [1] http://en.wikipedia.org/wiki/Graph_(data_structure)