creating a content system

Wed Aug 10 02:16:58 CEST 2005

hi..

so we have kat which has lots of code.
we have tenor which has lots of design.
what we need is a content system; something that can provide a back end to 
things like:

	kfind
	a content manager app for kde4
	media applications (think of all the context stuff in amarok)
	content-centric applications (kpdf, kword, etc)

there are four layers to be considered, from bottom to top:

0. storage
1. API
2. population
3. user interface

i think where kat shines right now is that it is addressing #2. the version in 
svn right now seems a lot better than the previously released versions 
i've tried. that's good. i've said right from the start that #2 is a valid 
project in and of itself, really, and something that all of these types of 
systems need. i'd like to see us start our collaboration with the population 
mechanism.

there are many problems with the current population mechanism in kat. these 
include things like:

	- catalogs don't have individual stop folders (at least not that i can find)
	- it searches hidden folders by default
	- it apparently doesn't take into consideration FD.o conventions such as 
thumbnail directories (correct me if i'm wrong on that one?)
	- it only works on local files?
	- it relies on a lot of helper apps; i wonder at the overhead of that
	- i'm not sure how things like scheduling work, though i suspect it could 
be better
	- which leads me to: it needs documentation. i will not support such a 
complex system that does not have extensive documentation for its design. API 
docu is not enough, though it is VERY nice to see extensive API docu 
available.
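
as a rough illustration of the first three points above, here is a minimal 
python sketch of a crawl that takes per-catalog stop folders, leaves hidden 
folders alone unless asked, and never descends into thumbnail directories. 
this is not kat's code: `walk_catalog`, its parameters and the `.thumbnails` 
default are hypothetical names made up for the example, and the thumbnail 
directory name is an assumption that would need checking against the FD.o 
spec.

```python
import os


def walk_catalog(root, stop_folders=(), include_hidden=False,
                 skip_names=(".thumbnails",)):
    """Yield file paths under root, pruning folders that should not be indexed.

    All names here are hypothetical; this is a sketch, not kat's crawler.
    """
    stops = {os.path.abspath(p) for p in stop_folders}
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for name in dirnames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if full in stops:
                continue  # per-catalog stop folder
            if name in skip_names:
                continue  # thumbnail caches and similar (assumed names)
            if name.startswith(".") and not include_hidden:
                continue  # hidden folders are opt-in, not the default
            kept.append(name)
        # editing dirnames in place tells os.walk not to descend further
        dirnames[:] = sorted(kept)
        for name in sorted(filenames):
            if name.startswith(".") and not include_hidden:
                continue
            yield os.path.join(dirpath, name)
```

pruning at the folder level means the crawler never even lists the contents 
of a stop folder, which is cheaper than filtering paths after the fact.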

it doesn't take over the CPU quite like previous versions did for me, however, 
and that's a nice stride forward.

this leaves us with the other pieces:

0. storage
	sqlite is not a good solution here, IMHO, because:
		- it's too slow for doing anything resembling an interesting query
		- it's not network aware
	the schema should be context centric, not content centric
		- my original schema proposal, which seems to have been swept aside in the 
last tenor update into playground, divided the database into two sets:
			- contextual linkage
			- content indexing
		   i think we'd be best served by keeping these two separate, since each 
requires slightly different semantics when it comes to processing and 
subsequent searching
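
to make the two-set split concrete, here is a minimal in-memory python sketch. 
it is not the tenor schema: `ContentIndex`, `ContextLinks`, the relation name 
`attached-to` and the item ids are all invented for illustration, and a real 
store would sit on a proper database rather than python dicts.

```python
from collections import defaultdict


class ContentIndex:
    """content side: an inverted index from word to the items containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, text):
        for word in text.lower().split():
            self._postings[word].add(item_id)

    def search(self, word):
        return set(self._postings.get(word.lower(), ()))


class ContextLinks:
    """context side: labelled links between items, kept apart from the index."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, relation, b):
        # assumption: links are symmetric, so navigation works from either end
        self._links[a].add((relation, b))
        self._links[b].add((relation, a))

    def related(self, item_id, relation=None):
        return {other for rel, other in self._links.get(item_id, ())
                if relation is None or rel == relation}


index = ContentIndex()
links = ContextLinks()
index.add("report.pdf", "quarterly budget figures")
index.add("notes.txt", "budget meeting notes")
links.link("report.pdf", "attached-to", "mail:42")

hits = index.search("budget")          # content query: which items mention it
origin = links.related("report.pdf")   # context navigation: where it came from
```

the two classes share nothing but item ids, so each side can be populated, 
processed and queried on its own terms.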

1. API

the Kat API needs work. the Tenor API as it was shaping up was really far more 
interesting. searching is far, far more than "look for this blob of text" and 
API design is a bit of an art. Scott is quite good at this (cf. taglib). with a 
dual context/content storage facility, it should be quite possible to design 
a search and navigation api that maps both to kat's idea of searching and 
tenor's idea of contextualization. i'd take Kat as a prototype consumer of 
search and create the Tenor API in a manner which services it.
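
one way to picture a single entry point serving both kinds of question is a 
query object with a content part and a context part. the python below is only 
a sketch under that assumption; `Query`, `run_query` and the shape of the two 
lookup tables are hypothetical and are not taken from the Kat or Tenor APIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    words: Tuple[str, ...] = ()        # content part: every word must match
    related_to: Optional[str] = None   # context part: must be linked to this item


def run_query(query, content_index, context_links):
    """Answer a query against two separate lookup tables.

    content_index maps word -> set of item ids,
    context_links maps item id -> set of linked item ids (both assumed shapes).
    """
    results = None
    for word in query.words:
        hits = content_index.get(word.lower(), set())
        results = set(hits) if results is None else results & hits
    if query.related_to is not None:
        linked = context_links.get(query.related_to, set())
        results = set(linked) if results is None else results & linked
    # an empty query matches nothing rather than everything
    return results if results is not None else set()
```

a plain text search is then just a `Query` with no `related_to`, and pure 
navigation is a `Query` with no `words`.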

this means the Kat author(s) need to clearly state their goals for search. 
i've seen various terms bandied about, e.g. computational linguistics, which 
need to be well scoped for this part of the project.

3. user interface

this can wait. 0-2 need to be done first.

-- 
Aaron J. Seigo
GPG Fingerprint: 8B8B 2209 0C6F 7C47 B1EA  EE75 D6B7 2EB1 A7F1 DB43

Full time KDE developer sponsored by Trolltech (http://www.trolltech.com)