creating a content system

Wed Aug 10 10:05:12 CEST 2005

At 02:16 on Wednesday, 10 August 2005, Aaron J. Seigo wrote:
> hi..
>
> so we have kat which has lots of code.
> we have tenor which has lots of design.
> what we need is a content system; something that can provide a back end to
> things like:
>
>  kfind
>  a content manager app for kde4
>  media applications (think of all the context stuff in amarok)
>  content-centric applications (kpdf, kword, etc)
agreed

>
> there are four layers to be considered, from bottom to top:
>
> 0. storage
> 1. API
> 2. population
> 3. user interface
OK

>
> i think where kat shines right now is that it is addressing #2. the version
> in svn right now is a lot better it seems than the previous released
> versions i've tried. that's good. i've said right from the start that #2 is
> a valid project in and of itself, really, and something that all of these
> types of systems need. i'd like to see our collaboration start with the
> population mechanism.
>
> there are many problems with the current population mechanism in kat. these
> include things like:
>
>  - catalogs don't have individual stop folders (at least not that i can
> find) 
what do you mean by stop folders?

>  - it searches hidden folders by default
this can easily be made configurable (we have a KCM module for that)

>  - it apparently doesn't take into consideration FD.o conventions such as
> thumbnail directories (correct me if i'm wrong on that one?)
what we have now is the ability to manually exclude selected directories. the 
next step will be to include some of them in the default configuration
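A minimal sketch of per-catalog exclusion, assuming a Python crawler built on `os.walk` (Kat itself is C++, and `crawl`, `keep_dir` and `default_stop_folders` are hypothetical names): hidden directories are skipped unless a catalog opts in, and each catalog carries its own stop folders, defaulting to the freedesktop.org thumbnail caches (`~/.thumbnails` in older versions of the spec, `~/.cache/thumbnails` in newer ones).

```python
import os
from pathlib import Path


def default_stop_folders():
    """Hypothetical defaults: the freedesktop.org thumbnail caches."""
    home = Path.home()
    return {home / ".thumbnails", home / ".cache" / "thumbnails"}


def keep_dir(parent, name, stop_folders, include_hidden=False):
    """Decide whether the crawler should descend into parent/name."""
    if name.startswith(".") and not include_hidden:
        return False
    return Path(parent) / name not in stop_folders


def crawl(root, stop_folders=None, include_hidden=False):
    """Yield files under root, honouring this catalog's stop folders."""
    if stop_folders is None:
        stop_folders = default_stop_folders()
    stops = {Path(p).absolute() for p in stop_folders}
    for dirpath, dirnames, filenames in os.walk(Path(root).absolute()):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames
                       if keep_dir(dirpath, d, stops, include_hidden)]
        for name in filenames:
            if include_hidden or not name.startswith("."):
                yield Path(dirpath) / name
```

Pruning `dirnames` in place is what keeps excluded trees from being read at all, rather than walking them and filtering the results afterwards.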

>  - it only works on local files?
it works on any medium you can mount. we would like to extend it to NFS and 
other protocols as well

>  - it relies on a lot of helper apps; i wonder at the overhead of that
when I first started development, I began importing code from other projects, 
like xpdf and antiword, into our source tree. The drawbacks of this approach 
are not immediately evident, but can be summarized as follows:
1. you have to create a branch for every helper you decide to import so that 
you can adapt the code to Kat's needs, and every time the original code 
changes you have to port those changes to your branch.
2. a lot of helpers are written in languages other than C++, which makes 
managing them much more complicated.

I'm really interested in hearing your proposals for this problem.
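As a hypothetical illustration of the helper-app approach and where its per-file cost comes from, the sketch below dispatches on MIME type to external extractors. It assumes `pdftotext` and `antiword` are installed on the `PATH`; `HELPERS`, `helper_argv` and `extract_text` are illustrative names, not Kat's code.

```python
import shutil
import subprocess

# Hypothetical registry: MIME type -> helper command line ("{path}" is filled in).
HELPERS = {
    "application/pdf": ["pdftotext", "-enc", "UTF-8", "{path}", "-"],
    "application/msword": ["antiword", "{path}"],
}


def helper_argv(mime, path):
    """Build the helper's argument list, or None if no helper is registered."""
    template = HELPERS.get(mime)
    if template is None:
        return None
    return [path if arg == "{path}" else arg for arg in template]


def extract_text(mime, path, timeout=30):
    """Run the helper and return its stdout as text, or None on any failure."""
    argv = helper_argv(mime, path)
    if argv is None or shutil.which(argv[0]) is None:
        return None  # no helper installed: skip the file, don't abort the crawl
    # One process is spawned per file; this is the overhead being questioned.
    done = subprocess.run(argv, capture_output=True, timeout=timeout)
    if done.returncode != 0:
        return None
    return done.stdout.decode("utf-8", errors="replace")
```

Keeping helpers out of the source tree avoids the branch-per-helper problem above, at the price of a fork and exec for every indexed document.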

>  - i'm not sure how things like scheduling work, though i'm of the
> suspicion it could be better
The current scheduler sucks :-D
Our teammate Praveen Kandikuppa is working on a replacement that schedules 
indexing according to the real system load.
This is an area of development where we would welcome help.
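One possible shape for load-based throttling, sketched under stated assumptions: it reads the 1-minute load average with `os.getloadavg()` (POSIX only) and backs off linearly once the per-CPU load passes a threshold. The policy, the function names and the parameters `busy_ratio` and `max_delay` are hypothetical; this is not a description of Praveen's replacement.

```python
import os
import time


def indexing_delay(load, cpus, busy_ratio=0.75, max_delay=30.0):
    """Seconds to wait before indexing the next file (hypothetical policy).

    Below busy_ratio load per CPU, index at full speed; above it, back off
    linearly, reaching max_delay one full unit of load per CPU past the threshold.
    """
    per_cpu = load / max(cpus, 1)
    if per_cpu < busy_ratio:
        return 0.0
    return min(max_delay, (per_cpu - busy_ratio) * max_delay)


def throttled_index(files, index_one):
    """Index files one at a time, re-checking the system load before each."""
    cpus = os.cpu_count() or 1
    for path in files:
        load1, _, _ = os.getloadavg()  # 1-, 5- and 15-minute averages
        time.sleep(indexing_delay(load1, cpus))
        index_one(path)
```

Checking the load before every file, rather than once per run, lets the indexer yield quickly when the user starts something heavy.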

>  - which leads me to: it needs documentation. i will not support such a
> complex system that does not have extensive documentation for its design.
I will publish an extensive description of the architecture on our WIKI in the 
next few days.

> API docu is not enough, though it is VERY nice to see extensive API docu
> available.
Yes, I asked Laurent Montel to enable Apidox in our source tree. I will begin 
to document the API as soon as possible.

> it doesn't take over the CPU quite like previous versions did for me,
> however, and that's a nice stride forward.
OK

>
> this leaves us with the other pieces:
>
> 0. storage
>  sqlite is not a good solution here, IMHO, because:
>   - it's too slow for doing anything resembling an interesting query
>   - it's not network aware
sqlite was the simplest solution to start with. We didn't want to force users 
to install a full-fledged DBMS just for Kat, but some months of experience 
have taught us that sqlite can be really insufficient.

We also tried to use the QtSQL library, but it does not seem mature enough to 
support what we need: blobs, UTF-8, queries with placeholders and the like.
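To make those three requirements concrete, here is what they look like with Python's built-in `sqlite3` module, used purely as a stand-in (this is neither QtSQL nor Kat's code): values bound through `?` placeholders, a UTF-8 file name stored as text, and raw bytes stored as a BLOB. The table layout and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, thumb BLOB)")

# Placeholders: values are bound, never spliced into the SQL string,
# so quotes or non-ASCII characters in file names cannot break the query.
path = "/home/user/Documenti/perché.odt"   # hypothetical UTF-8 file name
thumb = bytes([0x89, 0x50, 0x4E, 0x47])    # raw bytes, stored as a BLOB
conn.execute("INSERT INTO files (path, thumb) VALUES (?, ?)", (path, thumb))

row = conn.execute(
    "SELECT path, thumb FROM files WHERE path = ?", (path,)
).fetchone()
```

A database abstraction layer would need to offer the same three guarantees across every backend it supports.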

Aaron, now that you are close to Trolltech, could you please ask them to speed 
up the development of QtSQL? Even the version shipped with Qt4 sucks. I would 
really be glad to use that library, because then we could use whatever DBMS 
we want (PostgreSQL and MySQL, just to name a few).

>  the schema should be context centric, not content centric
>   - my original schema proposal, which seems to have been swept aside in
>     the last tenor update into playground, divided the database into two
>     sets:
>      - contextual linkage
>      - content indexing
>      i think we'd be best served by having each of these separate since
> each require slightly different semantics when it comes to processing and
> subsequent searching
OK
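A hypothetical sketch of the two-part schema described above, again using `sqlite3` for brevity: one table for contextual links between resources and one for word postings, each queried with its own semantics. Table names, columns, relation names and URIs such as `mail:42` are illustrative assumptions, not the original Tenor proposal.

```python
import sqlite3

SCHEMA = """
-- Contextual linkage: how resources relate to each other.
CREATE TABLE links (
    source   TEXT NOT NULL,
    target   TEXT NOT NULL,
    relation TEXT NOT NULL,
    PRIMARY KEY (source, target, relation)
);
-- Content indexing: which words occur in which resource.
CREATE TABLE postings (
    word     TEXT NOT NULL,
    resource TEXT NOT NULL,
    PRIMARY KEY (word, resource)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    ("mail:42", "file:/tmp/report.pdf", "attachment"),
    ("mail:42", "contact:alice", "sender"),
])
conn.executemany("INSERT INTO postings VALUES (?, ?)", [
    ("kde4", "file:/tmp/report.pdf"),
    ("tenor", "file:/tmp/report.pdf"),
])


def containing(word):
    """Content search: resources whose text contains word."""
    rows = conn.execute(
        "SELECT resource FROM postings WHERE word = ? ORDER BY resource", (word,))
    return [resource for (resource,) in rows]


def context_of(resource):
    """Context navigation: (relation, target) pairs linked from resource."""
    rows = conn.execute(
        "SELECT relation, target FROM links WHERE source = ? ORDER BY relation",
        (resource,))
    return rows.fetchall()
```

Keeping the tables apart lets each be indexed and maintained for its own access pattern: exact word lookups on one side, graph-style traversal on the other.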

>
> 1. API
>
> the Kat API needs work. the Tenor API as it was shaping up was really far
> more interesting. searching is far, far more than "look for this blob of
> text" and API design is a bit of art. Scott is quite good at this (cf
> taglib). 
I'm ready to admit that I'm not very good at library development. I have no 
specific experience with it, so I would welcome Scott's help.

> with a dual context/content storage facility, it should be quite 
> possible to design a search and navigation api that maps both to kat's idea
> of searching and tenor's idea of contextualization. i'd take Kat as a
> prototype consumer of search and create the Tenor API in a manner which
> services it.
OK
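Under similar assumptions, a search API over the two storage halves could let a single query object carry both Kat-style text terms and Tenor-style context constraints. This is an in-memory sketch only; `Store`, `Query`, `words` and `near` are hypothetical names, and context links are treated as undirected for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the content and context halves."""
    postings: dict = field(default_factory=dict)  # word -> set of resources
    links: dict = field(default_factory=dict)     # resource -> set of related

    def add_text(self, resource, text):
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(resource)

    def link(self, a, b):
        # Undirected: navigation works from either end.
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)


@dataclass
class Query:
    words: tuple = ()  # Kat-style: "look for this text"
    near: tuple = ()   # Tenor-style: "related to these resources"

    def run(self, store):
        """Intersect every text and context constraint; no constraints, no hits."""
        results = None
        for w in self.words:
            hits = store.postings.get(w.lower(), set())
            results = hits if results is None else results & hits
        for r in self.near:
            related = store.links.get(r, set())
            results = related if results is None else results & related
        return sorted(results or ())
```

For example, `Query(words=("content",), near=("mail:1",))` asks for resources mentioning "content" that are also linked to a given mail, which neither half of the storage can answer alone.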

>
> this means the Kat author(s) need to clearly state their goals for search.
As I said before, I will publish a comprehensive architectural overview of Kat 
on the WIKI in the next few days, maybe today.

> i've seen various terms bandied about, e.g. computational linguistics,
> which need to be well scoped for this part of the project.
OK

>
> 3. user interface
>
> this can wait. 0-2 need to be done first.
agreed

My proposal is to create a set of pages on Kat's WIKI where we can all 
contribute to defining the architecture and scope of both Kat and Tenor. You 
can register there and start adding ideas, sketches and mockups: 
http://infserver.unibz.it/kat/wiki

I'll prepare the pages in a minute or so. I'm also in IRC right now.

Bye

Roberto