creating a content system

Aaron J. Seigo aseigo at kde.org
Wed Aug 10 18:30:23 CEST 2005


On Wednesday 10 August 2005 02:05, Roberto Cappuccio wrote:
> >  - catalogs don't have individual stop folders (at least not that i can
> > find)
>
> what do you mean with stop folders?

stop folders: URLs not to index.

> > - it searches hidden folders by default
>
> this can be easily made customizable/configurable (we have a KCM module for
> that)

yes, you could but i don't think it's necessary. hidden folders are hidden. 
searching should expose the visible.
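
to make the "stop folders" and "skip hidden" ideas concrete, here is a rough python sketch of a crawler that never descends into either. the function name and the stop_folders parameter are made up for illustration; this is not how kat or any existing indexer does it.

```python
import os

def iter_indexable(root, stop_folders=()):
    """Yield file paths under root, skipping hidden entries and stop folders.

    stop_folders is a hypothetical per-catalog list of directories not to index.
    """
    stops = tuple(os.path.normpath(s) for s in stop_folders)
    for dirpath, dirnames, filenames in os.walk(root):
        # prune in place so os.walk never descends into skipped directories
        dirnames[:] = [
            d for d in dirnames
            if not d.startswith(".")
            and os.path.normpath(os.path.join(dirpath, d)) not in stops
        ]
        for name in filenames:
            if not name.startswith("."):
                yield os.path.join(dirpath, name)
```

pruning dirnames in place means a hidden or stopped directory costs nothing beyond the one listing of its parent.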

now, you mentioned your mail is in a hidden folder. well, so are N other 
things, such as:

	bookmarks
	address books
	sticky notes
	notebooks
	chat logs

etc, etc... thing is, all of these items are best indexed if you understand 
the data format in which they are stored. e.g. how do you relate that this:

[09:46] <aseigo> haha. it's STILL indexing my sent mail.

belongs with this:

BEGIN:VCARD
FN:Aaron Seigo
[lots of personal data deleted for brevity =) ]
END:VCARD

well, you need to know that you have to pull out <aseigo> and that <aseigo> 
actually == Aaron Seigo. we have that bit in the IMProxy, though you have to 
set much of it up manually by associating, aka linking, nicknames to address 
book entries. those linkages ought to be in the linkage store.
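
as a minimal sketch of that lookup, assuming the linkage store can be modelled as a plain nickname -> contact mapping (the LINKS dict and both function names are hypothetical, not the IMProxy API):

```python
import re

# matches lines like: [09:46] <aseigo> haha. it's STILL indexing my sent mail.
CHAT_LINE = re.compile(r"^\[(\d{2}:\d{2})\] <([^>]+)> (.*)$")

# hypothetical linkage store: chat nickname -> address book entry (keyed by FN)
LINKS = {"aseigo": "Aaron Seigo"}

def contact_for_line(line, links=LINKS):
    """Return the linked contact name for a chat log line, or None."""
    m = CHAT_LINE.match(line)
    if m is None:
        return None
    return links.get(m.group(2))

def vcard_full_name(vcard_text):
    """Return the FN value of a vCard (simplified: ignores property parameters)."""
    for raw in vcard_text.splitlines():
        if raw.upper().startswith("FN:"):
            return raw[3:].strip()
    return None
```

a generic fulltext indexer sees none of this; only something that knows both formats and has the link can put the chat line and the vCard together.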

moreover, consider when an email is deleted or an address book entry is 
modified. should the indexer re-index that file? consider if that email is in 
an mbox file, where every message lives in one big file, rather than maildir. 
re-indexing the whole thing is very, very inefficient compared to the 
application simply saying "ok, this email is now gone."

then there are cases where the state information isn't in the file at all 
but kept in the in-memory data structures of the app, making the data files 
really just a ghost of the real information. the IMProxy stuff is an 
interesting case study here.

so for application data that is squirreled away, we need application-specific 
processing, and we need these applications to define the basic relationship 
patterns. so don't touch hidden folders; let the applications talk to an API 
directly to do this.
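
a rough sketch of what such an API could look like, in python for brevity. the class, its method names and the "mail:" / "contact:" urls are all invented for illustration; nothing like this is implied to exist yet.

```python
class ContentIndex:
    """Toy in-memory index that applications update directly (hypothetical API)."""

    def __init__(self):
        self._items = {}     # url -> indexed text
        self._links = set()  # (source url, relation, target url)

    def item_added(self, url, text):
        self._items[url] = text

    def item_removed(self, url):
        # the app says "this is gone": drop the item and every link touching it
        self._items.pop(url, None)
        self._links = {l for l in self._links if url not in (l[0], l[2])}

    def link(self, src, relation, dst):
        self._links.add((src, relation, dst))

    def search(self, term):
        term = term.lower()
        return sorted(u for u, t in self._items.items() if term in t.lower())

    def related(self, url):
        """Return sorted (other url, relation) pairs for links touching url."""
        out = []
        for src, rel, dst in self._links:
            if src == url:
                out.append((dst, rel))
            elif dst == url:
                out.append((src, rel))
        return sorted(out)
```

the point is the shape: a mail client deleting a message makes one item_removed() call, and nobody has to re-scan an mbox to discover that something changed.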

> >  - it apparently doesn't take into consideration FD.o conventions such as
> > thumbnail directories (correct me if i'm wrong on that one?)
>
> what we have is the possibility to manually exclude selected directories.
> the next step will be including some of them in the default configuration

it isn't just about excluding thumbnail directories, it's also about pulling 
your thumbnails out of them. why duplicate the processing time spent 
creating thumbnails when they already exist?
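
for reference, the FD.o thumbnail spec names each thumbnail after the md5 hash of the original file's URI and stores it as a png under a "normal" or "large" subdirectory. a minimal python sketch of the lookup follows; the function names are mine, the ~/.thumbnails default is the location the spec used at the time (later versions moved it under $XDG_CACHE_HOME/thumbnails), and the Thumb::MTime freshness check the spec requires is left out.

```python
import hashlib
import os

def thumbnail_path(file_uri, size="normal", base="~/.thumbnails"):
    """Return where the thumbnail for file_uri would live (it may not exist)."""
    digest = hashlib.md5(file_uri.encode("utf-8")).hexdigest()
    return os.path.join(os.path.expanduser(base), size, digest + ".png")

def existing_thumbnail(file_uri, base="~/.thumbnails"):
    """Return the path of an already generated thumbnail, or None."""
    for size in ("large", "normal"):
        path = thumbnail_path(file_uri, size, base)
        if os.path.exists(path):
            return path
    return None
```

an indexer would call existing_thumbnail() first and only fall back to generating its own preview when it returns None.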

> >  - it only works on local files?
>
> it works on every media you can mount. we would like to extend it to NFS
> and other protocols as well

as zack said, the idea here is to let the indexing happen on the NFS server 
and then bridge between those indexes.

> >  - it relies on a lot of helper apps; i wonder at the overhead of that
>
> when I first started development, I begun importing code from other
> projects like xpdf and antiword in our source tree. The bad things of this
> approach are not immediately evident, but can be expressed as follows:

i'd suggest using poppler, the new rendering lib based on xpdf. i think most 
things can be dragged in via libraries. no need for source code tree duping 
or branches. for html it will be interesting to look at tapping kdom. now, 
this won't work in every case, but i think it can work a lot more often than 
it currently does. for simple formats like RTF, using an external app also 
seems a bit gratuitous.

but this isn't a design issue, it's an implementation issue. and i completely 
understand why it is as it is right now: it's pragmatic and quick to get 
going. fortunately, implementation issues are orders of magnitude easier to 
address than design issues ;) and we can work on improving the fulltext index 
plugins over time. i just wouldn't want us to consider them done because they 
happen to work =)

> >  - i'm not sure how things like scheduling work, though i'm of the
> > suspicion it could be better
>
> The actual scheduler sucks :-D
> Our team mate Praveen Kandikuppa is working on its replacement based on
> real load control.
> This is a part of development where we would like to receive help.

ah.. can Praveen start a thread on this list discussing his start on this?

-- 
Aaron J. Seigo
GPG Fingerprint: 8B8B 2209 0C6F 7C47 B1EA  EE75 D6B7 2EB1 A7F1 DB43

Full time KDE developer sponsored by Trolltech (http://www.trolltech.com)