creating a content system

Thu Aug 11 15:55:07 CEST 2005

On Wednesday 10 August 2005 03:04, Manuel Amador wrote:
> > moreover, consider when an email is deleted or an address book entry is
> > modified. should the indexer re-index that file? consider if that email
> > is in an mbox file rather than maildir. this is very, very inefficient
> > compared to the application simply saying "ok, this email is now gone."
>
> yes, it is inefficient, but in absence of a standard mechanism to do
> this, it's at least a thing to consider. 

realize that in this context "inefficient" == "users won't use it". they manage
without such systems today, so if adding a search tool makes things painful
for users, they'll play with it, treat it as a novelty and then turn it off.

this is what happens with every other system i've seen out there. they are not 
efficient and they provide a search tool as the only interface to the 
results. we must avoid that pathway or suffer irrelevance. my goal here is to 
move the power of search and contextualization out of the ivory tower and 
onto mainstream production desktops. apple is trying to do the same thing, 
btw.

> Even if there were a standard 
> mechanism, you won't have much success incorporating this kind of
> mechanism in, say, Pine.  So you still need to provide a fallback method
> for this corner (and lotsa other corner) cases.

to be honest, i don't care about pine, mutt, evolution or thunderbird. i care 
about the apps i can help influence which in this case is kmail.

getting the apps involved is an absolute requirement for getting these things
to work in a way that average users will care about, so that they want to use
things like search and contextualization. the apps must both feed the system
_and_ be the primary means of search delivery.
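
to make "the apps feed the system" concrete, here is a minimal sketch in
python of an index that applications update by pushing events at it. every
name in it (AppFedIndex, added, removed, the "mail:1" style ids) is made up
for illustration and is not an existing kmail or indexer api. the point is
only that a deletion becomes a cheap lookup instead of a re-scan of an mbox
file.

```python
# hypothetical sketch: none of these names come from kmail or a real indexer
from collections import defaultdict


class AppFedIndex:
    """toy inverted index that applications update by pushing events."""

    def __init__(self):
        self._terms = defaultdict(set)  # term -> ids of items containing it
        self._docs = {}                 # item id -> terms indexed for it

    def added(self, doc_id, text):
        """the app reports a new or modified item, e.g. a received email."""
        self.removed(doc_id)            # drop stale terms if this is an update
        terms = set(text.lower().split())
        self._docs[doc_id] = terms
        for term in terms:
            self._terms[term].add(doc_id)

    def removed(self, doc_id):
        """the app says "this email is now gone"; no file is re-scanned."""
        for term in self._docs.pop(doc_id, set()):
            self._terms[term].discard(doc_id)
            if not self._terms[term]:
                del self._terms[term]

    def search(self, term):
        """return the ids of all items currently containing the term."""
        return sorted(self._terms.get(term.lower(), set()))


index = AppFedIndex()
index.added("mail:1", "lunch on friday")
index.added("mail:2", "friday release plan")
index.removed("mail:1")                 # the mail client deleted the message
print(index.search("friday"))           # prints ['mail:2']
```

the cost of a removal here depends on the size of the one item that went
away, not on the size of the mail store it lived in.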

users don't want search tools.

> > > >  - it only works on local files?
> > >
> > > it works on every media you can mount. we would like to extend it to
> > > NFS and other protocols as well
> >
> > as zack said, the idea here is to let the indexing happen on the NFS
> > server and then bridge between those indexes.
>
> yeap, otherwise you'd have completely clogged NFS servers wherever your
> indexer is deployed.

and this is why, btw, we need a network-aware storage system. =)

> > i'd suggest using poppler, the new xpdf rendering lib. i think most things
> > can be dragged in via libraries.
>
> nono, please don't drag things as libraries.  This brings dependency
> hell into the mix.  Either do things with popen(2), or use .... damn,
> the name of that thing escapes me... it's a dynamic linker that lets you
> add soft library dependencies to your applications.  Basically you can
> add any dependency you want, and the linker will try to load all
> libraries and tell which ones did not load.  Sorta like a proxy library.

there is no difference in practicality. if there's a dependency on a library 
and you don't have it, the plugin won't install or build. again, we need to 
favour efficiency at every turn here. i can't stress the importance of this 
enough.

> >  no need for source code tree duping or branches.
> > for html it will be interesting to look at tapping kdom. now, this won't
> > work in every case, but i think it can work a lot more than it currently
> > is. for simple formats like RTF, using an external app also seems a bit
> > gratuitous.
>
> Just as a quick tip: these things I solved with regexps.  HTML parsing
> is much faster and accurate that way.  In theory using KDom may seem to
> be the correct route.  In practice, KDom will need to incorporate
> intelligence to parse broken HTML files, a thing that's way simpler to
> do with regular expressions.

if you truly did manage to do a good job of this with regular expressions, 
i'll be impressed. did you simply remove all tags and fulltext the remaining 
items? if so, we're losing a TON of important information (like titles and 
headers =). and i'm not sure what sort of broken HTML KDom would have a 
problem with exactly. you're not doing layout, just looking for user visible 
text and some basic markup hints like headers and whatnot.
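
as a rough illustration of "user visible text plus basic markup hints", here
is a small sketch. it uses python's standard html.parser module as a stand-in
for a tolerant parser, not kdom, and the class and function names are
invented for this example. it keeps the title and the headers separate from
the body text and skips script and style content, without doing any layout.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """collects title, headers and body text; hypothetical helper class."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}
    TRACKED = HEADINGS | SKIPPED | {"title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.headings = []
        self.text = []
        self._current = None  # tracked element we are inside of, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._current = tag
            if tag in self.HEADINGS:
                self.headings.append("")
        elif self._current == "title":
            # a title cannot contain markup, so treat it as left unclosed
            self._current = None

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        chunk = " ".join(data.split())  # normalise whitespace
        if not chunk or self._current in self.SKIPPED:
            return
        if self._current == "title":
            self.title = (self.title + " " + chunk).strip()
        elif self._current in self.HEADINGS:
            self.headings[-1] = (self.headings[-1] + " " + chunk).strip()
        else:
            self.text.append(chunk)


def extract(html):
    """return (title, headings, body text) for a possibly broken document."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return parser.title, parser.headings, " ".join(parser.text)


# unclosed <b> and <p> tags, no <html> or <body> wrapper
broken = ("<title>Trip report</title><h1>Day <b>one</h1>"
          "<p>We arrived late<p>and slept.<script>var x = 1;</script>")
print(extract(broken))
```

an unclosed header would still swallow the text that follows it in this
sketch, so a real plugin would want to end a header at the next block-level
tag.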

-- 
Aaron J. Seigo
GPG Fingerprint: 8B8B 2209 0C6F 7C47 B1EA  EE75 D6B7 2EB1 A7F1 DB43

Full time KDE developer sponsored by Trolltech (http://www.trolltech.com)