Fulltext search infrastructure

Fred Schaettgen kde.sch at ttgen.net
Tue Mar 29 01:52:56 CEST 2005


Hi,

Last week I talked to Roberto Cappuccio, who has started writing a search tool 
called KAT (http://kat.sf.net). It's not quite ready yet, but it at least 
promises to fulfill a rather simple wish I have had for a long time: being able 
to do fulltext searches across various file formats, including PDF, DOC and 
SXW. 

To be honest, I still don't know exactly what is in the scope of klink and 
what is not, but my guess is that extracting text from different file formats 
will be at least a small part of it. 
If so, I would like to suggest that we agree on a common interface for 
extracting (possibly lengthy) fulltext data from documents. Writing a good 
search engine is hard, and having to maintain separate plugins for every 
format doesn't make things any easier. 
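A common interface along those lines could be quite small. The following is a minimal sketch in Python, not KDE code: the names FulltextExtractor, PlainTextExtractor and ExtractorRegistry, the dispatch by MIME type, and the line-by-line streaming are all hypothetical, assuming each plugin reads a binary stream and yields text piece by piece so that lengthy documents never have to be assembled into one big string.

```python
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict, Iterator, List


class FulltextExtractor(ABC):
    """Hypothetical plugin interface: one implementation per file format."""

    # MIME types this plugin handles, e.g. "application/pdf" (hypothetical scheme).
    mime_types: List[str] = []

    @abstractmethod
    def extract(self, stream: BinaryIO) -> Iterator[str]:
        """Yield the document's text piece by piece so that lengthy
        documents never have to be built up as one big string."""


class PlainTextExtractor(FulltextExtractor):
    """Trivial example plugin for UTF-8 plain text."""

    mime_types = ["text/plain"]

    def extract(self, stream: BinaryIO) -> Iterator[str]:
        # Iterating a binary stream yields one line (as bytes) at a time.
        for raw in stream:
            yield raw.decode("utf-8", errors="replace").rstrip("\r\n")


class ExtractorRegistry:
    """Maps MIME types to plugins so that all consumers share one set of plugins."""

    def __init__(self) -> None:
        self._by_mime: Dict[str, FulltextExtractor] = {}

    def register(self, plugin: FulltextExtractor) -> None:
        for mime in plugin.mime_types:
            self._by_mime[mime] = plugin

    def extract(self, mime: str, stream: BinaryIO) -> Iterator[str]:
        plugin = self._by_mime.get(mime)
        if plugin is None:
            raise LookupError(f"no fulltext extractor for {mime}")
        return plugin.extract(stream)
```

A PDF or SXW plugin would subclass FulltextExtractor in the same way; an indexer, a kfind-style tool or any other consumer would then only need to talk to the registry instead of maintaining its own format plugins.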

In another post on this list, someone (Scott, IIRC) suggested extending the 
kfile plugins to return fulltext data just like other metadata. Another 
option would be to introduce a new fulltext kioslave, which uses its own 
plugins to extract the data from the files. There are pros and cons to each 
approach, but because it was easier to create a new plugin type (that way I 
can keep using my stable, non-CVS KDE), I chose the second alternative.
My idea was to have the kioslave emit an XML file with the fulltext data, with 
additional markup for structural entities like pages, lines, timestamps and so 
on. With such an interface it could easily be used by other programs, 
including non-indexing search tools like kfind, other third-party tools, or 
maybe even text-to-speech applications.
Could this be of any use for klink, too, or is it completely off-topic?
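To make the XML idea a bit more concrete, here is a sketch in Python using the standard xml.etree.ElementTree module. The element and attribute names (fulltext, page, line, number) are hypothetical, since no schema has been defined; the first function plays the role of the kioslave's output, and the second is a non-indexing consumer in the spirit of kfind that reports which pages contain a search term.

```python
import xml.etree.ElementTree as ET
from typing import List


def to_fulltext_xml(pages: List[List[str]]) -> str:
    """Serialize extracted text as XML with (hypothetical) page/line markup."""
    root = ET.Element("fulltext")
    for number, lines in enumerate(pages, start=1):
        page = ET.SubElement(root, "page", number=str(number))
        for text in lines:
            # ElementTree escapes characters such as "&" and "<" for us.
            ET.SubElement(page, "line").text = text
    return ET.tostring(root, encoding="unicode")


def find_pages(xml_text: str, needle: str) -> List[int]:
    """kfind-like consumer: return the numbers of pages whose text contains
    needle, compared case-insensitively."""
    root = ET.fromstring(xml_text)
    hits = []
    for page in root.iter("page"):
        text = " ".join(line.text or "" for line in page.iter("line"))
        if needle.lower() in text.lower():
            hits.append(int(page.get("number")))
    return hits
```

Because the structure is explicit markup, a consumer that only wants plain text can ignore the page and line elements, while one that wants to report "found on page 2" (or a text-to-speech tool that wants to pause between lines) can use them.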

regards
Fred

Btw, please CC me; I'm not subscribed.

-- 
Fred Schaettgen
kde.sch at ttgen.net

