Jos van den Oever
jos at vandenoever.info
Mon Jan 12 21:50:42 CET 2009
Type: KDE Improvement
Strigi Desktop Search
Here are the main features of Strigi:
- very fast crawling
- very small memory footprint
- no hammering of the system
- pluggable backend, currently clucene and
hyperestraier, sqlite3 and xapian are in the works
- communication between daemon and search program
over an abstract interface with two
implementations: DBus and a simple unix socket.
Especially the DBus interface makes it very easy
to write client applications. There are a few
sample scripts in the code using Perl, Python, GTK
and Qt. Writing clients is so easy that any Gnome
or KDE app could implement this.
Additionally, there is a simple interface for
implementing plugins for extracting information.
We'll try to reuse the kat plugins, although
native plugins will have a large speed advantage.
Strigi also calculates the sha1 of every file it
crawls, which allows for fast finding of duplicate
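
The idea of using a per-file sha1 to spot identical content can be sketched in a few lines of Python. This is only an illustration, not Strigi's own C++ code; the function names sha1_of_file and find_duplicates are hypothetical and do not come from Strigi.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by digest; keep only digests shared by several files.

    Hypothetical helper for illustration, not part of Strigi.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha1_of_file(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Because the digest is computed once while crawling, finding files with the same content afterwards only requires a lookup by digest and no second pass over the file contents.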
- Move Strigi::DirLister in archivereader.h to
ArchiveReader::DirLister. Two classes with this name
were present in the code. The one in
archivereader.h was not used in any code outside
of Strigi, so we are changing it. Note that this
change means that one should not use Strigi
- Change type of EntryInfo.mtime from 'unsigned'
- The spec of SDF files was found and used to
implement a more precise syntax check for the
header of SDF files.
- Fix memory corruption bug in ArchiveReader.
- Change type of ontology entry 'exposureTime' to
string. In theory something like duration would
make sense but in practice xsd:string is the used
- Add a default rule to find mail box directories
with pattern '.*.directory'. Since these directory
names start with a dot, they are normally not
- Add '$HOME/.kde4' to the directories that are
indexed by default.
- Simplify matching of file paths in the rules
for including or excluding directories from the
index. The code is now more readable and easier to
- Fix a big performance problem: Whenever a
directory mtime changed, all files inside the
directory were re-indexed.
- Fix bug where a gz archive that contains a file
that is identical to the original archive is
indexed over and over. The depth of nested files
that are indexed is now limited to 127.
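
The depth limit from the last item can be illustrated with a small Python sketch. It is not Strigi's implementation; unwrap_gzip and MAX_DEPTH are hypothetical names, and only the value 127 is taken from the note above. The loop peels gzip layers one by one and stops once the limit is reached, so even an archive that unpacks to itself cannot keep the indexer busy forever.

```python
import gzip
import zlib

MAX_DEPTH = 127  # value taken from the release note; the name is hypothetical
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def unwrap_gzip(data, max_depth=MAX_DEPTH):
    """Peel nested gzip layers from data and return (payload, layers_removed).

    Stops when the data no longer looks like gzip, when decompression
    fails, or when max_depth layers have been removed.
    """
    depth = 0
    while depth < max_depth and data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            break  # corrupt or truncated layer: keep what we have
        depth += 1
    return data, depth
```

With the limit in place, the work done for one input is bounded by max_depth decompression steps, whatever the archive contains.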
More information about the Kde-announce-apps