Proposing Tracker for inclusion into GNOME 2.18

Mon Oct 23 20:21:01 BST 2006

Hi all,

Today I was demo-ing KDE at the Systems in Munich and the GNOME
presenter across from me in the GNOME booth told me about this
discussion [1]. Since I'm the developer of Strigi [2] it interested me
and I would love to contribute to this discussion. Also, I believe
this discussion is of interest to the KDE developers, since KDE is
also in need of good desktop search tools. Therefore, this mail also
goes to kde-core-devel.

First off, let me say that I'll be going slightly off topic by not
only discussing inclusion of search engines into GNOME but also
cooperation between the current alternatives. Both of these aspects
have been talked about in this thread and I'd like to add to it from
the point of view of yet another desktop search tool.

But first let me introduce Strigi. Strigi is a desktop search tool
that has many similarities to and differences from Beagle and Tracker
and which originates from the unfortunate demise of Kat. The goal of
Strigi is quite clear: index user data so that searching for it is
fast. The aim is not to index only plain text but also metadata so
that a user may search for e.g. 'ext:png width:128' to find all files
with a width of 128 pixels.
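
To make the query example concrete, here is a minimal sketch in Python
of how a query such as 'ext:png width:128' could be split into
field:value terms and checked against a file's metadata. This is not
Strigi's actual parser; the names Term, parse_query and matches are
made up for illustration, and real query languages also handle quoting,
ranges and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str  # empty string means "search the plain text"
    value: str


def parse_query(query):
    """Split a query into field:value terms; bare words get an empty field."""
    terms = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            terms.append(Term(field, value))
        else:
            terms.append(Term("", token))
    return terms


def matches(terms, metadata, text=""):
    """Return True if every term matches the metadata (or the plain text)."""
    for term in terms:
        if term.field:
            # A fielded term must match the stored value exactly.
            if term.field not in metadata:
                return False
            if str(metadata[term.field]) != term.value:
                return False
        elif term.value.lower() not in text.lower():
            return False
    return True


# Hypothetical metadata for one PNG file.
png = {"ext": "png", "width": 128}
print(matches(parse_query("ext:png width:128"), png))  # prints True
```

All terms are combined with AND here, which is an assumption of the
sketch and not something the query language itself prescribes.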

Strigi has a few features that are not in Tracker or Beagle and misses
a number of features that the other programs have. But the core
functionality of Strigi, indexing data, is something that it shares.
One important distinction has to be made straightaway: the difference
between indexing metadata and storing metadata. Strigi only indexes
metadata. If you think your disk is full, you can just throw away
the index, because there is no data of value in there. All that's in
there is an index that allows you to find your data quickly.
Personally, I think _storing_ metadata in an indexer is not a good
idea. (I do think that an index on a metadata store is a good idea,
but that's a different matter). This is a large difference with
Tracker which does act as a metadata store of 'first class objects'
whatever that means. Beagle is also mainly an index. (Is any
non-redundant data lost if I delete my Beagle index, Joe?)

So if Tracker and Beagle also index data, what's so special about Strigi?
(sorry for the obligatory boasting coming up)
- It is the KISSest of all
- It is the fastest of all (for indexing many small files, parsing alone
runs at ~100 docs per second; the speed of writing to the index depends
on the index backend)
- It can index files in files in files in files in files
- It has an indexer that can output XML and can thus be used by other
indexers (Beagle and Tracker) so that indexing code can be shared.
(Having a common metadata standard would be nice for this purpose, but
see below)
- It is written in C++
- It has multiple storage backends clearly separated behind an API so
that Strigi can always take advantage of the fastest index around
(currently clucene)
- It can be used for searching even if there is no index, by using the
command line programs 'deepfind' and 'deepgrep' [2]

This is however not a sales talk. Strigi stands on its own. It's GUI
independent. Currently, it links to clucene or hyperestraier, to
libexpat and some other common libs like libz and libcrypto. It has a
DBus interface and can be called from any language with DBus support.
There's a plugin for GNOME Deskbar in the source code.

So if this is not a sales talk, what is it? It's a call for
standardization. This discussion between competing programs is a great
time to start talking about common functionality. With regards to
desktop search there are many things that can be standardized:
- query language
- metadata names and meaning
- test suites
- DBus APIs
- index formats

I won't discuss index formats because, even though Beagle and Strigi
both use the Lucene index format, this is an implementation detail and
defines performance and disk usage and should not be frozen into a
standard.

The query language as used by Beagle and Strigi is very similar (no
coincidence) and is a good start for standardization. The largest
drawback of the language used is the ambiguity of the field
specifiers.

Now that DBus v1 is almost upon us, the barriers between GNOME and KDE
are diminishing. Functionality defined by a DBus API can be
implemented in any language and as such, I think GNOME should choose a
DBus API to use and share with KDE and

Test suites. I'd love there to be a common test suite that says: if
you index this data with these parameters, you should get these
results from this query. Strigi will develop such tests naturally.
Being able to share them across projects would mean that programs
would compete on merit and without the usual prejudices and license
and library incompatibilities.

Strigi has a DBus interface for searching, and so does Tracker. We should
compare them and find a common interface. Of course the respective
GNOME and KDE developers should decide which DBus API should be used
by their applications. Freedesktop.org would be a good place to define
these interfaces.
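
The combination of a common search interface and a shared test suite
can be sketched in Python as follows. Everything here is hypothetical:
SearchEngine is not the Strigi or Tracker DBus API, ToyEngine is an
in-memory stand-in so the sketch runs without a daemon, and the test
case format is only one possible way to write "index this data, run
this query, expect these results".

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """Hypothetical common interface; not an existing DBus API."""

    @abstractmethod
    def index(self, path, fields):
        """Add one document with its metadata fields to the index."""

    @abstractmethod
    def query(self, query):
        """Return the sorted list of paths matching the query."""


class ToyEngine(SearchEngine):
    """In-memory stand-in; a real backend would talk to an indexer."""

    def __init__(self):
        self._docs = {}

    def index(self, path, fields):
        self._docs[path] = {k: str(v) for k, v in fields.items()}

    def query(self, query):
        # The toy only understands field:value terms, combined with AND.
        wanted = [tok.partition(":")[::2] for tok in query.split()]
        return sorted(
            path
            for path, fields in self._docs.items()
            if all(fields.get(field) == value for field, value in wanted)
        )


# A shared test case: data to index, a query, and the expected result.
SHARED_CASE = {
    "data": {
        "a.png": {"ext": "png", "width": 128},
        "b.png": {"ext": "png", "width": 64},
        "c.txt": {"ext": "txt"},
    },
    "query": "ext:png width:128",
    "expected": ["a.png"],
}


def run_case(engine, case):
    """Index the case's data and compare the query result to the expectation."""
    for path, fields in case["data"].items():
        engine.index(path, fields)
    return engine.query(case["query"]) == case["expected"]


print(run_case(ToyEngine(), SHARED_CASE))  # prints True
```

Because run_case only touches the interface, the same case could be run
against any engine that implements it, which is the point of sharing
tests across projects.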

Metadata naming and meaning. This is something which is rather hard.
Dublin Core is part of it. It names some types of metadata. I've
already mailed about this with Jamie in the past. In my opinion, the
issue should be separated into smaller definitions that say what
metadata fields can be extracted from certain filetypes. Indexer
plugins could then advertise that they implement this functionality.
The names of the metadata fields should also be used when searching
and there, for convenience, they should be abbreviated as is current
practice.
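
As a rough illustration of this idea, the Python sketch below keeps a
small definition of which fields belong to which file type, lets a
plugin advertise what it provides, and expands abbreviated field names
used in queries. The field names, the abbreviations and the PngPlugin
class are invented for the example and are not taken from any existing
standard or from Strigi.

```python
# Hypothetical per-filetype definitions: which fields an indexer
# plugin for that type is expected to extract.
FIELDS_BY_TYPE = {
    "image/png": {"image.width", "image.height"},
    "audio/mpeg": {"audio.artist", "audio.title"},
}

# Hypothetical abbreviations that users may type in queries.
ABBREVIATIONS = {
    "width": "image.width",
    "height": "image.height",
    "artist": "audio.artist",
    "title": "audio.title",
}


class PngPlugin:
    """Illustrative plugin that advertises the fields it can extract."""

    mimetype = "image/png"
    provides = {"image.width", "image.height"}


def conforms(plugin):
    """A plugin conforms if it provides every field defined for its type."""
    return FIELDS_BY_TYPE[plugin.mimetype] <= plugin.provides


def expand(field):
    """Map an abbreviated query field to its full name, if one is known."""
    return ABBREVIATIONS.get(field, field)


print(conforms(PngPlugin))  # prints True
print(expand("width"))      # prints image.width
```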

So, rather a long mail that can be summarized in: please accept an API
for searching and not a suite of programs (indexer + guis to it) and
start thinking about standardizing _indexable_ metadata (other
metadata is a whole different can of worms that I won't touch). This is
still possible since neither KDE nor GNOME has agreed on a program
for indexing and by adopting only an API, programs will be forced to
collaborate to adhere to the API as well as possible, meaning the user
wins.

Cheers,
Jos

[1] http://mail.gnome.org/archives/desktop-devel-list/2006-October/msg00175.html
[2] http://www.vandenoever.info/software/strigi/
[3] http://www.kdedevelopers.org/node/2468