Proposing Tracker for inclusion into GNOME 2.18

Mon Oct 23 23:11:13 BST 2006

Just my quick 2 cents: the metadata standard is one of the things the Nepomuk 
project is aiming for, in the form of an ontology. The current state is an 
ontology called PIMO [1], which is used and tested in the Gnowsis system.

Please join the nepomuk-kde list on semanticdesktop.org [2] if you are 
interested. Not much is going on there yet but it will. ;)

Cheers,
Sebastian

[1] 
http://www.dfki.uni-kl.de/~sauermann/2006/01-pimo-report/pimOntologyLanguageReport.html
[2] https://nepomuk.semanticdesktop.org/wws/info/nepomuk-kde

On Monday 23 October 2006 21:21, Jos van den Oever wrote:
> Hi all,
>
> Today I was demo-ing KDE at the Systems in Munich and the GNOME
> presenter across from me in the GNOME booth told me about this
> discussion [1]. Since I'm the developer of Strigi [2] it interested me
> and I would love to contribute to this discussion. Also, I believe
> this discussion is of interest to the KDE developers, since KDE is
> also in need of good desktop search tools. Therefore, this mail also
> goes to kde-core-devel.
>
> First off, let me say that I'll be going slightly off topic by not
> only discussing inclusion of search engines into GNOME but also
> cooperation between the current alternatives. Both of these aspects
> have been talked about in this thread and I'd like to add to it from
> the point of view of yet another desktop search tool.
>
> But first let me introduce Strigi. Strigi is a desktop search tool
> that has many similarities and differences to Beagle and Tracker and
> which originates from the unfortunate demise of Kat. The goal of
> Strigi is quite clear: index user data so that searching for it is
> fast. The aim is not to index only plain text but also metadata so
> that a user may search for e.g. 'ext:png width:128' to find all PNG
> files with a width of 128 pixels.
>
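
A minimal sketch of how a query such as 'ext:png width:128' could be read as fielded terms. The grammar of the Strigi and Beagle query language is not described in this mail, so the sketch assumes whitespace-separated field:value terms with bare words treated as free text. The names parse_query and matches, the 'text' field, and the sample files are hypothetical.

```python
def parse_query(query):
    """Split a query into (field, value) pairs; bare words get field None."""
    terms = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            terms.append((field, value))
        else:
            terms.append((None, token))
    return terms


def matches(metadata, terms):
    """Return True if the file's metadata satisfies every term."""
    for field, value in terms:
        if field is None:
            # Assumption: free-text terms are looked up in a 'text' field.
            if value.lower() not in metadata.get("text", "").lower():
                return False
        elif str(metadata.get(field)) != value:
            return False
    return True


# Hypothetical extracted metadata, keyed by file name.
files = {
    "icon.png": {"ext": "png", "width": 128},
    "photo.png": {"ext": "png", "width": 1024},
    "notes.txt": {"ext": "txt", "text": "meeting notes"},
}
terms = parse_query("ext:png width:128")
hits = [name for name, meta in files.items() if matches(meta, terms)]
print(hits)
```

A real indexer would answer such a query from its index rather than by scanning every file; the scan here only illustrates what the fielded terms mean.
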
> Strigi has a few features that are not in Tracker or Beagle and misses
> a number of features that the other programs have. But the core
> functionality of Strigi, indexing data, is something that it shares.
> One important distinction has to be made straightaway: the difference
> between indexing metadata and storing metadata. Strigi only indexes
> metadata. If you think your disk is full, you can just throw away
> the index, because there is no data of value in there. All that's in
> there is an index that allows you to find your data quickly.
> Personally, I think _storing_ metadata in an indexer is not a good
> idea. (I do think that an index on a metadata store is a good idea,
> but that's a different matter). This is a large difference with
> Tracker which does act as a metadata store of 'first class objects'
> whatever that means. Beagle is also mainly an index. (Is any
> non-redundant data lost if I delete my Beagle index, Joe?)
>
> So if Tracker and Beagle also index data, what's so special about Strigi?
> (sorry for the obligatory boasting coming up)
> - It is the KISSest of all
> - It is the fastest of all (for indexing many small files, parsing alone
> runs at ~100 docs per second; writing to the index depends on the index
> backend)
> - It can index files in files in files in files in files
> - It has an indexer that can output XML and can thus be used by other
> indexers (Beagle and Tracker) so that indexing code can be shared.
> (Having a common metadata standard would be nice for this purpose, but
> see below.)
> - It is written in C++
> - It has multiple storage backends clearly separated behind an API so
> that Strigi can always take advantage of the fastest index around
> (currently clucene)
> - It can be used for searching even if there is no index, by using the
> command line programs 'deepfind' and 'deepgrep' [2]
>
> This is, however, not a sales talk. Strigi stands on its own. It's GUI
> independent. Currently, it links to clucene or hyperestraier, to
> libexpat and some other common libs like libz and libcrypto. It has a
> DBus interface and can be called from any language with DBus support.
> There's a plugin for GNOME Deskbar in the source code.
>
> So if this is not a sales talk, what is it? It's a call for
> standardization. This discussion between competing programs is a great
> time to start talking about common functionality. With regard to
> desktop search there are many things that can be standardized:
> - query language
> - metadata names and meaning
> - test suites
> - DBus APIs
> - index formats
>
> I won't discuss index formats because, even though Beagle and Strigi
> both use the Lucene index format, this is an implementation detail and
> defines performance and disk usage and should not be frozen into a
> standard.
>
> The query language as used by Beagle and Strigi is very similar (no
> coincidence) and is a good start for standardization. The largest
> drawback of the language used is the ambiguity of the field
> specifiers.
>
> Now that DBus v1 is almost upon us, the barriers between GNOME and KDE
> are diminishing. Functionality defined by a DBus API can be
> implemented in any language and as such, I think GNOME should choose a
> DBus API to use and share with KDE and
>
> Test suites. I'd love there to be a common test suite that says: if
> you index this data with these parameters, you should get these
> results from this query. Strigi will develop such tests naturally.
> Being able to share them across projects would mean that programs
> would compete on merit and without the usual prejudices and license
> and library incompatibilities.
> Strigi has a DBus interface for searching, so does Tracker. We should
> compare them and find a common interface. Of course the respective
> GNOME and KDE developers should decide which DBus API should be used
> by their applications. Freedesktop.org would be a good place to define
> these interfaces.
>
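
A minimal sketch of what one entry in such a shared test suite might look like: the data to index, the query, and the results every engine should return. None of this is an existing Strigi, Beagle or Tracker format. SearchCase, toy_search and run_case are hypothetical names, and toy_search is only a stand-in for a real engine's search call.

```python
from dataclasses import dataclass


@dataclass
class SearchCase:
    """One engine-neutral test: documents in, query, expected hits out."""
    documents: dict   # name -> text content
    query: str
    expected: set     # names that must be returned


def toy_search(documents, query):
    """Stand-in engine: names of documents containing every query word."""
    words = query.lower().split()
    return {name for name, text in documents.items()
            if all(w in text.lower().split() for w in words)}


def run_case(search, case):
    """Run one case against any callable with the toy_search signature."""
    return search(case.documents, case.query) == case.expected


case = SearchCase(
    documents={"a.txt": "desktop search tools", "b.txt": "metadata store"},
    query="desktop search",
    expected={"a.txt"},
)
print(run_case(toy_search, case))
```

Because run_case only needs a callable, the same cases could in principle be pointed at different engines through a thin adapter, for example one that talks to an engine's DBus search interface.
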
> Metadata naming and meaning. This is something which is rather hard.
> Dublin Core is part of it. It names some types of metadata. I've
> already mailed about this with Jamie in the past. In my opinion, the
> issue should be separated into smaller definitions that say what
> metadata fields can be extracted from certain filetypes. Indexer
> plugins could then advertise that they implement this functionality.
> The names for the metadata fields should also be used when searching
> and there, for convenience, they should be abbreviated as is current
> practice.
>
> So, rather a long mail that can be summarized in: please accept an API
> for searching and not a suite of programs (indexer + GUIs to it) and
> start thinking about standardizing _indexable_ metadata (other
> metadata is a whole different can of worms that I won't touch). This is
> still possible since neither KDE nor GNOME has agreed on a program
> for indexing and by adopting only an API, programs will be forced to
> collaborate to adhere to the API as well as possible, meaning the user
> wins.
>
> Cheers,
> Jos
>
> [1]
> http://mail.gnome.org/archives/desktop-devel-list/2006-October/msg00175.html
> [2] http://www.vandenoever.info/software/strigi/
> [3] http://www.kdedevelopers.org/node/2468