Proposing Tracker for inclusion into GNOME 2.18

Mon Oct 23 21:26:26 BST 2006

Jos van den Oever wrote:
> Hi all,

Hi Jos, great to have you in on this discussion.

> 
> Strigi has a few features that are not in Tracker or Beagle and misses
> a number of features that the other programs have. But the core
> functionality of Strigi, indexing data, is something that it shares.
> One important distinction has to be made straightaway: the difference
> between indexing metadata and storing metadata. Strigi only indexes
> metadata. If you think your disk is full, you can just throw away
> the index, because there is no data of value in there. All that's in
> there is an index that allows you to find your data quickly.
> Personally, I think _storing_ metadata in an indexer is not a good
> idea. (I do think that an index on a metadata store is a good idea,
> but that's a different matter). This is a large difference with
> Tracker which does act as a metadata store of 'first class objects'
> whatever that means. Beagle is also mainly an index. (Is any
> non-redundant data lost if I delete my Beagle index, Joe?)

First to clarify, tracker is not a dedicated indexer (like Beagle and 
Strigi) but is first and foremost a database which has indexing as a 
side feature.

Our metadata store (sqlite) is quite separate from our full text indexer 
(QDBM) which can be deleted if not required - the data there is just as 
expendable as in Strigi's and Beagle's case. No metadata is "stored" in 
the full text indexer although indexable metadata is of course indexed 
in it.

Tracker can also be run as a standalone metadata store/server without 
any indexing if desired (with the --disable-indexing command line option).
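
To illustrate the split, here is a minimal sketch in Python (not tracker's 
actual code). An sqlite table stands in for the metadata store and a plain 
dict stands in for the QDBM full text index. All function, table and 
metadata key names below are hypothetical. The only point is that the index 
can be thrown away and rebuilt from the store, and that the store works 
without any index at all.

```python
import sqlite3


def open_store():
    # Metadata store: the data of value lives here (sqlite, as in the mail).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (uri TEXT, key TEXT, value TEXT)")
    return db


def set_metadata(db, uri, key, value):
    # Storing metadata never touches the full text index.
    db.execute("INSERT INTO metadata VALUES (?, ?, ?)", (uri, key, value))


def build_index(db):
    # Full text index: derived data only, a word -> set of URIs mapping.
    index = {}
    for uri, _key, value in db.execute("SELECT uri, key, value FROM metadata"):
        for word in value.lower().split():
            index.setdefault(word, set()).add(uri)
    return index


def search(index, word):
    return sorted(index.get(word.lower(), ()))


db = open_store()
# Hypothetical URI and key names, for illustration only.
set_metadata(db, "file:///home/me/report.odt", "Doc.Title", "Quarterly budget report")
set_metadata(db, "file:///home/me/report.odt", "User.Keywords", "work finance")

index = build_index(db)
index = None               # deleting the index loses nothing of value
rebuilt = build_index(db)  # it can always be regenerated from the store
```

Skipping the build_index() call altogether corresponds to running with 
indexing disabled: the store still accepts and returns metadata.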

[snip]
> 
> So if this is not a sales talk, what is it? It's a call for
> standardization. This discussion between competing programs is a great
> time to start talking about common functionality. With regards to
> desktop search there are many things that can be standardized:
> - query language
> - metadata names and meaning
> - test suites
> - DBus APIs
> - index formats
> 
> I won't discuss index formats because, even though Beagle and Strigi
> both use the Lucene index format, this is an implementation detail and
> defines performance and disk usage and should not be frozen into a
> standard.
> 
> The query language as used by Beagle and Strigi is very similar (no
> coincidence) and is a good start for standardization. The largest
> drawback of the language used is the ambiguity of the field
> specifiers.
> 
> Now that DBus v1 is almost upon us, the barriers between GNOME and KDE
> are diminishing. Functionality defined by a DBus API can be
> implemented in any language and as such, I think GNOME should choose a
> DBus API to use and share with KDE and

Yes, this is my desire also.

> 
> Test suites. I'd love there to be a common test suite that says: if
> you index this data with these parameters, you should get these
> results from this query. Strigi will develop such tests naturally.
> Being able to share them across projects would mean that programs
> would compete on merit and without the usual prejudices and license
> and library incompatibilities.
> Strigi has a DBus interface for searching, so does Tracker. We should
> compare them and find a common interface. Of course the respective
> GNOME and KDE developers should decide which DBus API should be used
> by their applications. Freedesktop.org would be a good place to define
> these interfaces.

We should have an org.freedesktop.indexer interface that we can all 
share. Implementation-specific stuff can then reside in each project's 
own unique interface.
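
As a rough sketch of that layering (in Python rather than as a real DBus 
definition): the class and method names below are hypothetical and not an 
agreed org.freedesktop.indexer API. The shared part is an abstract 
interface, an implementation adds its own extras on top, and a common 
check of the "index this data, expect these results" kind can be run 
against anything that implements the shared part.

```python
from abc import ABC, abstractmethod


class SharedIndexer(ABC):
    """Hypothetical stand-in for the shared indexer interface."""

    @abstractmethod
    def index_text(self, uri, text):
        """Add the words of `text` to the index under `uri`."""

    @abstractmethod
    def search(self, query):
        """Return a sorted list of URIs matching a single-word query."""


class WordIndexer(SharedIndexer):
    """Toy implementation; a real indexer would sit behind DBus."""

    def __init__(self):
        self._words = {}

    def index_text(self, uri, text):
        for word in text.lower().split():
            self._words.setdefault(word, set()).add(uri)

    def search(self, query):
        return sorted(self._words.get(query.lower(), ()))

    def word_count(self):
        # Implementation-specific extra, outside the shared interface.
        return len(self._words)


def conformance_check(indexer):
    # Shared test: same input data, same queries, same expected results.
    indexer.index_text("file:///a.txt", "hello desktop search")
    indexer.index_text("file:///b.txt", "hello world")
    return (
        indexer.search("hello") == ["file:///a.txt", "file:///b.txt"]
        and indexer.search("desktop") == ["file:///a.txt"]
        and indexer.search("absent") == []
    )
```

Applications would only ever call the shared methods, so swapping one 
indexer for another would not need any #ifdef'ing on their side.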

> 
> Metadata naming and meaning. This is something which is rather hard.
> Dublin Core is part of it. It names some types of metadata. I've
> already mailed about this with Jamie in the past. In my opinion, the
> issue should be separated into smaller definitions that say, what
> metadata fields can be extracted from certain filetypes. Indexer
> plugins could then advertise that they implement this functionality.
> The names for the metadata names should also be used when searching
> and there, for convenience, they should be abbreviated as is current
> practice.
> 
> So, rather a long mail that can be summarized in: please accept an API
> for searching and not a suite of programs (indexer + guis to it) and
> start thinking about standardizing _indexable_ metadata (other
> metadata is a whole different can of worms that I won't touch). This is
> still possible since neither KDE nor GNOME have agreed on a program
> for indexing and by adopting only an API, programs will be forced to
> collaborate to adhere to the API as well as possible, meaning the user
> wins.

I agree from the indexing point of view, but Gnome requires a reference 
implementation to be available - in cases where there have been multiple 
candidates, Gnome has always blessed one (e.g. Epiphany vs Galeon), but 
that does not mean distros use the blessed one (e.g. Firefox is more 
likely to be used as the dominant web browser even though I think 
Epiphany is better in a Gnome setting).

The other (somewhat unique) features of tracker - desktop wide tagging, 
extensible metadata etc - are still vital ingredients for Gnome, and 
that's one of the other reasons for proposing tracker: we need it to be 
more integrated if Gnome is to become more integrated in this regard.

We also have big problems with lots of #ifdef'ing in code, so 
standardising would be a big win. I'm sure when I try and implement 
Epiphany's next generation bookmark/history stuff into tracker's first 
class object database they would prefer it not to be #ifdef'ed?

So with tracker being able to be used as a standalone metadata store 
without any indexing, there shouldn't be a need to confine what goes into 
Gnome to just pure indexing. We could leave the door open to: just 
tracker, or tracker+Beagle, or tracker+strigi, with the latter cases 
taking ownership of the shared indexing dbus interface and tracker 
confined to metadata storage only.

Some people might not like that but I think it's a practical compromise. 
With tracker being the only one written in pure C, it is therefore the 
only one that can *ultimately* get into the Gnome platform and be fully 
integrated (at the moment I am just proposing it for desktop, which is 
just a simple blessing, nothing more).

I hope having a shared interface for the pure indexing case will solve 
the concerns other indexers have and allow us to integrate tracker. 
Otherwise we risk restricting innovation and integration with a pure 
indexing solution, which would mean we miss out on the more exciting 
features of tracker and their usefulness to Epiphany and other apps.

-- 
Mr Jamie McCracken
http://jamiemcc.livejournal.com/