Scrap baloo?

Christoph Cullmann cullmann at absint.com
Wed Sep 28 21:38:08 UTC 2016


Hi,

First of all: I appreciate all your work and don't want to attack you personally in any way.
If my last mail felt that way, I am sorry!

> Hi,
> 
> On 28 September 2016 at 20:33, Christoph Cullmann <cullmann at absint.com> wrote:
>> Hi,
>>
>>> Hi,
>>>
>>> On 28 September 2016 at 02:36, Christoph Cullmann <cullmann at absint.com> wrote:
>>>> any update?
>>>
>>> Yep. In all the happenings of the week I just forgot to write this email.
>>>
>>> If Baloo is going to be an integral part of the Plasma experience, do
>>> we really want to depend on an external project where we don't have
>>> control (and indeed, sentiments may prevent unrestricted contributions
>>> based only on merit)? This is the political reason why I don't want to
>>> depend on Tracker. The technical reason is that it's based on
>>> SQLite, which is incredibly slow compared to what we do now.
>> I don't really see that it is slow compared to what we do; if you have
>> benchmarks for that, I would be pleased to see them.
> 
> So would I. You already have Tracker based code, could you spare some
> time and run some?
Not really, I would not even know what to benchmark.
But if you have no benchmarks available, I would like to know where the idea comes
from that we are that much faster. LMDB is fast as a key-value store, but we do
not just look up a key, we do a lot more on top. Using a faster DB therefore does
not guarantee faster overall performance than other solutions.
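
To make "what to benchmark" a bit more concrete, here is a minimal sketch, assuming a toy
inverted index and using Python's built-in sqlite3 module purely as a stand-in store. It is
not Baloo, Tracker or LMDB code, and all names (build_index, lookup_term, search_all,
time_call) are made up for illustration. The point is only that one should time the whole
query (several lookups plus the intersection on top), not just a single key lookup.

```python
import sqlite3
import time


def build_index(docs):
    """Build a tiny in-memory inverted index: one (term, doc_id) row per posting."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE postings (term TEXT, doc_id INTEGER, PRIMARY KEY (term, doc_id))"
    )
    # Use a set so a term repeated inside one document yields a single posting.
    rows = {
        (term, doc_id)
        for doc_id, text in docs.items()
        for term in text.lower().split()
    }
    con.executemany("INSERT INTO postings VALUES (?, ?)", sorted(rows))
    con.commit()
    return con


def lookup_term(con, term):
    """Raw store access: fetch the posting list for a single key."""
    cur = con.execute(
        "SELECT doc_id FROM postings WHERE term = ? ORDER BY doc_id", (term,)
    )
    return [row[0] for row in cur]


def search_all(con, terms):
    """End-to-end query: ids of documents that contain every given term."""
    result = None
    for term in terms:
        ids = set(lookup_term(con, term))
        result = ids if result is None else result & ids
    return sorted(result or [])


def time_call(fn, *args, repeats=1000):
    """Average wall-clock seconds per call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    docs = {1: "kde plasma search", 2: "plasma desktop", 3: "file search index"}
    con = build_index(docs)
    print("single key lookup:", time_call(lookup_term, con, "plasma"))
    print("two-term query:   ", time_call(search_all, con, ["plasma", "search"]))
```

Running the same two measurements against each candidate backend, on a realistic index
size, would show whether a faster raw store actually translates into faster searches.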

> 
>>> At the same time, LMDB needs to be replaced, and fast. I'm building a
>>> new KVDB as a university project (it should be able to do 256GB
>>> indexes on 32bit machines), and if that doesn't work out there's
>>> Sophia (http://sophia.systems/). I'll be evaluating both as a
>>> replacement for LMDB.
>> Do we really want to maintain our own DB system?
>> IMHO that will never work out; all DB systems around need more maintenance power
>> than we have.
> 
> This is something I'm not sure about. The DB will be built anyway, my
> graduation depends on it :D And if I'm going to do something I will do
> it well, so it'll be simple and clean.
I don't doubt that you are capable of writing clean and working code.

The only problem is: there is a big difference between an academic implementation
and a production-ready thing. Any existing key-value database that is usable
for general consumption is a multi-man-year effort. Even if you start today,
that is a solution we can use in some years, if at all.

Actually, most of the work is handling all the different environments and corner cases,
which more or less can only be done by collecting feedback over several years,
and I doubt we want to incubate a new DB in Baloo, using our users' production
machines as the playground.

> 
> If it doesn't work out, there's always Sophia to fall back on.
Sophia, again, is designed to be used in server environments. Just from their start page:

"For server environment, which requires lowest latency access (both read and write), predictable behaviour, optimized storage schema and transaction guarantees."

This means that, like LMDB, it is most likely not really usable on NFS (or other network)
home mounts (at least Google turns up nothing saying that it is), and those are very
common in large-scale installations.

(Sophia doesn't come off too well in the opinion of the LMDB author, either: https://www.mail-archive.com/cyrus-devel@lists.andrew.cmu.edu/msg03653.html)

> 
>>> Vishesh also wanted to separate out the engine and make it public API
>>> (apparently other projects want to make use of it as a general data
>>> storage library - and the engine offers fulltext search capabilities
>>> and other fancy logical operators that make it particularly
>>> attractive). My plan is to move towards that, and eventually also not
>>> only index files but also other kinds of objects - contacts, or
>>> people, for example.
>>>
>>> I don't want to move back into the "semantic desktop" idea at all, but
>>> I do want some sort of infrastructure that allows for an "action on
>>> object" metaphor - file objects can be opened with an application,
>>> people objects can be sent mails, and so on.
>>>
>>> Hope this makes sense.
>> I still don't see how that should work out. ATM, IMHO, the facts are:
>>
>> 1) baloo is not maintained
> 
> It will, now.
> 
>> 2) lmdb will e.g. never work for us on NFS homes and the code needs major
>> overhaul to handle errors (which you confirm)
> 
> LMDB goes away, either way.
> 
>> 3) you said you have "some time" left to maintain it, but you now propose, in
>> addition to maintaining Baloo, to write a DB system from scratch; I don't
>> really see that working
> 
> I have a personal interest, an academic interest, and now a
> KDE-related interest in the KVDB. It *will* work, because I'm the kind
> of guy who puts a lot of time and effort (maybe even
> disproportionately so) into things that genuinely interest me. My
> challenge will be to make the codebase so that after I'm done with
> this (say in about 5 years or so) it'll be comprehensible to the next
> maintainer.
As stated above, I don't doubt that you are capable, earnest and hard-working.
But I don't think we should prototype and develop a database; the work on top
of that (what Baloo does atm) will alone take months to get right.

> 
>> 4) tracker, on the other hand, is maintained and in use, and we can share the
>> index data with GNOME and others
>>
>> I really doubt that doing the work to remove lmdb, replace it with an "own one"
>> and then starting to fix all the other issues (like the indexer running amok,
>> broken file extractors, ...) will work out if we don't clone some more people.
>>
>> But that is only my opinion.
> 
> *Sigh*
> 
> I don't want to take the easy way out here. Half the fun in KDE is
> doing crazy things and seeing your baby work. That's the entire
> motivation for being here.
> 
> And right now I'm volunteering to do this.
I appreciate that. I would only like to avoid once more having an indexer/search
that starts from scratch and is then left unmaintained.

We had the Strigi-based stuff, we had Nepomuk and now we have Baloo, all more or less
written from scratch, and all ended up unmaintained and underdocumented (Strigi actually
had more docs, as far as I remember ;=).

Actually, I don't insist on a Tracker-based solution, but I would like to have
one that doesn't end up as "KDE reinvents the wheel" once more when there may be
alternatives available.

Besides the speed argument, I found nothing in this mail that is an argument
against using Tracker.

(David raised some good points in his mail, e.g. whether SQLite has NFS problems too (yes,
if locking is broken), or how to migrate the xattr data, which is not solved yet.)

Greetings
Christoph

-- 
----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullmann at AbsInt.com
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:   http://www.AbsInt.com
--------------------------------------------------------------------
Geschäftsführung: Dr.-Ing. Christian Ferdinand
Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234