Scrap baloo?

Kevin Funk kfunk at kde.org
Fri Sep 30 06:36:24 UTC 2016


On Wednesday, 28 September 2016 23:38:08 CEST Christoph Cullmann wrote:
> Hi,
> 
> first of all: I appreciate all your work and don't want to attack you
> personally in any way, if my last mail felt that way, I am sorry!
> 
> > Hi,
> > 
> > On 28 September 2016 at 20:33, Christoph Cullmann <cullmann at absint.com> wrote:
> >> Hi,
> >> 
> >>> Hi,
> >>> 
> >>> On 28 September 2016 at 02:36, Christoph Cullmann <cullmann at absint.com> wrote:
> >>>> any update?
> >>> 
> >>> Yep. In all the happenings of the week I just forgot to write this
> >>> email.
> >>> 
> >>> If Baloo is going to be an integral part of the Plasma experience, do
> >>> we really want to depend on an external project where we don't have
> >>> control (and indeed, sentiments may prevent unrestricted contributions
> >>> based only on merit). This is the political reason why I don't want to
> >>> depend on Tracker. The technical reason is that it's based on
> >>> SQLite, which is incredibly slow compared to what we do now.
> >> 
> >> I don't really see that it is slow compared to what we do; if you have
> >> benchmarks for that, I would be pleased to see them.
> > 
> > So would I. You already have Tracker based code, could you spare some
> > time and run some?
> 
> Not really, I would not even know what to benchmark.
> But if you have no benchmarks available, I am interested to know where the
> idea comes from that we are that much faster. LMDB is fast as a key-value
> store, but we do not just look up a key; we do a lot more on top. So just
> because we use a faster DB doesn't mean we end up with faster overall
> performance than other solutions.
> >>> At the same time, LMDB needs to be replaced, and fast. I'm building a
> >>> new KVDB as a university project (it should be able to do 256GB
> >>> indexes on 32bit machines), and if that doesn't work out there's
> >>> Sophia (http://sophia.systems/). I'll be evaluating both as a
> >>> replacement to LMDB.
> >> 
> >> Do we really want to maintain our own DB system?
> >> IMHO that will never work out, all DB systems around need more
> >> maintenance power than we have.
> > 
> > This is something I'm not sure about. The DB will be built anyway, my
> > graduation depends on it :D And if I'm going to do something I will do
> > it well, so it'll be simple and clean.
> 
> I don't doubt that you are capable to write clean and working code.
> 
> The only problem is: there is a big difference between an academic
> implementation and a production-ready thing. Any existing key-value database
> that is usable for general consumption is a multi-man-year effort. Even if
> you start today, that is a solution we can use in some years, if at all.
> 
> Actually the most work is to handle all different environments and corner
> cases, which is something that more or less can only be done by getting
> feedback over several years, and I doubt we want to incubate a new DB in
> baloo as playground on our user production machines.
> 
> > If it doesn't work out, there's always Sophia to fall back on.
> 
> Sophia is again designed to be used in server environments, just from their
> start page:
> 
> "For server environment, which requires lowest latency access (both read and
> write), predictable behaviour, optimized storage schema and transaction
> guarantees."
> 
> This means that, like lmdb, it is most likely not really usable (at least
> Google doesn't turn up anything saying it is) for NFS (or other network)
> home mounts, which are very common on large-scale installations.
> 
> (sophia doesn't come off too well in the opinion of the lmdb author, either:
> https://www.mail-archive.com/cyrus-devel@lists.andrew.cmu.edu/msg03653.html
> )
> >>> Vishesh also wanted to separate out the engine and make it public API
> >>> (apparently other projects want to make use of it as a general data
> >>> storage library - and the engine offers fulltext search capabilities
> >>> and other fancy logical operators that make it particularly
> >>> attractive). My plan is to move towards that, and eventually also not
> >>> only index files but also other kinds of objects - contacts, or
> >>> people, for example.
> >>> 
> >>> I don't want to move back into the "semantic desktop" idea at all, but
> >>> I do want some sort of infrastructure that allows for an "action on
> >>> object" metaphor - file objects can be opened with an application,
> >>> people objects can be sent mails, and so on.
> >>> 
> >>> Hope this makes sense.
> >> 
> >> I still don't see how that should work out; atm, IMHO, the facts are:
> >> 
> >> 1) baloo is not maintained
> > 
> > It will, now.
> > 
> >> 2) lmdb will e.g. never work for us on NFS homes and the code needs a
> >> major overhaul to handle errors (which you confirm)
> > 
> > LMDB goes away, either way.
> > 
> >> 3) you said you have "some time" left to maintain it, but you now
> >> propose, in addition to maintaining Baloo, to write a DB system from
> >> scratch; I don't really see that working
> > 
> > I have a personal interest, an academic interest, and now a
> > KDE-related interest in the KVDB. It *will* work, because I'm the kind
> > of guy who puts a lot of time and effort (maybe even
> > disproportionately so) into things that genuinely interest me. My
> > challenge will be to make the codebase so that after I'm done with
> > this (say in about 5 years or so) it'll be comprehensible to the next
> > maintainer.
> 
> As stated above, I don't doubt that you are capable and earnest and hard
> working. But I don't see that we should prototype & develop a database;
> the work on top of that alone (what baloo does atm) will take months to get
> right.
> >> 4) tracker on the other side is maintained and in use and we can share
> >> the index data with GNOME and others
> >> 
> >> I really doubt that doing the work to remove lmdb, replace it with an
> >> "own one" and then starting to fix all other issues (like indexer
> >> running amok, broken file extractors, ...) will work out if we don't
> >> clone some more people.
> >> 
> >> But that is only my opinion.
> > 
> > *Sigh*
> > 
> > I don't want to take the easy way out here. Half the fun in KDE is
> > doing crazy things and seeing your baby work. That's the entire
> > motivation for being here.
> > 
> > And right now I'm volunteering to do this.

Just chiming in here since I got a little worried when reading there are some 
foggy plans to 'roll our own' KVDB...

> I appreciate that, I only would like to avoid having once more an
> indexer/search that starts from scratch and is left unmaintained.

> We had strigi based stuff, we had nepomuk and now we have baloo, all more or
> less from scratch and all ended up unmaintained and underdocumented (strigi
> actually had more docs I remember ;=).

Yes.

Please, pretty please, don't reinvent the wheel here again, and please don't 
consider an academic research project as a production-ready replacement for a 
database backend. This is (sad) history repeating indeed.

There are alternatives for the DB at least, which work, are maintained (by 
more than one person) and where using them won't put another burden on us KDE 
developers who are lacking manpower in all different areas already.

> Actually, I don't insist on a tracker based solution, but I would like to
> have some that doesn't end up in "KDE reinvents the wheel" once more if
> there are perhaps alternatives available.

Like Christoph, I don't care about a specific solution, but going the NIH route 
sounds, by far, like the worst option. I'm not questioning Boudhayan's 
ability to work out a great "draft" implementation of a KVDB for academic 
research... But the *major* selling point of a database implementation is a 
track record of being rock-stable in different environments over an extended 
period of time. There's no way you can guarantee this for a one-man academic 
research project.

Please reconsider the options.

Just 2c of a worried user, fearing even more Plasma workspace instabilities... 
Kevin

> Besides the speed argument, I found nothing in this mail that is an
> argument against using tracker.
> 
> (David had some good points in his mail, e.g. if sqlite doesn't have nfs
> probs, yes, if locking is broken, or how to migrate the xattr data, which
> is not solved yet)
> 
> Greetings
> Christoph


-- 
Kevin Funk | kfunk at kde.org | http://kfunk.org


More information about the Kde-frameworks-devel mailing list