Scrap Baloo Thread Feedback

Christoph Cullmann cullmann at absint.com
Fri Oct 7 16:34:08 UTC 2016


Hi,

> On Fri, Oct 7, 2016 at 5:58 PM, Christoph Cullmann <cullmann at absint.com> wrote:
>> FYI, since my mail is stuck in the moderation queue on kde-devel:
>>
>> ----- Forwarded Message -----
>> From: "cullmann" <cullmann at absint.com>
>> To: "kde-frameworks-devel" <kde-frameworks-devel at kde.org>
>> CC: "kde-devel" <kde-devel at kde.org>
>> Sent: Friday, 7 October 2016 17:56:35
>> Subject: Re: Scrap Baloo Thread Feedback
>>
>> Hi,
>>
>>> Hey guys
>>>
>>> I was told there is a thread about scrapping Baloo. All Baloo
>>> discussion used to happen on kde-devel and that's where the review
>>> requests go. It's the only reason I am still subscribed to kde-devel.
>> That is nice, but given that Baloo is a framework, that was unexpected, sorry.
>>
>>>
>>> I must say, the thread is overall quite disappointing. There seems to
>>> be no scientific or rational, cost-based analysis of this. How about
>>> drawing up a list of requirements and priorities, and then evaluating
>>> possible solutions against it?
>>
>> Actually, the bugs.kde.org page tells you the facts: the number of bugs
>> has been constantly increasing for more than a year. The thread lists some
>> other facts about what is wrong ATM and should be fixed.
>>
> 
> It lists some of the facts. Not all.
> 
> Of course the bug number is increasing. I am no longer pruning it. Is
> the number of unique bugs increasing? What kind of users are being
> affected by these bugs? Is it just people with specific kinds of
> files? There is a lot of information that the bug tracker does not
> cover. Let's also take the uncertainty into account, and then try to
> mitigate it.
In the end, most bugs boil down to:

1) No handling of DB errors besides asserting
2) No handling of errors in the extractors (see e.g. the fixes I did; all extractors will need more of that)
3) No handling of NFS/large inodes/inconsistencies => crash

In my opinion, you would more or less have to rewrite close to all parts dealing with the DB,
or anything else internal. If anything ever becomes inconsistent, ATM you are doomed forever,
unless by luck my new startup code deletes the index; then you live again until it is reindexed.
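For illustration, the startup behaviour described above (detect a damaged index, delete it and rebuild, instead of asserting forever) could look roughly like the sketch below. It is not Baloo's actual startup code: the 8-byte header `kIndexMagic`, the enum `IndexState` and the function `checkIndexAtStartup` are hypothetical, and a real implementation would validate the index through its database's own API rather than a header check.

```cpp
#include <array>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical on-disk marker at the start of the index file.
constexpr std::array<char, 8> kIndexMagic = {'I', 'D', 'X', 'M', 'A', 'G', 'I', 'C'};

enum class IndexState { Ok, Missing, Reset };

// Runs once at startup: instead of asserting on a damaged index,
// detect the problem, delete the index and let the caller reindex.
IndexState checkIndexAtStartup(const fs::path& indexFile)
{
    std::error_code ec;
    if (!fs::exists(indexFile, ec)) {
        return IndexState::Missing;   // first run (or not accessible): build a fresh index
    }

    std::array<char, 8> header{};
    std::ifstream in(indexFile, std::ios::binary);
    in.read(header.data(), header.size());

    if (in.gcount() == static_cast<std::streamsize>(header.size()) && header == kIndexMagic) {
        return IndexState::Ok;
    }

    // Unreadable, truncated or foreign data: treat the index as corrupt.
    in.close();
    fs::remove(indexFile, ec);        // error ignored in this sketch; real code would log it
    return IndexState::Reset;
}
```

The caller would start a full reindex on both `Missing` and `Reset`; the point is that a broken index costs one rebuild instead of a permanent crash loop.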

> 
>> And replacing Baloo with something else, based for example on Tracker, was just
>> one proposal.
>>
>> Another was to fix Baloo and port it to another database.
> 
> Right, "another database".
> 
> Typically one would expect the problems and features of our current
> database to be evaluated against the others. This was an exercise that
> I did, and I chose LMDB.
> 
> What are the requirements for the database?
I am not sure. I am all for removing our own indexing completely and using another indexer
like Tracker, precisely to avoid the excursion into the DB world, and the question of how to
handle it safely with close to zero manpower.

And I oppose the idea of writing our own DB.

> 
>>
>>>
>>> Right now, random requirements such as NFS and 32-bit systems are
>>> coming up. Are these really that important? I specifically designed
>>> Baloo to not care about either network mounts or 32-bit systems. Yes,
>>> Baloo has bugs and it won't handle more than 32-bit inodes. These
>>> things, as all others, can be fixed. It's really a question of what is
>>> important. Let's not target the outliers. Many of these decisions were
>>> deliberately taken.
>> These are not random requirements, sorry; you could just as well call them
>> random restrictions. That is not very productive, is it?
>>
>> 1) 32-bit systems are still around, and if it is a design decision NOT to
>> support them, that is OK, but then it is bad for Plasma: no official support
>> for 32-bit systems. Baloo is, IMHO, the only framework with such requirements.
>> And I don't see that we have told any distro not to compile it for 32-bit.
>>
> 
> * 32-bit systems can be supported. But it will be much inferior as the
> database size will need to be limited. I never got around to doing
> this.
> * Plasma still supports 32-bit systems, but file indexing may be
> limited. It's the same way that compositing may be disabled if you
> have old hardware.
> * All that being a framework means is that there are ABI / API
> guarantees and a release schedule. Not all frameworks target all
> systems.
> 
>> 2) No NFS: OK, fair game, but then it should check for that and disable itself
>> completely if $HOME, where the DB is stored, is on NFS. I can live with that,
>> too, but not with the current "we randomly crash" behavior. => That is a user
>> experience we don't want, right?
>>
> 
> Correct. That can be implemented. This is just a matter of priorities.
> 
>> 3) Inodes beyond 32 bits: that is normal and should work, but even if it should
>> not: ATM you get inconsistencies and then later assertion failures or crashes.
>>
>> => I can live with all the restrictions, but the current handling of them,
>> which always ends in a crash, is IMHO not really acceptable. But that is "my"
>> opinion; it might vary in the eyes of others.
>>
> 
> I agree. It should not crash. Baloo doesn't handle errors as well as it should.

=> It is good that we agree, but I find it very astonishing that we ship Baloo, in its
current state, as more or less mandatory on all those systems where it will fail by design.

(and it does fail, if you read the bugs)
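The NFS check suggested above (disable indexing if the directory holding the DB is on NFS, instead of crashing later) could be sketched as follows. This is a Linux-only sketch assuming `statfs(2)` and `NFS_SUPER_MAGIC` from `<linux/magic.h>`; the names `isOnNfs` and `shouldDisableIndexing` are hypothetical, and other network file systems (CIFS, FUSE-based mounts, ...) are not detected by it.

```cpp
#include <optional>
#include <string>
#include <sys/vfs.h>      // statfs() and struct statfs (Linux-specific)
#include <linux/magic.h>  // NFS_SUPER_MAGIC

// Returns true if `path` lives on an NFS mount, false if it does not,
// and std::nullopt if the file system could not be queried at all.
std::optional<bool> isOnNfs(const std::string& path)
{
    struct statfs info {};
    if (statfs(path.c_str(), &info) != 0) {
        return std::nullopt;  // e.g. the path does not exist
    }
    return info.f_type == NFS_SUPER_MAGIC;
}

// Conservative policy: if we cannot tell, do not index.
bool shouldDisableIndexing(const std::string& dbDir)
{
    return isOnNfs(dbDir).value_or(true);
}
```

Treating an unknown result as "disable" is a deliberately conservative choice; whether that is the right default is a judgment call.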

> 
>>>
>>> How about taking requirements such as resource consumption, ease of
>>> integration and search speed into consideration? Come on guys.
>>> We're engineers over here.
>> What is the argument here? If you take a look at bugs.kde.org, you see that
>> people are complaining about all of that with Baloo. I see no evidence anywhere
>> that e.g. Baloo is "superior" to what GNOME uses or to any other solution
>> (perhaps besides Nepomuk, OK...).
>>
>> In a few days I fixed more bugs than were fixed in a year, and triaged more than
>> ever; still, a lot remains to be done. (And I really did not do a lot, just removed
>> things like 'self-destruct if the index is larger than 5GB' or 'crash forever on
>> DB corruption'.)
>>
>> A graph says more than a thousand words:
>>
>> https://bugs.kde.org/reports.cgi?product=frameworks-baloo&output=show_chart&datasets=CONFIRMED&datasets=ASSIGNED&datasets=REOPENED&datasets=UNCONFIRMED&datasets=RESOLVED&banner=1
>>
> 
> Yup, you have. It's awesome. I no longer have the motivation to work on Baloo.
Thanks, but that makes me very sad, btw.
Baloo came about to replace Nepomuk, which was dead because it had too many bugs and all its maintainers had left.
Now we have Baloo, which has many bugs, some even by design, and its maintainer has left, too.

> 
>> Given the current open bugs, one will need to:
>>
>> 1) review all extractors; they still have close to zero error handling and will
>> just crash or OOM you on bad files
> 
> (This is why they run in a separate process)
That doesn't help: it just OOMs your system => dead. It needs resource restrictions,
which are tricky to get right.
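One common POSIX way to get such restrictions is to run each extraction in a forked child and cap its address space with `setrlimit(RLIMIT_AS, ...)` before it touches the file, so a hostile file makes the child fail instead of OOMing the system. The sketch assumes a POSIX system; `runWithMemoryLimit` is a hypothetical helper, not part of Baloo, and its exit codes are arbitrary. Picking the actual limit is the tricky part and is left to the caller.

```cpp
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>
#include <functional>
#include <new>

// Runs `extract` in a forked child whose address space is capped at
// `maxBytes`. Returns true only if the child exited normally with status 0.
bool runWithMemoryLimit(const std::function<int()>& extract, rlim_t maxBytes)
{
    pid_t pid = fork();
    if (pid < 0) {
        return false;                 // fork failed
    }
    if (pid == 0) {
        // Child: apply the cap before touching the (possibly hostile) file.
        rlimit limit{maxBytes, maxBytes};
        if (setrlimit(RLIMIT_AS, &limit) != 0) {
            _exit(2);
        }
        int rc = 1;
        try {
            rc = extract();
        } catch (const std::bad_alloc&) {
            rc = 3;                   // an allocation hit the cap
        } catch (...) {
            rc = 4;                   // any other extractor failure
        }
        _exit(rc);
    }
    // Parent: wait for the child and check how it ended.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A CPU-time cap (`RLIMIT_CPU`) could be added the same way to catch extractors that loop forever instead of allocating.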

> 
>> 2) review and fix the complete database handling to handle errors, and perhaps
>> swap the DB
>> 3) fix the indexer to have some resource limits to avoid OOM and co. if e.g.
>> extractors fail
>> ...
>>
>> Therefore, given that we lack manpower, I proposed implementing the Baloo API
>> on top of e.g. Tracker, to avoid all this and let Tracker handle it.
>>
>> To check whether that is feasible at all, I did a quick and dirty implementation
>> (still modulo filling in the metadata in the results and tagging, which is a
>> problem, but it was only meant to see whether e.g. search works)
>>
>> https://quickgit.kde.org/?p=clones%2Fbaloo%2Fcullmann%2Ftbaloo.git
>>
>> That is just a proposal, and with it I started the discussion.
>>
> 
> My hostility was because the proposal ignores key points such as:
> 
> * Indexing Speed
> * Search speed
> * Database size
=> If you look at the bugs, people complain that we are inferior. I don't see that the
proposal ignores this; I just don't see how to compare, given that there are no hard facts
showing we are faster than e.g. Tracker in any way.

> * Ease of use with our existing components
My proposal did not change the interface at all; it has zero impact on "ease of use".

> * Ease of fixing problems in the code
My estimate would be: rewrite close to everything. Even the basic 64-bit integer id won't work
with 64-bit inodes, each DB call must be touched to check for errors, and at each place
one will need to check for potential inconsistencies and exit gracefully...
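To illustrate the id problem: assume (hypothetically; this layout is not taken from Baloo's sources) a 64-bit id that stores a 32-bit device id in the high half and the inode in the low half. Two files whose inodes differ only above bit 31 then get the same id, which is one way an index silently becomes inconsistent. `packIdTruncating` and `packIdChecked` are illustrative names only.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical id scheme: device id in the high 32 bits, inode truncated
// to 32 bits in the low half. High inode bits are silently lost.
std::uint64_t packIdTruncating(std::uint32_t devId, std::uint64_t inode)
{
    return (std::uint64_t(devId) << 32) | std::uint32_t(inode);
}

// Checked variant: refuse inodes that do not fit instead of aliasing them,
// so the caller can skip the file or report an error rather than corrupt the index.
std::optional<std::uint64_t> packIdChecked(std::uint32_t devId, std::uint64_t inode)
{
    if (inode > std::numeric_limits<std::uint32_t>::max()) {
        return std::nullopt;
    }
    return (std::uint64_t(devId) << 32) | inode;
}
```

The checked variant only turns silent aliasing into an error the caller can handle; actually supporting 64-bit inodes would need a wider key or a separate mapping table.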

> 
> Baloo has certain speed requirements if it is to be used with krunner,
> and we want instant feedback. This was an integral requirement.
I doubt that e.g. Tracker has different requirements, as it is used in similar places by GNOME.

But leaving all that aside, do you have a proposal for how to fix the current situation?
Are you willing to invest some work to fix the current issues, or do you have an idea of what
would be a good way to tackle them?

> 
>> So far, we have one other proposal, by Boudhayan, to fix up Baloo.
>>
>>>
>>> (If the discussion continues on kde-frameworks-devel, I probably won't see it)
>>
>> I won't see it on kde-devel; please, frameworks-related stuff should really
>> be discussed on the frameworks list.
>>
> 
> I don't agree with the premise that all frameworks should be on one
> mailing list. (Some parts of plasma and activities are also
> frameworks)
> 
> Anyway, I'll set up a reminder to check this thread on the archives
> every couple of days.

Greetings
Christoph

-- 
----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullmann at AbsInt.com
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:   http://www.AbsInt.com
--------------------------------------------------------------------
Managing Director: Dr.-Ing. Christian Ferdinand
Registered in the commercial register of the Saarbrücken Local Court (Amtsgericht), HRB 11234


More information about the Kde-frameworks-devel mailing list