Scrap Baloo Thread Feedback

Christoph Cullmann cullmann at absint.com
Fri Oct 7 18:08:17 UTC 2016


Hi,

> Hey
> 
> On Fri, Oct 7, 2016 at 6:34 PM, Christoph Cullmann <cullmann at absint.com> wrote:
>> Hi,
>>
>>> On Fri, Oct 7, 2016 at 5:58 PM, Christoph Cullmann <cullmann at absint.com> wrote:
>>>>
>>
>> 1) No handling of DB errors beside asserting
>> 2) No handling of errors in the extractors (e.g. see the fixes I did, all
>> extractors will need more of that)
>> 3) No handling of NFS/large inodes/inconsistencies => crash
>>
>> In the end, in my opinion, you can rewrite close to all parts dealing with the
>> DB or
>> any other thing internally. If ever any thing gots inconsistent, ATM you are
>> doomed, forever,
>> if not by luck my new startup code deletes the index, then you live again until
>> it is reindexed.
>>
>>>
>> I am not sure, I am all for removing complete indexing and use a other indexer
>> like tracker to exactly avoid the excurse into DB world and how to handle it
>> in a safe way with close to zero person manpower.
>>
> 
> It's avoiding the problem and hoping for the best, without any experiments.
That is not true.

I did experiments, and search works with tracker; but yes, tagging is a problem,
as it doesn't work ATM. Nor do I say that this is a ready solution now, just a possibility
to avoid having to maintain low-level code with at most one person (which is how it looks ATM).

And I don't propose to go down that road now, but ATM I see nobody doing any other experiments.

Besides, tracker has been continuously maintained and used for well over 5 years:

https://github.com/GNOME/tracker/graphs/contributors

> 
>>
>> => That is good that we agree, but I find it very astonishing that we use baloo
>> in its
>> current state more or less mandatory on all that systems were it by design will
>> fail.
>>
>> (and it fails if you read the bugs)
>>
> 
> There is a certain amount of failure, but it's not "by-design". But
> maybe I'm not seeing things clearly.
You yourself stated that neither the 32-bit issues nor NFS nor inodes wider than 32 bits
have any error handling. That seems to have been known even at design time, and still
we ship this as a framework used by default by any Plasma installation, including on
systems that have exactly those properties, without any error checking.
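
To illustrate what handling the "inode does not fit" case gracefully could look like,
here is a minimal sketch. It assumes that the document id packs a 32-bit device id and
a 32-bit inode into one 64-bit value; that layout and the name makeDocId are assumptions
made for this example, not taken from the baloo sources.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not baloo API.
// Assumed layout: upper 32 bits = device id, lower 32 bits = inode.
// Returns false instead of silently truncating when a value does not fit.
bool makeDocId(std::uint64_t deviceId, std::uint64_t inode, std::uint64_t &id)
{
    const std::uint64_t max32 = std::numeric_limits<std::uint32_t>::max();
    if (deviceId > max32 || inode > max32) {
        return false; // caller can skip the file and log it, no crash needed
    }
    id = (deviceId << 32) | inode;
    return true;
}
```

A caller that gets false back could simply leave that file out of the index, which is
a limitation the user can live with, unlike a crash or an inconsistent DB.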

> 
>>>
>>>>>
>>>>> How about requirements such as resource consumption, ease of
>>>>> integration, search speed are taken into consideration? Come on guys.
>>>>> We're engineers over here.
> 
>>>> What is the argument here? If you take a look at bugs.kde.org, you see that
>>>> people are complaining about all
>>>> of that with baloo. I see no evidence nowhere that e.g. baloo is "superior" to
>>>> what GNOME uses
>>>> or any other solution (perhaps beside nepomuk, ok...).
> 
> What tests have been to obtain the evidence?
What tests have been done to obtain evidence of the opposite? All I hear is the complaint
about not taking requirements like resource consumption or speed into account, but
there is ATM zero evidence that e.g. tracker is slower.

And yes, there are "it hogs 100% memory or CPU time" bugs open, though you can hardly reproduce
them, as people are understandably reluctant to pack up their home directory and send it to us.
Not that many of those bugs got touched at all in Bugzilla.

> 
>>
>>>
>>> Yup, you have. It's awesome. I no longer have the motivation to work on Baloo.
>> Thanks, but that makes me very sad, btw.
>> Baloo came up to replace nepomuk, which was dead because it had too many bugs
>> and all maintainers left.
>> Now we have baloo, which has many bugs, some even by design, and the maintainer
>> left, too.
>>
> 
> Actually, Nepomuk was not dead. I was maintaining it. I killed it
> because it had too many structural problems.
> 
> This is how the open source world works. People work on projects and
> when it no longer scratches their itch (I no longer use Baloo), they
> loose interest. This is "supposed" to be a hobby.
That is OK, to see it as a hobby.

But I am a bit unnerved that one proposes this as the generic index solution
for our desktop, which should be stable, if nothing else, while knowing that it has severe
limitations that are not handled (see above). I would have assumed that at least the known
"can't work here" cases are handled in a graceful way.

And given that one of the first things main.cpp of baloo_file does is:

    // HACK: Untill we start using lmdb with robust mutex support. We're just going to remove
    //       the lock manually in the baloo_file process.
    QFile::remove(path + "/index-lock");

that doesn't leave high hopes, sorry.

And the typical error check is:

void MTimeDB::put(quint32 mtime, quint64 docId)
{
    Q_ASSERT(mtime > 0);
    Q_ASSERT(docId > 0);
 
    MDB_val key;
    key.mv_size = sizeof(quint32);
    key.mv_data = static_cast<void*>(&mtime);
 
    MDB_val val;
    val.mv_size = sizeof(quint64);
    val.mv_data = static_cast<void*>(&docId);
 
    int rc = mdb_put(m_txn, m_dbi, &key, &val, 0);
    Q_ASSERT_X(rc == 0, "MTimeDB::put", mdb_strerror(rc));
}

without any way to pass an error to the caller, nor any error handling code in the callers,
as if no non-fatal error could ever occur.
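
For comparison, a minimal sketch of the same kind of call with the error passed out
instead of asserted. FakeStore is a hypothetical stand-in for the storage layer so that
the example is self-contained; it is not the LMDB API, and PutResult and putMTime are
names invented for this example, not existing baloo code.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the DB layer (NOT LMDB).
// put() returns 0 on success and a non-zero code on failure.
struct FakeStore {
    bool failWrites = false; // lets a caller simulate e.g. a full disk
    std::multimap<std::uint32_t, std::uint64_t> rows;

    int put(std::uint32_t key, std::uint64_t value)
    {
        if (failWrites) {
            return -1;
        }
        rows.emplace(key, value);
        return 0;
    }
};

// Hypothetical result type: success flag plus a message for logging.
struct PutResult {
    bool ok;
    std::string message;
};

// Reports bad arguments and storage failures to the caller instead of
// aborting the process.
PutResult putMTime(FakeStore &store, std::uint32_t mtime, std::uint64_t docId)
{
    if (mtime == 0 || docId == 0) {
        return {false, "invalid argument: mtime and docId must be non-zero"};
    }
    const int rc = store.put(mtime, docId);
    if (rc != 0) {
        return {false, "storage error " + std::to_string(rc)};
    }
    return {true, std::string()};
}
```

The point is only that the caller gets to decide: abort the transaction, skip the file,
or schedule a reindex. Every layer above then has to propagate or handle that result,
which is where the real work would be.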

> 
>>
>>> (This is why they run on a separate process)
>> That doesn't help, it just OOMs your system => dead, it needs resource
>> restrictions,
>> which is tricky to get right.
>>
> 
> You're right. It needs a better thought out solution. A separate
> process is the bare minimum.
> 
> Btw, have you looked if Tracker actually does any of this?
It has process separation, and it handles crashes well enough to not screw up
client process queries. And it has maintained extractors or miners, unlike us.
For sure, it has bugs and crashes and all that, but it is maintained and has had a
constant stream of fixes for a longer time than baloo and all its predecessors together.

> 
>>> My hostility was because the proposal ignores key points such as -
>>>
>>> * Indexing Speed
>>> * Search speed
>>> * Database size
>> => If you look at the bugs, people complain we are inferior and I see not
>> that the proposal ignores it, I just see not how to compare, given there are no
>> hard facts that we are faster than e.g. tracker in any way.
>>
> 
> Data can be gathered about it. Not all data is publicly available.
That would make any decision easier to take.

> 
>>> * Ease of use with our existing components
>> My proposal did not change the interface at all, it has zero impact on "ease of
>> use".
>>
>>> * Ease of fixing problems in the code
>> My estimate would be: rewrite close to everything. Even the basic 64-bit int id
>> won't work
>> with 64-bit inodes, each DB call must be touched to check for errors, at each
>> place
>> one will need to check for potential inconsistencies and exit gracefully...
>>
> 
> I don't follow why everything needs to be re-written? Am I missing
> something or do we just need to check for more errors and use a higher
> integer id? This certainly doesn't seem super trivial, but it sounds
> like less work than implementing a shim on top of Tracker.
If you look at your own code, you will see that there is no error handling at all
besides asserts (see above).

There is not even the concept of passing an error out to higher levels.

Perhaps I am wrong, as there is also only little documentation,
but once you start adding error handling at the DB calls, you end up rewriting
all internal layers.

Besides, I don't see any documentation of the DB format, though I may have missed it
(at least it is neither in the git repository nor on https://community.kde.org/Baloo).

> 
> I could be wrong.
So could I ;=)

> 
>>>
>>> Baloo has certain speed requirements if it is to be used with krunner,
>>> and we want instant feedback. This was an integral requirement.
>> I doubt e.g. tracker has different requirements, as it is used in similar places
>> by GNOME.
>>
>> But all that left besides, have you an proposal how to fixup the current
>> situation?
>> Are you willing to invest some work to fix the current issues or an idea what
>> would be a good way to tackle them?
>>
> 
> I probably will not work more in Baloo.
> 
> I'll have to investigate the problems a bit more. From the cursory
> look of this thread, it doesn't seem that the problems are that dire.
> But I may not be reading into it correctly.
What would be highly appreciated is a bit of documentation on what the
different pieces do and things like that, even if you have no time to code.

Greetings
Christoph

-- 
----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullmann at AbsInt.com
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:   http://www.AbsInt.com
--------------------------------------------------------------------
Geschäftsführung: Dr.-Ing. Christian Ferdinand
Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234

