Scrap Baloo Thread Feedback

Christoph Cullmann cullmann at absint.com
Sun Oct 16 12:16:34 UTC 2016


Hi,

(evil top posting)

Given the silence, I assume any interest in Baloo has died down once more?
Or are there any plans for how to fix the current situation?

Greetings
Christoph

----- On 7 Oct 2016 at 20:08, cullmann cullmann at absint.com wrote:

> Hi,
> 
>> Hey
>> 
>> On Fri, Oct 7, 2016 at 6:34 PM, Christoph Cullmann <cullmann at absint.com> wrote:
>>> Hi,
>>>
>>>> On Fri, Oct 7, 2016 at 5:58 PM, Christoph Cullmann <cullmann at absint.com> wrote:
>>>>>
>>>
>>> 1) No handling of DB errors besides asserting
>>> 2) No handling of errors in the extractors (e.g. see the fixes I did, all
>>> extractors will need more of that)
>>> 3) No handling of NFS/large inodes/inconsistencies => crash
>>>
>>> In the end, in my opinion, you can rewrite close to all parts dealing with the
>>> DB or anything else internally. If anything ever gets inconsistent, ATM you are
>>> doomed forever, unless by luck my new startup code deletes the index; then you
>>> live again until it is reindexed.
>>>
>>>>
>>> I am not sure, I am all for removing the indexing completely and using another
>>> indexer like tracker, exactly to avoid the excursion into the DB world and how
>>> to handle it in a safe way with close to zero manpower.
>>>
>> 
>> It's avoiding the problem and hoping for the best, without any experiments.
> That is not true.
> 
> I did experiments and search works with tracker, but yes, a problem is tagging,
> which ATM doesn't work. Nor do I say that it is a ready solution now, just a
> possibility to avoid having to maintain low-level code with at most 1 person
> (which is how it looks ATM).
> 
> And I don't propose to go that road now, but ATM I see nobody doing any other
> experiments.
> 
> Besides, tracker has been constantly maintained and used for well over 5 years:
> 
> https://github.com/GNOME/tracker/graphs/contributors
> 
>> 
>>>
>>> => It is good that we agree, but I find it very astonishing that we use baloo
>>> in its current state more or less mandatorily on all those systems where it
>>> will fail by design.
>>>
>>> (and it fails if you read the bugs)
>>>
>> 
>> There is a certain amount of failure, but it's not "by-design". But
>> maybe I'm not seeing things clearly.
> You yourself stated that neither 32-bit issues nor NFS nor > 32-bit inodes have
> any error handling. And that seems to have been known even during design, and
> still we now have this as a framework used by default by any Plasma
> installation, on exactly the systems featuring that, without error checking.
> 
>> 
>>>>
>>>>>>
>>>>>> How about requirements such as resource consumption, ease of
>>>>>> integration, search speed are taken into consideration? Come on guys.
>>>>>> We're engineers over here.
>> 
>>>>> What is the argument here? If you take a look at bugs.kde.org, you see that
>>>>> people are complaining about all of that with baloo. I see no evidence
>>>>> anywhere that e.g. baloo is "superior" to what GNOME uses
>>>>> or any other solution (perhaps besides nepomuk, ok...).
>> 
>> What tests have been done to obtain the evidence?
> What tests have been done to obtain the inverse evidence? All I hear is the
> complaint about not taking requirements like resource consumption or speed into
> account, but there is ATM zero evidence that e.g. tracker is slower.
> 
> And yes, there are "it hogs 100% memory or time" bugs open, though you can
> hardly reproduce them, as people are somehow scared to pack up their home and
> send it to us. Not that a lot of those bugs got touched at all in Bugzilla.
> 
>> 
>>>
>>>>
>>>> Yup, you have. It's awesome. I no longer have the motivation to work on Baloo.
>>> Thanks, but that makes me very sad, btw.
>>> Baloo came up to replace nepomuk, which was dead because it had too many bugs
>>> and all maintainers left.
>>> Now we have baloo, which has many bugs, some even by design, and the maintainer
>>> left, too.
>>>
>> 
>> Actually, Nepomuk was not dead. I was maintaining it. I killed it
>> because it had too many structural problems.
>> 
>> This is how the open source world works. People work on projects and
>> when it no longer scratches their itch (I no longer use Baloo), they
>> lose interest. This is "supposed" to be a hobby.
> That is ok, to see it as a hobby.
> 
> But I am a bit unnerved that one proposes this as the generic index solution
> for our desktop, which should be stable, if nothing else, while knowing that it
> has severe limitations that are not handled (see above). I would have assumed
> that at least the known "can't work here" cases are handled in a graceful way.
> 
> And given that one of the first things main.cpp of baloo_file does is:
> 
>    // HACK: Untill we start using lmdb with robust mutex support. We're just going to remove
>    //       the lock manually in the baloo_file process.
>    QFile::remove(path + "/index-lock");
> 
> that doesn't leave high hopes, sorry.
> 
> And the typical error check is:
> 
> void MTimeDB::put(quint32 mtime, quint64 docId)
> {
>    Q_ASSERT(mtime > 0);
>    Q_ASSERT(docId > 0);
> 
>    MDB_val key;
>    key.mv_size = sizeof(quint32);
>    key.mv_data = static_cast<void*>(&mtime);
> 
>    MDB_val val;
>    val.mv_size = sizeof(quint64);
>    val.mv_data = static_cast<void*>(&docId);
> 
>    int rc = mdb_put(m_txn, m_dbi, &key, &val, 0);
>    Q_ASSERT_X(rc == 0, "MTimeDB::put", mdb_strerror(rc));
> }
> 
> without any way to pass an error to the outside, nor any error handling code on
> the outside, as if no error could ever occur that is non-fatal.
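
As a minimal sketch of what passing an error to the outside could look like:
the code below is not Baloo code. FakeStore is a hypothetical stand-in for the
LMDB calls so that the example compiles on its own, and DbError and putMTime
are names invented here. The point is only that the function returns a status
the caller can act on instead of asserting.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the storage layer so the sketch compiles without
// LMDB. It returns 0 on success and a non-zero code on failure, mirroring the
// return convention of the mdb_put() call quoted above.
struct FakeStore {
    std::multimap<std::uint32_t, std::uint64_t> rows;
    bool failWrites = false; // flip to simulate e.g. a full disk

    int put(std::uint32_t key, std::uint64_t value) {
        if (failWrites)
            return -1;
        rows.emplace(key, value);
        return 0;
    }
};

// Hypothetical error type, not part of Baloo. A code of 0 means success.
struct DbError {
    int code;
    std::string message;
    explicit operator bool() const { return code != 0; }
};

// Variant of put() that validates its input and reports failures to the
// caller instead of asserting.
DbError putMTime(FakeStore &store, std::uint32_t mtime, std::uint64_t docId) {
    if (mtime == 0 || docId == 0)
        return DbError{1, "invalid argument: mtime and docId must be non-zero"};

    const int rc = store.put(mtime, docId);
    if (rc != 0)
        return DbError{rc, "storage layer refused the write"};

    return DbError{0, ""};
}
```

A caller could then skip the file, abort the transaction, or schedule a
reindex when the returned DbError is set, rather than taking the whole
process down.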
> 
>> 
>>>
>>>> (This is why they run on a separate process)
>>> That doesn't help, it just OOMs your system => dead. It needs resource
>>> restrictions, which are tricky to get right.
>>>
>> 
>> You're right. It needs a better thought out solution. A separate
>> process is the bare minimum.
>> 
>> Btw, have you looked if Tracker actually does any of this?
> It has process separation and it handles crashes well enough to not screw up
> client process queries. And it has maintained extractors or miners, unlike us.
> But for sure, it has bugs and crashes and all that, but it is maintained and
> has had a constant stream of fixes for a longer time than baloo + all
> predecessors together.
> 
>> 
>>>> My hostility was because the proposal ignores key points such as -
>>>>
>>>> * Indexing Speed
>>>> * Search speed
>>>> * Database size
>>> => If you look at the bugs, people complain we are inferior. I don't see
>>> that the proposal ignores it, I just don't see how to compare, given there are
>>> no hard facts that we are faster than e.g. tracker in any way.
>>>
>> 
>> Data can be gathered about it. Not all data is publicly available.
> That would make any decision easier to take.
> 
>> 
>>>> * Ease of use with our existing components
>>> My proposal did not change the interface at all, it has zero impact on "ease of
>>> use".
>>>
>>>> * Ease of fixing problems in the code
>>> My estimate would be: rewrite close to everything. Even the basic 64-bit int id
>>> won't work
>>> with 64-bit inodes, each DB call must be touched to check for errors, at each
>>> place
>>> one will need to check for potential inconsistencies and exit gracefully...
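
The inode point can be illustrated with a small sketch. It assumes, purely for
illustration, that a document id is formed by packing a device number into the
upper 32 bits and an inode number into the lower 32 bits of a 64-bit value;
packUnchecked and packChecked are hypothetical names. With that layout, an
inode that does not fit into 32 bits silently collides with another id unless
the range is checked first.

```cpp
#include <cstdint>

// Assumed layout (hypothetical): high 32 bits = device, low 32 bits = inode.
// Anything above 32 bits in either input is silently thrown away.
std::uint64_t packUnchecked(std::uint64_t device, std::uint64_t inode) {
    return (device << 32) | (inode & 0xFFFFFFFFu);
}

// Same layout, but refuses inputs that do not fit. On failure the output
// parameter is left untouched and false is returned, so the caller can skip
// the file instead of writing a colliding id into the index.
bool packChecked(std::uint64_t device, std::uint64_t inode, std::uint64_t &id) {
    const std::uint64_t max32 = 0xFFFFFFFFull;
    if (device > max32 || inode > max32)
        return false;
    id = (device << 32) | inode;
    return true;
}
```

For example, inode 0x100000005 and inode 5 on the same device map to the same
value with packUnchecked, while packChecked rejects the first one.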
>>>
>> 
>> I don't follow why everything needs to be re-written? Am I missing
>> something or do we just need to check for more errors and use a higher
>> integer id? This certainly doesn't seem super trivial, but it sounds
>> like less work than implementing a shim on top of Tracker.
> If you look at your own code, you will see that there is no error handling at
> all besides asserts (see above).
> 
> There is not even the concept of passing an error out to higher levels.
> 
> Perhaps I am wrong, because there is only a bit of documentation in addition,
> but if you start to add error handling at the DB calls, you can start to rewrite
> all internal layers.
> 
> Besides, I don't see any documentation of the DB format, but I could have
> missed it (at least it is neither in the git repository nor at
> https://community.kde.org/Baloo).
> 
>> 
>> I could be wrong.
> So could I ;=)
> 
>> 
>>>>
>>>> Baloo has certain speed requirements if it is to be used with krunner,
>>>> and we want instant feedback. This was an integral requirement.
>>> I doubt e.g. tracker has different requirements, as it is used in similar places
>>> by GNOME.
>>>
>>> But all that aside, do you have a proposal for how to fix the current
>>> situation?
>>> Are you willing to invest some work to fix the current issues, or do you have
>>> an idea what would be a good way to tackle them?
>>>
>> 
>> I probably will not work more in Baloo.
>> 
>> I'll have to investigate the problems a bit more. From the cursory
>> look of this thread, it doesn't seem that the problems are that dire.
>> But I may not be reading into it correctly.
> What would be highly appreciated is a bit of documentation on what the
> different pieces do and stuff like that, even if you have no time to code.
> 
> Greetings
> Christoph
> 
> --
> ----------------------------- Dr.-Ing. Christoph Cullmann ---------
> AbsInt Angewandte Informatik GmbH      Email: cullmann at AbsInt.com
> Science Park 1                         Tel:   +49-681-38360-22
> 66123 Saarbrücken                      Fax:   +49-681-38360-20
> GERMANY                                WWW:   http://www.AbsInt.com
> --------------------------------------------------------------------
> Managing Director: Dr.-Ing. Christian Ferdinand
> Registered in the commercial register of the Saarbrücken district court, HRB 11234



More information about the Kde-frameworks-devel mailing list