<div dir="ltr">Hi,<div><br></div><div>Unfortunately I've been hit by multiple pretty severe health scares in the last month, and have no idea when I'm going to be at 100% again.</div><div><br></div><div>For the time being I'd rather not hold up any development, so don't hold back anything on my account.</div><div><br></div><div>-- Boudhayan</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 16 October 2016 at 17:46, Christoph Cullmann <span dir="ltr"><<a href="mailto:cullmann@absint.com" target="_blank">cullmann@absint.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
(evil top posting)<br>
<br>
Given the silence, I assume any interest in baloo has stopped once more, right?<br>
Or are there any plans for how to fix the current situation?<br>
<br>
Greetings<br>
<span class="HOEnZb"><font color="#888888">Christoph<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
----- On 7 Oct 2016 at 20:08, cullmann <a href="mailto:cullmann@absint.com">cullmann@absint.com</a> wrote:<br>
<br>
> Hi,<br>
><br>
>> Hey<br>
>><br>
>> On Fri, Oct 7, 2016 at 6:34 PM, Christoph Cullmann <<a href="mailto:cullmann@absint.com">cullmann@absint.com</a>> wrote:<br>
>>> Hi,<br>
>>><br>
>>>> On Fri, Oct 7, 2016 at 5:58 PM, Christoph Cullmann <<a href="mailto:cullmann@absint.com">cullmann@absint.com</a>> wrote:<br>
>>>>><br>
>>><br>
>>> 1) No handling of DB errors besides asserting<br>
>>> 2) No handling of errors in the extractors (e.g. see the fixes I did, all<br>
>>> extractors will need more of that)<br>
>>> 3) No handling of NFS/large inodes/inconsistencies => crash<br>
>>><br>
>>> In the end, in my opinion, you can rewrite close to all parts dealing with the<br>
>>> DB or anything else internally. If anything ever gets inconsistent, ATM you are<br>
>>> doomed forever, unless by luck my new startup code deletes the index; then you<br>
>>> live again until it is reindexed.<br>
>>><br>
>>>><br>
>>> I am not sure. I am all for removing the complete indexing and using another<br>
>>> indexer like tracker, exactly to avoid the excursion into the DB world and the<br>
>>> question of how to handle it safely with close to zero manpower.<br>
>>><br>
>><br>
>> It's avoiding the problem and hoping for the best, without any experiments.<br>
> That is not true.<br>
><br>
> I did experiments and search works with tracker, but yes, a problem is tagging,<br>
> which ATM doesn't work. Nor do I say that is a ready solution now, just a<br>
> possibility<br>
> to avoid having to maintain low level code with at most 1 person (how it looks<br>
> ATM).<br>
><br>
> And I don't propose to go that road now, but ATM I see nobody doing any other<br>
> experiments.<br>
><br>
> Besides, tracker has been constantly maintained and used for well over 5 years:<br>
><br>
> <a href="https://github.com/GNOME/tracker/graphs/contributors" rel="noreferrer" target="_blank">https://github.com/GNOME/<wbr>tracker/graphs/contributors</a><br>
><br>
>><br>
>>><br>
>>> => It is good that we agree, but I find it very astonishing that we use baloo<br>
>>> in its current state as more or less mandatory on all those systems where it<br>
>>> will fail by design.<br>
>>><br>
>>> (and it fails if you read the bugs)<br>
>>><br>
>><br>
>> There is a certain amount of failure, but it's not "by-design". But<br>
>> maybe I'm not seeing things clearly.<br>
> You yourself stated that neither 32-bit issues nor NFS nor > 32-bit inodes have<br>
> any error handling. That seems to have been known even during design, and still<br>
> we now have this as a framework used by default by any Plasma installation, on<br>
> exactly the systems that have these properties, without error checking.<br>
><br>
>><br>
>>>><br>
>>>>>><br>
>>>>>> How about requirements such as resource consumption, ease of<br>
>>>>>> integration, search speed are taken into consideration? Come on guys.<br>
>>>>>> We're engineers over here.<br>
>><br>
>>>>> What is the argument here? If you take a look at <a href="http://bugs.kde.org" rel="noreferrer" target="_blank">bugs.kde.org</a>, you see that<br>
>>>>> people are complaining about all<br>
>>>>> of that with baloo. I see no evidence anywhere that e.g. baloo is "superior" to<br>
>>>>> what GNOME uses<br>
>>>>> or any other solution (perhaps beside nepomuk, ok...).<br>
>><br>
>> What tests have been done to obtain the evidence?<br>
> What tests have been done to obtain the inverse evidence? I only hear here the<br>
> complaint<br>
> about not taking requirements like resource consumption or speed into account,<br>
> but<br>
> there is ATM zero evidence that e.g. tracker is slower.<br>
><br>
> And yes, there are open "it hogs 100% memory or time" bugs, though you can<br>
> hardly reproduce them, as people are somehow scared to pack up their home<br>
> directory and send it to us. Not that a lot of those bugs got touched at all<br>
> in Bugzilla.<br>
><br>
>><br>
>>><br>
>>>><br>
>>>> Yup, you have. It's awesome. I no longer have the motivation to work on Baloo.<br>
>>> Thanks, but that makes me very sad, btw.<br>
>>> Baloo came up to replace nepomuk, which was dead because it had too many bugs<br>
>>> and all maintainers left.<br>
>>> Now we have baloo, which has many bugs, some even by design, and the maintainer<br>
>>> left, too.<br>
>>><br>
>><br>
>> Actually, Nepomuk was not dead. I was maintaining it. I killed it<br>
>> because it had too many structural problems.<br>
>><br>
>> This is how the open source world works. People work on projects and<br>
>> when it no longer scratches their itch (I no longer use Baloo), they<br>
>> lose interest. This is "supposed" to be a hobby.<br>
> It is OK to see it as a hobby.<br>
><br>
> But I am a bit unnerved that one proposes this as the generic index solution<br>
> for our desktop, which should be stable, if nothing else, while knowing that it<br>
> has severe limitations that are not handled (see above). I would have assumed<br>
> that at least the known "can't work here" cases are handled gracefully.<br>
><br>
> And given that one of the first things main.cpp of baloo_file does is:<br>
><br>
> // HACK: Untill we start using lmdb with robust mutex support. We're just going to remove<br>
> // the lock manually in the baloo_file process.<br>
> QFile::remove(path + "/index-lock");<br>
><br>
> that doesn't leave high hopes, sorry.<br>
><br>
> And the typical error check is:<br>
><br>
> void MTimeDB::put(quint32 mtime, quint64 docId)<br>
> {<br>
> Q_ASSERT(mtime > 0);<br>
> Q_ASSERT(docId > 0);<br>
><br>
> MDB_val key;<br>
> key.mv_size = sizeof(quint32);<br>
> key.mv_data = static_cast<void*>(&mtime);<br>
><br>
> MDB_val val;<br>
> val.mv_size = sizeof(quint64);<br>
> val.mv_data = static_cast<void*>(&docId);<br>
><br>
> int rc = mdb_put(m_txn, m_dbi, &key, &val, 0);<br>
> Q_ASSERT_X(rc == 0, "MTimeDB::put", mdb_strerror(rc));<br>
> }<br>
><br>
> without any way to pass an error to the outside, nor any error handling code on<br>
> the outside, as if no non-fatal error could ever occur.<br>
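<br>
The assert-only pattern quoted above can be contrasted with a version that reports failures to its caller. This is only a minimal sketch, not Baloo's code: PutResult, FakeStore and putMTime are hypothetical names, and FakeStore::put merely stands in for the real mdb_put call by returning 0 on success and a non-zero code on failure.<br>

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical status type for this sketch; not part of Baloo or LMDB.
struct PutResult {
    bool ok;
    std::string message;
};

// Stand-in for the storage backend so the sketch compiles on its own.
// Like mdb_put, put() returns 0 on success and non-zero on failure.
class FakeStore {
public:
    explicit FakeStore(bool failWrites) : m_failWrites(failWrites) {}

    int put(std::uint32_t key, std::uint64_t value) {
        if (m_failWrites) {
            return -1;  // simulated write failure
        }
        m_data.emplace(key, value);
        return 0;
    }

    std::size_t size() const { return m_data.size(); }

private:
    bool m_failWrites;
    std::multimap<std::uint32_t, std::uint64_t> m_data;
};

// Same job as the quoted put(), but every failure is returned to the
// caller instead of being asserted away.
PutResult putMTime(FakeStore &store, std::uint32_t mtime, std::uint64_t docId)
{
    if (mtime == 0 || docId == 0) {
        return {false, "invalid argument: mtime and docId must be non-zero"};
    }
    const int rc = store.put(mtime, docId);
    if (rc != 0) {
        return {false, "storage error, code " + std::to_string(rc)};
    }
    return {true, std::string()};
}
```

A caller would check the ok field and decide whether to skip the file, retry, or schedule a reindex. With Q_ASSERT, which is compiled out in release builds, that decision cannot be made at all.<br>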
><br>
>><br>
>>><br>
>>>> (This is why they run on a separate process)<br>
>>> That doesn't help, it just OOMs your system => dead, it needs resource<br>
>>> restrictions,<br>
>>> which is tricky to get right.<br>
>>><br>
>><br>
>> You're right. It needs a better thought out solution. A separate<br>
>> process is the bare minimum.<br>
>><br>
>> Btw, have you looked if Tracker actually does any of this?<br>
> It has process separation and it handles crashes well enough to not screw up<br>
> client process queries. And it has maintained extractors or miners, unlike us.<br>
> For sure, it has bugs and crashes and all that, but it is maintained and has<br>
> had a constant stream of fixes for a longer time than baloo + all predecessors<br>
> together.<br>
><br>
>><br>
>>>> My hostility was because the proposal ignores key points such as -<br>
>>>><br>
>>>> * Indexing Speed<br>
>>>> * Search speed<br>
>>>> * Database size<br>
>>> => If you look at the bugs, people complain we are inferior. I don't see<br>
>>> that the proposal ignores it; I just don't see how to compare, given there are<br>
>>> no hard facts that we are faster than e.g. tracker in any way.<br>
>>><br>
>><br>
>> Data can be gathered about it. Not all data is publicly available.<br>
> That would make any decision easier to take.<br>
><br>
>><br>
>>>> * Ease of use with our existing components<br>
>>> My proposal did not change the interface at all, it has zero impact on "ease of<br>
>>> use".<br>
>>><br>
>>>> * Ease of fixing problems in the code<br>
>>> My estimate would be: rewrite close to everything. Even the basic 64-bit int id<br>
>>> won't work<br>
>>> with 64-bit inodes, each DB call must be touched to check for errors, at each<br>
>>> place<br>
>>> one will need to check for potential inconsistencies and exit gracefully...<br>
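<br>
The 64-bit id concern can be illustrated with a small sketch. It assumes, which this thread does not spell out, that a document id is built by packing a 32-bit device id and a 32-bit inode number into one 64-bit value. packUnchecked and packChecked are hypothetical helper names, not Baloo functions.<br>

```cpp
#include <cstdint>

// Assumed layout for this sketch: device id in the upper 32 bits,
// inode number in the lower 32 bits. Wider inputs are silently truncated.
std::uint64_t packUnchecked(std::uint64_t devId, std::uint64_t inode)
{
    const std::uint64_t hi = static_cast<std::uint32_t>(devId);
    const std::uint64_t lo = static_cast<std::uint32_t>(inode);
    return (hi << 32) | lo;
}

// Checked variant: refuses inputs that do not fit into 32 bits, so the
// caller can skip the file instead of storing a colliding id.
bool packChecked(std::uint64_t devId, std::uint64_t inode, std::uint64_t &id)
{
    const std::uint64_t max32 = 0xFFFFFFFFull;
    if (devId > max32 || inode > max32) {
        return false;
    }
    id = (devId << 32) | inode;
    return true;
}
```

Under that assumption, two files whose inode numbers differ only above bit 31 receive the same id from packUnchecked, while packChecked reports the problem to its caller.<br>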
>>><br>
>><br>
>> I don't follow why everything needs to be re-written? Am I missing<br>
>> something or do we just need to check for more errors and use a higher<br>
>> integer id? This certainly doesn't seem super trivial, but it sounds<br>
>> like less work than implementing a shim on top of Tracker.<br>
> If you look at your own code, you will see that there is no error handling at<br>
> all, besides asserts. (see above)<br>
><br>
> There is not even the concept of passing an error out to higher levels.<br>
><br>
> Perhaps I am wrong, since there is only a little documentation, but once you<br>
> start adding error handling at the DB calls, you end up rewriting<br>
> all internal layers.<br>
><br>
> Besides, I don't see any documentation of the DB format, but I could have missed it.<br>
> (at least neither in git nor at <a href="https://community.kde.org/Baloo" rel="noreferrer" target="_blank">https://community.kde.org/<wbr>Baloo</a>)<br>
><br>
>><br>
>> I could be wrong.<br>
> So could I ;=)<br>
><br>
>><br>
>>>><br>
>>>> Baloo has certain speed requirements if it is to be used with krunner,<br>
>>>> and we want instant feedback. This was an integral requirement.<br>
>>> I doubt e.g. tracker has different requirements, as it is used in similar places<br>
>>> by GNOME.<br>
>>><br>
>>> But all that aside, do you have a proposal for how to fix the current<br>
>>> situation?<br>
>>> Are you willing to invest some work to fix the current issues, or do you have<br>
>>> an idea what would be a good way to tackle them?<br>
>>><br>
>><br>
>> I probably will not work more on Baloo.<br>
>><br>
>> I'll have to investigate the problems a bit more. From the cursory<br>
>> look of this thread, it doesn't seem that the problems are that dire.<br>
>> But I may not be reading into it correctly.<br>
> What would be highly appreciated is a bit of documentation on what the<br>
> different pieces do and the like, even if you have no time to code.<br>
><br>
> Greetings<br>
> Christoph<br>
><br>
> --<br>
> ----------------------------- Dr.-Ing. Christoph Cullmann ---------<br>
> AbsInt Angewandte Informatik GmbH Email: cullmann@AbsInt.com<br>
> Science Park 1 Tel: +49-681-38360-22<br>
> 66123 Saarbrücken Fax: +49-681-38360-20<br>
> GERMANY WWW: <a href="http://www.AbsInt.com" rel="noreferrer" target="_blank">http://www.AbsInt.com</a><br>
> ------------------------------<wbr>------------------------------<wbr>--------<br>
> Geschäftsführung: Dr.-Ing. Christian Ferdinand<br>
> Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234<br>
<br>
</div></div></blockquote></div><br></div>