baloo + out-of-memory in file extractor

Christoph Cullmann cullmann at absint.com
Sat Sep 10 18:16:07 UTC 2016


Hi,

> Hi,
> 
> On 10 Sep 2016 19:42, "Christoph Cullmann" <cullmann at absint.com> wrote:
>>
>> Hi,
>>
>> during the last night of the akademy for me in Berlin, baloo file
> extractor OOM'd my notebook
>> during extracting the tag info of the taglib unit test test-files.
> 
> Is it possible to reproduce the issue and get a stack trace?
> 
> If you can consistently repro this, running under valgrind to check for
> leaks will also go a long way.

https://github.com/taglib/taglib/blob/master/tests/data/toc_many_children.mp3

=> I guess it even exists as a regression test because of that ;=)
Perhaps my taglib is just too old. In any case, see my answer below: fixing this
particular bug is nice, but it is not the solution.

> 
>>
>> Would it be a good idea to restrict the file extractor process to some
> fixed amount of memory
>> to use via setrlimit? (or more fancy stuff?)
> 
> That would probably just make Baloo crash, so fixing the bug is probably
> the better option.
Actually, the bug is that we neither limit the resources of baloo_file_indexer nor sandbox it,
not that some metadata extractor is buggy (which should be fixed, too).

ATM, the state is:

1) baloo is on per default
2) it will index at least your home directory
3) if it encounters any "bad" file, it will OOM your machine, in my case in a way
that a normal user is doomed, as 1-2 seconds after login the machine has already ground to a halt.

Given that e.g. your "Downloads" directory might even contain "evil" files from the net,
at least some resource limit would be good, and a sandbox even better, so that an
indexer which is easily pwn'd can't pwn your whole session.

Besides that, another real problem is that baloo has close to zero error handling for its
database: once one error happens, everything that follows goes down and never recovers.

e.g. use balooctl wrongly once => goodbye
https://bugs.kde.org/show_bug.cgi?id=368557

Interesting, too: we use lmdb, which means we always memory-map the whole database, so
32-bit machines will run out of address space if you have large indices, like > 2GB :/

I would like to help out with fixing these issues, but I think we first need some
consensus on how we want to proceed.

Greetings
Christoph

-- 
----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullmann at AbsInt.com
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:   http://www.AbsInt.com
--------------------------------------------------------------------
Geschäftsführung: Dr.-Ing. Christian Ferdinand
Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234

