baloo + out-of-memory in file extractor

Christoph Cullmann cullmann at
Sat Sep 10 20:20:30 UTC 2016


> Hi Christoph,
> On 10 September 2016 at 23:46, Christoph Cullmann <cullmann at> wrote:
>>>> Would it be a good idea to restrict the file extractor process to some
>>> fixed amount of memory
>>>> to use via setrlimit? (or more fancy stuff?)
>>> That would probably just make Baloo crash, so fixing the bug is probably
>>> the better option.
>> Actually, that we don't limit the resources + sandbox baloo_file_indexer is the
>> bug,
>> not that some meta info extractor is buggy (which should be fixed, too).
>> ATM, the state is:
>> 1) baloo is on per default
>> 2) it will index at least your home
>> 3) if it encounters any "bad" file, it will OOM you, in my case in a way
>> that a normal user is doomed, as 1-2 seconds after login the machine is already
>> halted.
>> Given that e.g. your "Downloads" might even contain "evil" files from the net,
>> at least some resource limit would be good and even better some sandbox, to
>> avoid that
>> the indexer which is easily pwn'd pwn's your session.
> You've got a point there. In that case, what I'd do is:
> 1) Limit resources on baloo_file_extractor.
> 2) Try to detect if it crashes because it exceeded limits, not sure if
> this is easily possible.
> 3) Mark files causing such crashes as files that should be skipped,
> and the user notified somehow (?).
I think we should just mark any file for which the indexer crashes as "skipped for the future".
(Whether the crash came from hitting a resource limit or from some other indexing failure doesn't
really matter, IMHO, besides that it should perhaps be logged somewhere.)
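
A minimal sketch of that idea, in Python rather than Baloo's C++, and only under stated assumptions: the extractor is started as a separate child process, the cap is applied with setrlimit(RLIMIT_AS) before the child runs, and any non-zero exit status puts the file on a skip list. The names `cap_address_space`, `extract_with_limit` and `skipped` are made up for illustration and are not part of Baloo.

```python
import resource
import subprocess
import sys


def cap_address_space(max_bytes):
    """Build a hook that caps the child's virtual address space (RLIMIT_AS)."""
    def apply_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # The soft limit may not exceed an already configured hard limit.
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return apply_limit


def extract_with_limit(command, path, max_bytes, skipped):
    """Run `command path` under a memory cap; remember files that make it fail.

    Returns True on success, False if the file was skipped or the run failed.
    """
    if path in skipped:
        return False
    # preexec_fn runs in the child after fork() and before exec().
    result = subprocess.run(command + [path],
                            preexec_fn=cap_address_space(max_bytes))
    if result.returncode != 0:
        # A negative code means the child died from a signal (e.g. SIGABRT or
        # SIGSEGV); a positive one is an ordinary error exit. Both count here.
        print(f"skipping {path}: extractor exit status {result.returncode}",
              file=sys.stderr)
        skipped.add(path)
        return False
    return True
```

Note that with RLIMIT_AS the child is not killed by a dedicated signal; its allocations simply start failing, so "exceeded the limit" and "crashed for another reason" look much the same from the parent. That matches treating both cases alike and only logging the exit status.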

Another problem: after an indexer crash, the DB is corrupted or locked.

It seems there is no proper lmdb locking at all; baloo_file even just removes the lockfile
on startup.
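
For comparison, a generic sketch of a lock that does not need to be removed by hand: an advisory flock() on a lock file, which the kernel drops automatically when the holding process exits or crashes. This is only an illustration of the general technique on Linux; it says nothing about how lmdb manages its own lock file, and `try_lock` / `unlock` are hypothetical helper names.

```python
import fcntl
import os


def try_lock(lock_path):
    """Try to take an exclusive advisory lock; return the open fd or None."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def unlock(fd):
    """Release the lock; closing the descriptor would have the same effect."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

With such a scheme a stale lock file left behind by a crash is harmless, because the file's existence carries no meaning; only a live process holding the lock does.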

>> Beside that, a real other problem is, that baloo has close to zero error
>> handling for its
>> database, once one error happens, all further things will go down and never
>> recover.
>> e.g. one time balooctl wrongly use => goodbye
>> Interesting too: We use lmdb, which means, we memory map always, aka 32-bit
>> machines will
>> be out of memory if you have large indices like > 2GB :/
> Nah, 32bit machines should have PAE. If they don't... I'm not willing
> to make fundamental changes to how indexes are kept to support edge
> cases like this. Disabling Baloo automatically if you detect machines
> with a 32-bit address space is the way to go.
PAE doesn't help there at all. (It only lets the system as a whole use more than 3-4 GB of RAM;
it does not enlarge the address space of a single application.)

If you use lmdb and, let's say, the file is 2 GB, your application's 4 GB of virtual address
space is halved (and even less remains, as some parts are used for other things anyway).
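
The arithmetic, spelled out as a tiny sketch. It ignores everything else that occupies the address space (kernel split, libraries, heap, stacks), so the real figure is lower; `remaining_address_space` is a made-up helper name.

```python
GIB = 1 << 30


def remaining_address_space(pointer_bits, mapped_bytes):
    """Upper bound on the address space left after one memory mapping."""
    total = 1 << pointer_bits
    return max(total - mapped_bytes, 0)


# A 2 GiB memory-mapped index in a 32-bit process:
print(remaining_address_space(32, 2 * GIB) / GIB)  # 2.0
```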

Besides, another issue: ATM the index size is capped at 5 GB; beyond that, everything will fail,
see bug. As I have already seen index sizes > 2 GB,
that will hit people, too.

We should increase that limit IMHO, and out-of-space should be handled in the first place, I guess.

> I'd still wait for Pinak to comment on all of the above though.


----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullmann at
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:
Geschäftsführung: Dr.-Ing. Christian Ferdinand
Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234

More information about the Kde-frameworks-devel mailing list