baloo + out-of-memory in file extractor

Boudhayan Gupta bgupta at kde.org
Sat Sep 10 19:19:59 UTC 2016


Hi Christoph,

On 10 September 2016 at 23:46, Christoph Cullmann <cullmann at absint.com> wrote:
>>> Would it be a good idea to restrict the file extractor process to some fixed amount of memory
>>> to use via setrlimit? (or more fancy stuff?)
>>
>> That would probably just make Baloo crash, so fixing the bug is probably
>> the better option.
> Actually, that we don't limit the resources + sandbox baloo_file_indexer is the bug,
> not that some meta info extractor is buggy (which should be fixed, too).
>
> ATM, the state is:
>
> 1) baloo is on per default
> 2) it will index at least your home
> 3) if it encounters any "bad" file, it will OOM you, in my case in a way
> that a normal user is doomed, as 1-2 seconds after login the machine is already halted.
>
> Given that e.g. your "Downloads" might even contain "evil" files from the net,
> at least some resource limit would be good and even better some sandbox, to avoid that
> the indexer which is easily pwn'd pwn's your session.

You've got a point there. In that case, what I'd do is:

1) Limit resources on baloo_file_extractor.
2) Try to detect whether it crashed because it exceeded the limits (not
sure this is easily possible).
3) Mark files causing such crashes as files to skip, and notify the
user somehow (?).
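
Roughly, the three steps could look like the sketch below. It is Python
only for brevity (the real extractor is C++) and assumes Linux. The
512 MiB cap and the names run_limited() and index_files() are made up for
illustration; none of this is existing Baloo code.

```python
import resource
import subprocess
import sys

# Hypothetical cap; nobody in this thread has proposed an actual number.
LIMIT_BYTES = 512 * 1024 * 1024


def _cap_address_space():
    # Runs in the child after fork() and before exec().
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Never try to raise an existing hard limit, only lower it.
    cap = LIMIT_BYTES if hard == resource.RLIM_INFINITY else min(LIMIT_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (cap, cap))


def run_limited(argv):
    """Step 1: run argv under the cap and return its exit status."""
    return subprocess.run(argv, preexec_fn=_cap_address_space).returncode


def index_files(paths, extractor_argv, skipped):
    """Steps 2 and 3: run the extractor per file, remember files that fail."""
    for path in paths:
        if path in skipped:
            continue
        status = run_limited(extractor_argv + [path])
        # A negative status means the child was killed by a signal (for
        # example SIGABRT after a failed allocation). A non-zero status
        # alone cannot prove that the limit was the cause.
        if status != 0:
            skipped.add(path)
            print(f"skipping {path}: extractor exited with {status}",
                  file=sys.stderr)
    return skipped
```

The open question from 2) stays open here: the sketch treats every
non-zero exit as a reason to skip the file, whether or not the limit
caused it.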

> Besides that, another real problem is that baloo has close to zero error handling for its
> database: once one error happens, everything after it goes down and never recovers.
>
> e.g. use balooctl wrongly one time => goodbye
> https://bugs.kde.org/show_bug.cgi?id=368557
>
> Interesting too: We use lmdb, which means we always memory map, i.e. 32-bit machines will
> be out of memory if you have large indices like > 2GB :/

Nah, 32-bit machines should have PAE. If they don't... I'm not willing
to make fundamental changes to how indexes are kept to support edge
cases like this. Disabling Baloo automatically if you detect machines
with a 32-bit address space is the way to go.
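
For a memory-mapped index, the limit that matters is the virtual address
space of the process, which follows from its pointer width. A minimal
sketch of such a check, again in Python for brevity; the function names
and the "off by default on 32-bit" policy are assumptions for
illustration, not existing Baloo behaviour:

```python
import struct


def pointer_bits():
    """Pointer width, in bits, of the running process."""
    return struct.calcsize("P") * 8


def indexing_enabled_by_default():
    # Hypothetical policy: index by default only when the process has
    # more than a 32-bit address space to map the index into.
    return pointer_bits() > 32
```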

I'd still wait for Pinak to comment on all of the above though.

> I would like to help out with fixing these issues, but I think first some consensus would be needed
> on how we want to go on with that.

-- Boudhayan


More information about the Kde-frameworks-devel mailing list