Baloo - Not Indexing everything by default

Vishesh Handa me at vhanda.in
Fri Oct 17 16:24:48 UTC 2014


On Thu, Oct 16, 2014 at 2:15 PM, Martin Gräßlin <mgraesslin at kde.org> wrote:

> the txt being genome data doesn't surprise me[1], but I find it sad that now
> txt is disabled by default (I use them quite a lot for blog posts). As genome
> data is really huge wouldn't it make sense to go rather for file size or abort
> the indexing if it's obvious random gibberish?
>

We currently have a hard limit of 50 MB on 'text/plain' files. However, this
limit does not cover log files, which have a separate MIME type. Perhaps it
would be good to reduce it to about 5 MB.
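
To make the size check concrete, here is a rough sketch in Python. It is not
Baloo's actual code: the function name, the constant, and the reading of "MB"
as 1024 * 1024 bytes are all assumptions made for illustration, and
'text/x-log' only stands in for whatever MIME type log files are reported as.

```python
# Hypothetical sketch of a per-MIME-type size limit; not Baloo's real code.
# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes.
MAX_PLAIN_TEXT_BYTES = 5 * 1024 * 1024


def should_index(mimetype, size_bytes, limit=MAX_PLAIN_TEXT_BYTES):
    """Return False for 'text/plain' files larger than the limit."""
    if mimetype == "text/plain":
        return size_bytes <= limit
    # Other MIME types, log files included, are not covered by this limit.
    return True
```

With this sketch a 60 MB log file would still be accepted, because only
'text/plain' is checked, which is the gap described above.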

About gibberish: it's hard to figure out what counts as gibberish. I think I'll
add some code so that we only index the first 20 characters of each word. That
should help to a certain extent.
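
The truncation idea could look roughly like the Python sketch below. The names
are hypothetical, and splitting on whitespace is a simplifying assumption; a
real indexer would use its own tokenizer.

```python
# Hypothetical sketch of per-word truncation; not Baloo's real code.
MAX_TERM_LENGTH = 20


def index_terms(text, max_len=MAX_TERM_LENGTH):
    """Split text on whitespace and keep the first max_len characters of each word."""
    return [word[:max_len] for word in text.split()]
```

A long run of genome data such as "ACGTACGT..." is then stored as one 20
character term instead of one arbitrarily long term, while ordinary words in a
blog post pass through unchanged.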


-- 
Vishesh Handa

