<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 16, 2014 at 2:15 PM, Martin Gräßlin <span dir="ltr"><<a href="mailto:mgraesslin@kde.org" target="_blank">mgraesslin@kde.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":1kl" class="a3s" style="overflow:hidden">the txt being genome data doesn't surprise me[1], but I find it sad that now<br>

txt is disabled by default (I use them quite a lot for blog posts). As genome<br>

data is really huge wouldn't it make sense to go rather for file size or abort<br>

the indexing if it's obvious random gibberish?</div></blockquote></div><br>We currently have a hard limit of 50 MB on 'text/plain' files. However, this limit does not cover log files, which have a separate mimetype. Perhaps it would be better to reduce the limit to about 5 MB.</div><div class="gmail_extra"><br></div><div class="gmail_extra">About gibberish: it's hard to define what counts as gibberish. I think I'll add some code so that we only index the first 20 characters of each word. That should help to a certain extent.<br><br clear="all"><div><br></div>-- <br><span style="color:rgb(192,192,192)">Vishesh Handa</span><br>

</div></div>
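<div>A rough sketch of the two ideas from the reply above: a size cap on 'text/plain' files and truncating each indexed word to its first 20 characters. This is not the indexer's actual code. The names <code>MAX_TEXT_BYTES</code>, <code>MAX_TERM_LENGTH</code>, <code>should_index</code> and <code>index_terms</code> are hypothetical, the 5 MB value is only the figure proposed in the reply, and splitting on whitespace is an assumption about how words are found.</div>

```python
# Hypothetical limits, taken from the figures discussed in the email.
MAX_TEXT_BYTES = 5 * 1024 * 1024  # proposed 5 MB cap for text/plain
MAX_TERM_LENGTH = 20              # index only the first 20 characters of a word


def should_index(size_bytes, mimetype):
    """Return False for text/plain files above the size cap.

    Other mimetypes (for example a separate log mimetype) are not
    subject to this cap in this sketch.
    """
    if mimetype == "text/plain":
        return size_bytes <= MAX_TEXT_BYTES
    return True


def index_terms(text):
    """Split text on whitespace and truncate every word to MAX_TERM_LENGTH."""
    return [word[:MAX_TERM_LENGTH] for word in text.split()]
```

<div>With this sketch, a long run of genome-like characters such as 50 repeated letters contributes a single 20-character term instead of the full string, which bounds the size of each term but does not detect gibberish as such.</div>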