<table><tr><td style="">poboiko added a comment.
</td><a style="text-decoration: none; padding: 4px 8px; margin: 0 8px 8px; float: right; color: #464C5C; font-weight: bold; border-radius: 3px; background-color: #F7F7F9; background-image: linear-gradient(to bottom,#fff,#f1f0f1); display: inline-block; border: 1px solid rgba(71,87,120,.2);" href="https://phabricator.kde.org/D23787">View Revision</a></tr></table><br /><div><div><blockquote style="border-left: 3px solid #8C98B8;
color: #6B748C;
font-style: italic;
margin: 4px 0 12px 0;
padding: 8px 12px;
background-color: #F8F9FC;">
<div style="font-style: normal;
padding-bottom: 4px;">In <a href="https://phabricator.kde.org/D23787#537891" style="background-color: #e7e7e7;
border-color: #e7e7e7;
border-radius: 3px;
padding: 0 4px;
font-weight: bold;
color: black;text-decoration: none;">D23787#537891</a>, <a href="https://phabricator.kde.org/p/bruns/" style="
border-color: #f1f7ff;
color: #19558d;
background-color: #f1f7ff;
border: 1px solid transparent;
border-radius: 3px;
font-weight: bold;
padding: 0 4px;">@bruns</a> wrote:</div>
<div style="margin: 0;
padding: 0;
border: 0;
color: rgb(107, 116, 140);"><p>Can you please provide an example which:</p>
<ul class="remarkup-list">
<li class="remarkup-list-item">is currently indexed though it should be skipped due to size</li>
<li class="remarkup-list-item">is skipped after this change</li>
</ul></div>
</blockquote>
<p>Sure. Any mimetype that inherits from "text/plain" but does not start with "text/" counts. I've made the actual list:<br />
<a href="https://phabricator.kde.org/F7515259" style="background-color: #e7e7e7;
border-color: #e7e7e7;
border-radius: 3px;
padding: 0 4px;
font-weight: bold;
color: black;text-decoration: none;">F7515259: list.txt</a><br />
(generated with a simple Python script that iterates over <tt style="background: #ebebeb; font-size: 13px;">QMimeDatabase().allMimeTypes()</tt>, keeps the types for which <tt style="background: #ebebeb; font-size: 13px;">type.inherits("text/plain")</tt> holds, and drops those already excluded by the default Baloo config in <tt style="background: #ebebeb; font-size: 13px;">file/fileexcludefilters.cpp</tt>)</p>
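<p>The filtering described above can be sketched in Python roughly as follows. This is not the original script: the function and class names are hypothetical, the exclusion set is passed in as a parameter rather than read from <tt style="background: #ebebeb; font-size: 13px;">file/fileexcludefilters.cpp</tt>, and the "text/" prefix filter is added to match the criterion in the first sentence. Only <tt style="background: #ebebeb; font-size: 13px;">name()</tt> and <tt style="background: #ebebeb; font-size: 13px;">inherits()</tt> are taken from Qt's <tt style="background: #ebebeb; font-size: 13px;">QMimeType</tt>; a made-up stand-in class replaces it so the sketch runs without Qt.</p>
<pre style="background: #ebebeb; font-size: 13px;">
```python
from typing import Iterable, List, Protocol, Set


class MimeTypeLike(Protocol):
    """The two QMimeType methods the filter relies on."""

    def name(self) -> str: ...

    def inherits(self, parent: str) -> bool: ...


def plaintext_candidates(mime_types: Iterable[MimeTypeLike],
                         excluded: Set[str]) -> List[str]:
    """Return sorted names of types that inherit text/plain, do not start
    with "text/", and are not in the given (assumed) exclusion set."""
    names = []
    for mt in mime_types:
        name = mt.name()
        if not mt.inherits("text/plain"):
            continue  # not plaintext-based
        if name.startswith("text/"):
            continue  # already covered by the existing "text/" handling
        if name in excluded:
            continue  # Baloo would not index this type anyway
        names.append(name)
    return sorted(names)


class StandInMimeType:
    """Hypothetical stand-in for QMimeType, used only for the demo below."""

    def __init__(self, name: str, ancestors: List[str]) -> None:
        self._name = name
        self._ancestors = ancestors

    def name(self) -> str:
        return self._name

    def inherits(self, parent: str) -> bool:
        # Like QMimeType.inherits(), a type also counts as inheriting itself.
        return parent == self._name or parent in self._ancestors


if __name__ == "__main__":
    demo = [
        StandInMimeType("text/plain", []),
        StandInMimeType("text/x-csrc", ["text/plain"]),
        StandInMimeType("application/sql", ["text/plain"]),
        StandInMimeType("application/x-example-excluded", ["text/plain"]),
        StandInMimeType("image/png", []),
    ]
    print(plaintext_candidates(demo, {"application/x-example-excluded"}))
    # prints ['application/sql']
```
</pre>
<p>With PyQt5 installed, the real input would be <tt style="background: #ebebeb; font-size: 13px;">QMimeDatabase().allMimeTypes()</tt> (from <tt style="background: #ebebeb; font-size: 13px;">PyQt5.QtCore</tt>) instead of the stand-in list.</p>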
<p>Looking at the list, I see that some of these types can be pretty heavy (and useless to index), for example <tt style="background: #ebebeb; font-size: 13px;">application/x-valgrind-massif</tt> or <tt style="background: #ebebeb; font-size: 13px;">application/sql</tt> (I know SQL dumps are excluded by the <tt style="background: #ebebeb; font-size: 13px;">*.sql</tt> extension, but someone might simply use another extension such as <tt style="background: #ebebeb; font-size: 13px;">.dump</tt>). It's also easy to imagine a large Wolfram Mathematica file, e.g. one containing pictures (that corresponds to <tt style="background: #ebebeb; font-size: 13px;">application/mathematica</tt> in the list, although on my computer such files are detected as <tt style="background: #ebebeb; font-size: 13px;">application/vnd.wolfram.nb</tt>, which for some reason does not inherit <tt style="background: #ebebeb; font-size: 13px;">text/plain</tt> even though the format is plaintext-based).</p>
<p>We can do our best to exclude undesired types, but I doubt we can cover all of them. Moreover, some files may be of a desirable type but simply too large (RSS feeds <tt style="background: #ebebeb; font-size: 13px;">application/rss+xml</tt>, LyX files for whole books <tt style="background: #ebebeb; font-size: 13px;">application/x-lyx</tt>, mailboxes <tt style="background: #ebebeb; font-size: 13px;">message/rfc822</tt> or <tt style="background: #ebebeb; font-size: 13px;">application/mbox</tt>).</p>
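<p>To make this first category concrete, here is a hedged Python sketch contrasting two size rules: one that applies only to types whose name starts with "text/", and one that applies to every type handled by PlaintextExtractor (assuming, as the last reply in this thread suggests, that the new check keys on PlaintextExtractor being in the extractor list). The function names and the 10 MiB limit are hypothetical placeholders, not Baloo's actual code or threshold.</p>
<pre style="background: #ebebeb; font-size: 13px;">
```python
from typing import Sequence

# Hypothetical threshold for illustration only; Baloo's real limit may differ.
SIZE_LIMIT = 10 * 1024 * 1024  # 10 MiB


def skipped_before(mime_name: str, size: int, limit: int = SIZE_LIMIT) -> bool:
    """Assumed current rule: only "text/..." types are size-checked."""
    return mime_name.startswith("text/") and size > limit


def skipped_after(extractors: Sequence[str], size: int,
                  limit: int = SIZE_LIMIT) -> bool:
    """Assumed new rule: any type handled by PlaintextExtractor is size-checked."""
    return "PlaintextExtractor" in extractors and size > limit


if __name__ == "__main__":
    big = 50 * 1024 * 1024
    # A large SQL dump saved as *.dump: plaintext-based, but not "text/...".
    print(skipped_before("application/sql", big))      # False: indexed today
    print(skipped_after(["PlaintextExtractor"], big))  # True: skipped after the change
```
</pre>
<p>If PlaintextExtractor is indeed in the extractor list for every "text/..." type, anything the first rule skips is also skipped by the second, so the change can only add skipped files, never remove them.</p>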
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>and another example which:</p>
<ul class="remarkup-list">
<li class="remarkup-list-item">is currently skipped though it should be indexed</li>
<li class="remarkup-list-item">is indexed after this change</li>
</ul></blockquote>
<p>There shouldn't be any: "PlaintextExtractor" should be in <tt style="background: #ebebeb; font-size: 13px;">exList</tt> for every type that starts with <tt style="background: #ebebeb; font-size: 13px;">text/</tt>, so every file skipped today is still skipped after this change.</p></div></div><br /><div><strong>REPOSITORY</strong><div><div>R293 Baloo</div></div></div><br /><div><strong>REVISION DETAIL</strong><div><a href="https://phabricator.kde.org/D23787">https://phabricator.kde.org/D23787</a></div></div><br /><div><strong>To: </strong>poboiko, Baloo, bruns, ngraham<br /><strong>Cc: </strong>broulik, kde-frameworks-devel, Baloo, lots0logs, LeGast00n, fbampaloukas, GB_2, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams<br /></div>