D23787: [baloo_file_extractor] Improve handling of large plain-text files
Stefan Brüns
noreply at phabricator.kde.org
Wed Nov 13 12:32:33 GMT 2019
bruns added a comment.
In D23787#541963 <https://phabricator.kde.org/D23787#541963>, @poboiko wrote:
> In D23787#537891 <https://phabricator.kde.org/D23787#537891>, @bruns wrote:
>
> > Can you please provide an example which:
> >
> > - is currently indexed though it should be skipped due to size
> > - is skipped after this change
>
>
> Sure. Any mimetype inherited from "text/plain", but starting with "text/" counts. I've made an actual list:
> F7515259: list.txt <https://phabricator.kde.org/F7515259>
> (using simple python script, which iterates over `QMimeDatabase().allMimeTypes()`, checks if `type.inherits("text/plain")` and is not already excluded by default Baloo config from `file/fileexcludefilters.cpp`)
Your script is wrong. For example, SVG inherits from text/plain, but it has its own extractor, so it is never fed to the PlaintextExtractor. Ditto for anything inheriting from XML.
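To illustrate the objection, here is a minimal Python sketch of a filter that accounts for it. The only rule taken from the comment above is "inherits text/plain, but is neither XML-derived nor handled by another extractor". Everything else is an assumption: `PARENTS` is a hand-written, simplified subset of the MIME hierarchy used in place of querying `QMimeDatabase`, and `DEDICATED_EXTRACTORS` is a hypothetical stand-in for the extractor lookup Baloo actually performs.

```python
# Hypothetical, simplified subset of the MIME type hierarchy (child -> parents).
# A real script would read this from QMimeDatabase; this table is illustrative only.
PARENTS = {
    "text/plain": [],
    "text/x-csrc": ["text/plain"],
    "application/x-shellscript": ["text/plain"],
    "application/xml": ["text/plain"],
    "image/svg+xml": ["application/xml"],
}

# Hypothetical set of types claimed by a dedicated (non-plaintext) extractor.
DEDICATED_EXTRACTORS = {"image/svg+xml"}


def inherits(mime, ancestor, parents=PARENTS):
    """True if mime is ancestor or (transitively) inherits from it,
    mirroring the semantics of QMimeType.inherits()."""
    if mime == ancestor:
        return True
    return any(inherits(p, ancestor, parents) for p in parents.get(mime, []))


def reaches_plaintext_extractor(mime):
    """Approximate whether a type would actually be handed to the plain-text extractor."""
    if not inherits(mime, "text/plain"):
        return False
    # XML-derived types (including SVG) go to the XML extractor instead.
    if inherits(mime, "application/xml"):
        return False
    # Types with their own extractor are never fed to the plain-text one.
    if mime in DEDICATED_EXTRACTORS:
        return False
    return True


def plaintext_candidates():
    """Sorted list of types from PARENTS that pass the filter."""
    return sorted(m for m in PARENTS if reaches_plaintext_extractor(m))


if __name__ == "__main__":
    for name in plaintext_candidates():
        print(name)
```

With this table, `image/svg+xml` and `application/xml` are dropped even though both inherit from `text/plain`, which is exactly the case the naive `inherits("text/plain")` check gets wrong.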
REPOSITORY
R293 Baloo
REVISION DETAIL
https://phabricator.kde.org/D23787
To: poboiko, #baloo, bruns, ngraham
Cc: davidedmundson, broulik, kde-frameworks-devel, #baloo, hurikhan77, lots0logs, LeGast00n, fbampaloukas, GB_2, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams
More information about the Kde-frameworks-devel mailing list