D23787: [baloo_file_extractor] Improve handling of large plain-text files

Stefan Brüns noreply at phabricator.kde.org
Wed Nov 13 12:32:33 GMT 2019


bruns added a comment.


  In D23787#541963 <https://phabricator.kde.org/D23787#541963>, @poboiko wrote:
  
  > In D23787#537891 <https://phabricator.kde.org/D23787#537891>, @bruns wrote:
  >
  > > Can you please provide an example which:
  > >
  > > - is currently indexed though it should be skipped due to size
  > > - is skipped after this change
  >
  >
  > Sure. Any mimetype inherited from "text/plain", but starting with "text/" counts. I've made an actual list:
  >  F7515259: list.txt <https://phabricator.kde.org/F7515259>
  >  (using a simple Python script, which iterates over `QMimeDatabase().allMimeTypes()`, checks whether `type.inherits("text/plain")` holds and whether the type is not already excluded by the default Baloo config from `file/fileexcludefilters.cpp`)
  
  
  Your script is wrong. For example, SVG inherits from text/plain but has its own extractor, so it is never fed to the PlaintextExtractor. Ditto for anything inheriting from XML.
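
  To illustrate, here is a minimal sketch of the additional check the script would need. It is plain Python with no Qt dependency: the `PARENTS` table and the `DEDICATED` set are hypothetical, hand-written stand-ins for `QMimeDatabase` and for the installed extractor plugins, and the rule that the nearest ancestor with an extractor wins is an assumption made for the sketch, not a description of the actual lookup code.

```python
# Hypothetical, hand-written inheritance table standing in for QMimeDatabase.
# Each MIME type maps to its direct parents; the real database is far larger.
PARENTS = {
    "text/plain": [],
    "text/x-csrc": ["text/plain"],
    "application/x-shellscript": ["text/plain"],
    "application/xml": ["text/plain"],
    "image/svg+xml": ["application/xml"],
}

# Assumption for this sketch: types claimed by a dedicated (non-plaintext)
# extractor. The real set depends on the installed extractor plugins.
DEDICATED = {"application/xml", "image/svg+xml"}


def ancestors(mime):
    """Yield `mime` followed by its ancestors, nearest first (breadth-first)."""
    queue, seen = [mime], set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        yield current
        queue.extend(PARENTS.get(current, []))


def handled_as_plain_text(mime):
    """Return True if the nearest type with an extractor is text/plain.

    Assumed rule: walk up the inheritance chain and stop at the first type
    that has any extractor; only if that type is text/plain does the file
    end up in the plain-text extractor.
    """
    for candidate in ancestors(mime):
        if candidate in DEDICATED:
            return False
        if candidate == "text/plain":
            return True
    return False


# Filter the candidate list the way the corrected script would.
candidates = [m for m in PARENTS if m != "text/plain"]
plain_text_only = sorted(m for m in candidates if handled_as_plain_text(m))
```

  With this table, `plain_text_only` keeps `text/x-csrc` and `application/x-shellscript`, but drops `image/svg+xml` and `application/xml`, even though all four inherit from text/plain.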

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23787

To: poboiko, #baloo, bruns, ngraham
Cc: davidedmundson, broulik, kde-frameworks-devel, #baloo, hurikhan77, lots0logs, LeGast00n, fbampaloukas, GB_2, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams