D23787: [baloo_file_extractor] Improve handling of large plain-text files

Igor Poboiko noreply at phabricator.kde.org
Wed Nov 13 13:08:06 GMT 2019


poboiko added a comment.


  @bruns: I missed D16593: [ExtractorCollection] Use only best matching extractor plugin <https://phabricator.kde.org/D16593>, and had in mind the previous situation, where we matched all extractors based on inheritance. In that case, the "Secondly" part indeed no longer seems to apply.
  (As for my previous answer: I misunderstood you and thought you were asking about the case where `PlainTextExtractor` did not match and then matched afterwards.)
  
  > Your script is wrong. E.g. SVG inherits from text/plain, but has its own extractor, thus is not fed to the PlaintextExtractor. Dito for anything inheriting from XML.
  
  I'm not claiming the list is comprehensive; it's just a first approximation.
  I'm only claiming that there is a plethora of plain-text-based types (and there might be even more in the future), some of which **in principle** might cause an issue.
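
  To make the inheritance point concrete, here is a minimal sketch in plain C++ (no Qt or KFileMetaData) of how "all extractors along the inheritance chain" differs from "best match only". The parent table, the registry, and the name `SvgExtractor` are hand-written assumptions for illustration; none of this is taken from the actual `ExtractorCollection` code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical parent table; the real data comes from the MIME database.
static const std::map<std::string, std::string> kParent = {
    {"image/svg+xml", "application/xml"},
    {"application/xml", "text/plain"},
    {"application/sql", "text/plain"},
};

// Hypothetical registry: mimetype -> extractor name (names are illustrative).
static const std::map<std::string, std::string> kExtractors = {
    {"text/plain", "PlainTextExtractor"},
    {"image/svg+xml", "SvgExtractor"},
};

// "Match everything": collect every extractor found while walking up the chain.
std::vector<std::string> allMatching(std::string mime) {
    std::vector<std::string> result;
    while (true) {
        auto ext = kExtractors.find(mime);
        if (ext != kExtractors.end()) {
            result.push_back(ext->second);
        }
        auto parent = kParent.find(mime);
        if (parent == kParent.end()) {
            break;  // reached the top of the chain
        }
        mime = parent->second;
    }
    return result;
}

// "Best match only": stop at the closest type that has an extractor.
// Returns an empty string when nothing in the chain has one.
std::string bestMatching(std::string mime) {
    while (true) {
        auto ext = kExtractors.find(mime);
        if (ext != kExtractors.end()) {
            return ext->second;
        }
        auto parent = kParent.find(mime);
        if (parent == kParent.end()) {
            return "";
        }
        mime = parent->second;
    }
}
```

  With these tables, `bestMatching("image/svg+xml")` never reaches `PlainTextExtractor`, while `bestMatching("application/sql")` still does, which is the kind of type I'm worried about.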
  
  There were plenty of situations in the past when users first encountered Baloo choking on some files (see the git log of `fileexcludefilters.cpp`: SQL dumps, genome data, etc.), which made Baloo unusable for them.
  Luckily for us, they reported these cases, and we blacklisted them. But I think it's unlikely we will manage to cover all the problematic cases that way (not all users report issues, and we're not familiar with all possible mimetypes).
  This patch should serve as a preventive measure, reducing the probability of Baloo choking on such files in the first place.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23787

To: poboiko, #baloo, bruns, ngraham
Cc: davidedmundson, broulik, kde-frameworks-devel, #baloo, hurikhan77, lots0logs, LeGast00n, fbampaloukas, GB_2, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams