D29381: Thumbnail text: use libmagic to detect encoding

Harald Sitter noreply at phabricator.kde.org
Tue May 5 15:07:32 BST 2020


sitter added inline comments.

INLINE COMMENTS

> meven wrote in textcreator.cpp:38
> Without libmagic, it is basically the current state: UTF-8 with BOM detection, otherwise the local codec.
> 
> I did not test encodings exhaustively, so I wanted to leave the door open for users not to rely on libmagic.
> libmagic works well from what I've tested, but I cannot be absolutely sure about the many encodings out there.
> Hopefully libmagic does a better job detecting UTF-8 (which I saw) but for users not using much UTF-8...
> 
> And libmagic loads a 5M file storing its heuristics each time it loads ( /usr/share/misc/magic.mgc ).
> It would be great to keep this in memory somewhere, maybe a static.

Perhaps it'd make sense to refactor this a bit and construct some test cases around encoding detection so we get a sense of reliability?
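
To illustrate what such test cases could look like, here is a minimal, table-driven sketch. It is not the code in textcreator.cpp: `Encoding`, `detectEncoding`, `isValidUtf8`, `sampleCases` and `countFailures` are hypothetical names, and the detector only approximates the no-libmagic behaviour described above (BOM check, then a UTF-8 validity check, otherwise "unknown", where the caller would fall back to the local codec). A libmagic-backed detector could be run against the same table to compare results.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type, for illustration only.
enum class Encoding { Utf8Bom, Utf16LE, Utf16BE, Utf8, Unknown };

// True if `data` is well-formed UTF-8 (rejects overlong forms,
// surrogates and code points above U+10FFFF).
bool isValidUtf8(const std::string &data)
{
    const auto *p = reinterpret_cast<const unsigned char *>(data.data());
    const std::size_t n = data.size();
    static const unsigned long minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                      // stray continuation or invalid lead byte
        if (i + len > n) return false;          // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < minForLen[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // surrogate
        if (cp > 0x10FFFF) return false;                    // out of Unicode range
        i += len;
    }
    return true;
}

// Assumed fallback logic: BOM first, then a UTF-8 validity check.
// Unknown means "let the caller use the local codec".
Encoding detectEncoding(const std::string &data)
{
    if (data.compare(0, 3, "\xEF\xBB\xBF") == 0) return Encoding::Utf8Bom;
    if (data.compare(0, 2, "\xFF\xFE") == 0) return Encoding::Utf16LE;
    if (data.compare(0, 2, "\xFE\xFF") == 0) return Encoding::Utf16BE;
    return isValidUtf8(data) ? Encoding::Utf8 : Encoding::Unknown;
}

struct Case {
    const char *name;
    std::string input;
    Encoding expected;
};

// A few hand-written samples; a real suite would want many more encodings.
std::vector<Case> sampleCases()
{
    return {
        {"plain ASCII", "hello", Encoding::Utf8},
        {"UTF-8 without BOM", "gr" "\xC3\xBC" "n", Encoding::Utf8},
        {"UTF-8 with BOM", "\xEF\xBB\xBF" "abc", Encoding::Utf8Bom},
        {"UTF-16LE with BOM", std::string("\xFF\xFE" "h\0", 4), Encoding::Utf16LE},
        {"Latin-1 bytes", "caf" "\xE9", Encoding::Unknown},
        {"overlong sequence", "\xC0\xAF", Encoding::Unknown},
    };
}

// Number of cases where the detector disagrees with the expectation.
int countFailures(const std::vector<Case> &cases)
{
    int failures = 0;
    for (const Case &c : cases) {
        if (detectEncoding(c.input) != c.expected) ++failures;
    }
    return failures;
}
```

Calling `countFailures(sampleCases())` gives a single number per detector, which would make it easy to compare the current fallback against libmagic on the same inputs.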

The way I am looking at this: either libmagic always does the best job at detecting encodings, at which point we'll want it as a required dep, or there's something better in which case we don't want libmagic at all and instead use the something better ;)

In the end the user isn't necessarily in charge of what a random file will be encoded with, so I don't think there's a point in letting the user (or the distro) build an inferior product by accidentally not including libmagic. The truth is neither we nor the user can with any certainty say what encodings the thumbnailer will encounter.

REPOSITORY
  R320 KIO Extras

REVISION DETAIL
  https://phabricator.kde.org/D29381

To: meven, #frameworks, sitter, ngraham
Cc: pino, kde-frameworks-devel, kfm-devel, azyx, nikolaik, pberestov, iasensio, aprcela, fprice, LeGast00n, cblack, fbampaloukas, alexde, Codezela, feverfew, meven, michaelh, spoorun, navarromorales, firef, ngraham, andrebarros, bruns, emmanuelp, rdieter, mikesomov