D4995: Fix DB inconsistency due to some docterms appearing with uppercase symbols

Igor Poboiko noreply at phabricator.kde.org
Thu Mar 9 23:30:24 UTC 2017


poboiko created this revision.
poboiko added a project: Frameworks.

REVISION SUMMARY
  I noticed that on some PDF files, "balooshow -x file.pdf" segfaulted. The backtrace showed that it crashed because the file had a single "X" term (see line 201) <https://cgit.kde.org/baloo.git/tree/src/tools/balooshow/main.cpp#n201>. Moreover, the file had a number of terms containing uppercase characters, which should never happen: all search terms are lowercase, and uppercase is reserved for metadata.
  Further investigation showed that the PDF file (after text extraction) contained exotic Unicode symbols (e.g. "𝐻𝑒𝑑𝑔𝑒"). These mathematical italic letters have no lowercase mapping, so toLower() left the string unchanged. Unicode normalization then mapped them to plain Latin letters, producing "Hedge", which went straight into the DB with its uppercase "H".
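  The ordering problem can be reproduced in a few lines. This is a sketch, not the actual termgenerator.cpp code: Python's unicodedata and str.lower() stand in for Qt's QString::normalized() and QString::toLower(), the NFKD form is an assumption about what the term generator uses, and the function names are hypothetical.

```python
import unicodedata

# Hypothetical stand-ins for the term-cleaning step; NFKD is an assumed
# normalization form, Python APIs replace the Qt ones.


def clean_term_buggy(word: str) -> str:
    # Lowercase first: mathematical alphanumeric letters such as
    # U+1D43B (MATHEMATICAL ITALIC CAPITAL H) have no lowercase mapping,
    # so lower() leaves them untouched...
    lowered = word.lower()
    # ...and compatibility decomposition then turns them into plain
    # ASCII capitals, which would end up in the index.
    return unicodedata.normalize("NFKD", lowered)


def clean_term_fixed(word: str) -> str:
    # Normalize first, so exotic letters become ordinary Latin letters,
    # then lowercase the result.
    return unicodedata.normalize("NFKD", word).lower()


# "𝐻𝑒𝑑𝑔𝑒" spelled with explicit code points
exotic = "\U0001D43B\U0001D452\U0001D451\U0001D454\U0001D452"
print(clean_term_buggy(exotic))  # Hedge
print(clean_term_fixed(exotic))  # hedge
```

  Swapping the two calls is enough here: once normalization has produced ordinary letters, toLower() has a mapping to apply.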

TEST PLAN
  I've tested it on the affected file: "balooshow -x" no longer crashes, and its output no longer contains uppercase terms.
  
  An additional check for this case could probably be added to the "balooctl checkDb" command.
  I can prepare a separate patch if necessary.
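  A check of that kind might look roughly like the following. It is only a sketch of the idea, not balooctl's implementation: the function names are hypothetical, Python stands in for the Qt code, and it assumes the caller passes only plain-text terms (metadata terms, which legitimately contain uppercase prefixes, are filtered out beforehand).

```python
import unicodedata
from typing import Iterable, List

# Hypothetical checkDb-style scan. Assumption: `terms` contains only
# plain-text terms; metadata terms are excluded by the caller.


def has_uppercase(term: str) -> bool:
    # Compatibility-normalize first so that exotic letters such as
    # U+1D43B (MATHEMATICAL ITALIC CAPITAL H) are seen as plain "H".
    normalized = unicodedata.normalize("NFKC", term)
    # Any character that lower() changes is an uppercase (or titlecase)
    # letter that should not appear in a plain-text term.
    return normalized != normalized.lower()


def find_uppercase_terms(terms: Iterable[str]) -> List[str]:
    # Report every stored term that violates the lowercase invariant.
    return [t for t in terms if has_uppercase(t)]


stored = [
    "hedge",
    "Hedge",  # left over from the old lower-then-normalize ordering
    "pdf",
    "\U0001D43B\U0001D452\U0001D451\U0001D454\U0001D452",  # "𝐻𝑒𝑑𝑔𝑒"
]
bad = find_uppercase_terms(stored)
print(len(bad))  # 2
```

  Normalizing inside the check means it flags both the already-normalized "Hedge" and any exotic spelling that might still be stored unnormalized.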

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D4995

AFFECTED FILES
  src/engine/termgenerator.cpp

To: poboiko
Cc: #frameworks


More information about the Kde-frameworks-devel mailing list