D4995: Fix DB inconsistency due to some docterms appearing with uppercase symbols
Igor Poboiko
noreply at phabricator.kde.org
Thu Mar 9 23:30:24 UTC 2017
poboiko created this revision.
poboiko added a project: Frameworks.
REVISION SUMMARY
I noticed that for some PDF files, "balooshow -x file.pdf" segfaults. The backtrace showed that the crash was caused by a term consisting of a single "X" (see line 201) <https://cgit.kde.org/baloo.git/tree/src/tools/balooshow/main.cpp#n201>. Moreover, the file had a number of terms containing uppercase characters, which should never happen: all search terms are lowercase, and uppercase is reserved for metadata.
Further investigation showed that the PDF file (after extraction) contained exotic Unicode characters (e.g. "𝐻𝑒𝑑𝑔𝑒"). Calling toLower() left that string unchanged, but the subsequent normalization turned it into "Hedge", so the term went into the DB with its uppercase character intact.
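The effect can be reproduced outside Baloo. The following is a minimal sketch in Python rather than the Qt code in termgenerator.cpp; the function names are hypothetical, and the use of the NFKD normalization form is an assumption made for illustration. It only shows that lowercasing before normalization leaves the uppercase letter in place, while lowercasing after normalization does not.

```python
import unicodedata

# "Hedge" written in Mathematical Italic letters (U+1D43B, U+1D452, ...),
# the kind of text reported in the affected PDF.
SAMPLE = "\U0001D43B\U0001D452\U0001D451\U0001D454\U0001D452"


def lower_then_normalize(text: str) -> str:
    """Problematic order: lowercase first, normalize afterwards.

    The mathematical letters have no lowercase mapping, so lower() is a
    no-op here; normalization then maps them to plain ASCII letters,
    including the capital one.
    """
    return unicodedata.normalize("NFKD", text.lower())


def normalize_then_lower(text: str) -> str:
    """One ordering that avoids the problem (an assumption for this
    sketch, not necessarily what the patch does)."""
    return unicodedata.normalize("NFKD", text).lower()


if __name__ == "__main__":
    print(SAMPLE.lower() == SAMPLE)       # lowercasing changes nothing
    print(lower_then_normalize(SAMPLE))   # still contains an uppercase letter
    print(normalize_then_lower(SAMPLE))   # fully lowercase
```

Any term produced by the first ordering can still contain uppercase characters, which matches the terms seen in the "balooshow -x" output.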
TEST PLAN
I tested it on the affected file: "balooshow -x" no longer crashes, and its output no longer contains uppercase terms.
An additional check for this problematic case could probably be added to the "balooctl checkDb" command.
I can prepare a separate patch if necessary.
REPOSITORY
R293 Baloo
REVISION DETAIL
https://phabricator.kde.org/D4995
AFFECTED FILES
src/engine/termgenerator.cpp
To: poboiko
Cc: #frameworks