D10455: Add RTL support for search, copy & paste in pdf

Chfan Zil noreply at phabricator.kde.org
Sun Feb 25 16:54:33 UTC 2018


chfanzil added a comment.


  I've tested this patch with two PDF files: one in Hebrew (downloaded using Wikipedia's Download-as-PDF option) and one in Arabic (supplied with the patch).
  F5729640: Open Source (Hebrew Wikipedia).pdf <https://phabricator.kde.org/F5729640>
  
  F5729641: arabic-search-test.pdf <https://phabricator.kde.org/F5729641>
  
  The results are as follows:
  
  1. Okular was able to find the text I was searching for (Success).
  
  2. However, the search jumps between seemingly random areas of the page. This is probably because Okular treats each group of a few words as a separate area, instead of following the reading order:
  
  First line: first word to the right, second word to the right, third...
  Second line: first word to the right, second word to the right, third...
  Third line...
  ..
  ..
  Last line: first word to the right, second word to the right, third..., last word to the right.
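
To make the expected order concrete, here is a minimal Python sketch. It is not Okular code: the `Word` type, the coordinates and the `line_tolerance` value are all assumptions made up for illustration. It groups word boxes into lines from top to bottom and then reads each line from the rightmost word to the leftmost one.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's box (hypothetical page units)
    y: float  # top edge of the word's box, growing downwards


def rtl_reading_order(words, line_tolerance=2.0):
    """Return words line by line (top to bottom), right to left within a line."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current line if its top edge is close to the line's first word.
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    ordered = []
    for line in lines:
        # In RTL text the first word of a line is the rightmost one.
        ordered.extend(sorted(line, key=lambda w: w.x, reverse=True))
    return ordered


# Two lines of made-up word boxes, deliberately given out of order.
page = [
    Word("w3", 10, 100), Word("w1", 90, 100), Word("w2", 50, 101),
    Word("w5", 50, 120), Word("w4", 90, 120),
]
order = [w.text for w in rtl_reading_order(page)]
print(order)
```

Search hits and text selection that followed this order would move through the page the way a reader does.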
  
  I'm attaching a gif to illustrate this (I'm searching for the first word in the text, the 3-letter word 'קוד', which translates to "code"):
  F5729642: Search RTL text in patched okular D10455 - 1.gif <https://phabricator.kde.org/F5729642>
  
  3. Okular is unable to copy RTL text correctly. The order of the words gets mixed up. This can be seen when selecting the text (the way the selection grows isn't intuitive) and when comparing the pasted text with the original. Here is a gif to illustrate this:
  
  F5729644: Copy RTL text in patched okular D10455.gif <https://phabricator.kde.org/F5729644>
  
  4. The same problems occur when testing with the PDF in Arabic. Selecting the text is messy, and searching for a letter in the text jumps up and down between lines.
  
  Here is a gif to illustrate this (I'm looking for a common letter from the Arabic alphabet, the letter 'ا' (Alif, the first letter in the Arabic alphabet)):
  F5729646: Search RTL text in patched okular D10455 - 2.gif <https://phabricator.kde.org/F5729646>
  
  To conclude:
  Regarding searching:
  **unpatched okular**: You need to type the text in reverse in order to search for it.
  **okular + D10455 <https://phabricator.kde.org/D10455>**: You can partially search by typing the text in the correct way, but the results will be scattered and not in the intuitive reading order.
  **okular + D10298 <https://phabricator.kde.org/D10298>**: You can partially search by typing the text in the correct way. Within each line the results come up in the opposite order (from the end of the line to the beginning), but at least the search moves from the first line to the second to the third, and not sporadically as with D10455 <https://phabricator.kde.org/D10455>.
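
The unpatched behaviour can be sketched in a few lines of Python. This assumes that the text being searched holds the characters in reversed order, which is my guess based on what I observed and not something I checked in the code. Latin letters stand in for RTL letters, and `find` is just a plain substring search, not Okular's search.

```python
logical_text = "ABC DEFG HIJK"   # Latin letters standing in for RTL letters
extracted = logical_text[::-1]   # assumption: the searched text is stored reversed


def find(haystack, needle):
    """Plain substring search with no RTL handling; returns -1 when not found."""
    return haystack.find(needle)


typed_normally = find(extracted, "DEFG")        # the word as a reader would type it
typed_reversed = find(extracted, "DEFG"[::-1])  # the same word typed backwards
print(typed_normally, typed_reversed)
```

Only the reversed query matches, which is why the text has to be typed backwards.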
  
  Regarding copying text:
  **unpatched okular**: Will copy the text in reverse (not just the order of the words, but also the order of the letters in each word), so ABC DEFG HIJK will paste as KJIH GFED CBA.
  **okular + D10455 <https://phabricator.kde.org/D10455>**: Will copy parts of the text in the correct way.
  So for example: ABC DEFG HIJK LMNO PQR STUVW XYZ
  Will be pasted as: HIJK LMNO ABC PQR STUVW DEFG XYZ
  **okular + D10298 <https://phabricator.kde.org/D10298>**: Like unpatched okular
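
The difference between the two failure modes can be checked with a small Python sketch that uses the example strings above. The helper names `words_intact` and `misplaced_words` are made up for this illustration.

```python
original = "ABC DEFG HIJK LMNO PQR STUVW XYZ"
pasted_d10455 = "HIJK LMNO ABC PQR STUVW DEFG XYZ"


def words_intact(original, pasted):
    """True if pasted contains the same words, each with its letters in the original order."""
    return sorted(original.split()) == sorted(pasted.split())


def misplaced_words(original, pasted):
    """Pasted words that sit at a different position than in the original."""
    return [p for o, p in zip(original.split(), pasted.split()) if o != p]


# Unpatched behaviour: the whole string is reversed, letters included.
pasted_unpatched = "ABC DEFG HIJK"[::-1]

print(pasted_unpatched)
print(words_intact(original, pasted_d10455))
print(misplaced_words(original, pasted_d10455))
```

With D10455 every word keeps its letters in the right order and only the word order is wrong, whereas the unpatched output also reverses the letters inside each word.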
  
  In terms of usability, the ability to search is important, so D10298 <https://phabricator.kde.org/D10298> is better than D10455 <https://phabricator.kde.org/D10455> (as it lets one search within each line from the end of the line to the beginning, and not sporadically).
  Regarding copying text, neither is very good.

REPOSITORY
  R223 Okular

REVISION DETAIL
  https://phabricator.kde.org/D10455

To: fahadalsaidi, #okular, aacid, ltoscano
Cc: chfanzil, ngraham, michaelweghorn, aacid