[Okular-devel] OCR Tool for Okular

Anıl Özbek ozbekanil at gmail.com
Wed Apr 3 11:20:35 UTC 2013


Hi,

Last week I started writing a simple OCR tool for Okular. In general
it has received a good response from KDE users [1-3].

What do you think about adding such a tool to Okular? Is it possible?
If so, I'd be happy to help as much as I can, although I should say
that I'm not experienced in KDE/Qt development.

Currently my code (which is mostly copied from other projects) takes
a part of the active document as an image and saves it to the OS's
temp directory. It then runs a particular OCR application's executable
(for now only Tesseract) to convert the image to a text file. Finally,
the code opens the text file and copies its contents to the clipboard.
Afterwards, the temporary files are deleted.
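
The flow above can be sketched roughly as follows. This is only an
illustration in Python, not the actual Okular code: the function names,
the file names and the injectable `run` parameter are assumptions made
for the example, and the clipboard step is left out. The only part
taken from Tesseract itself is the command-line form
`tesseract <image> <outputbase> [-l <lang>]`, which writes its result
to `<outputbase>.txt`.

```python
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, language=None):
    """Build the argument list: tesseract <image> <output_base> [-l <lang>]."""
    command = ["tesseract", str(image_path), str(output_base)]
    if language:
        command += ["-l", language]
    return command


def ocr_image_bytes(image_bytes, language=None, run=subprocess.run):
    """Save the image to a temp dir, run Tesseract on it, return the text.

    `run` is a hypothetical hook (defaulting to subprocess.run) so the
    flow can be exercised on a machine without Tesseract installed.
    """
    # TemporaryDirectory deletes the image and the text file on exit,
    # which covers the "temporary files are deleted" step.
    with tempfile.TemporaryDirectory(prefix="okular-ocr-") as tmp:
        image_path = Path(tmp) / "selection.png"
        output_base = Path(tmp) / "result"
        image_path.write_bytes(image_bytes)
        run(build_tesseract_command(image_path, output_base, language),
            check=True)
        # Tesseract appends ".txt" to the output base name.
        return (Path(tmp) / "result.txt").read_text(encoding="utf-8")
```

The returned string is what would then be copied to the clipboard.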

I think before going any further it would be better to clarify some
issues that I encountered.


API vs Executable
-------------------
Which one would be better to use? Using the executable is easier, but
using the API seems like the more correct approach. As far as I can
see, Tesseract [4] and Cuneiform [5] provide an API, but I don't know
about other OCR software.

Maybe instead of trying to support more than one OCR engine we could
just choose a default one, but that would restrict users.

If we use the API, Okular will link against the OCR libraries, which
means more dependencies for the Okular package. If we use the
executable, we can check that it exists before running it and, if it
is not installed, show the user an info message along the lines of
"additional packages must be installed to use this feature".

If we choose the API route, these CMake find modules [6-9] may help.
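
For the executable route, the availability check mentioned above could
look something like this. It is a small Python sketch under my own
assumptions: the function names and the exact message wording are made
up for illustration, and in Okular the check would of course be done
with the Qt/KDE equivalents.

```python
import shutil

# Wording is illustrative; the real string would go through i18n.
MISSING_MESSAGE = "Additional packages must be installed to use this feature."


def find_ocr_executable(name="tesseract"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


def availability_message(name="tesseract"):
    """Return None if the executable is available, else the info message."""
    return None if find_ocr_executable(name) else MISSING_MESSAGE
```

The OCR action would only be run when `availability_message()` returns
None; otherwise the returned text is shown to the user.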


OCR Output's Accuracy
-----------------------
OCR accuracy isn't good enough for now (at least for comics): the
success rate is almost 50%. My current code uses the image directly
from the comic. Maybe it would help to first convert the image to
black and white (or 2-bit) and apply some other image operations to
make the letters clearer. Do you have any suggestions about this?
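
To make the black-and-white idea concrete, here is a minimal Python
sketch of grayscale conversion followed by a fixed threshold. It works
on plain lists of (r, g, b) tuples so that it needs no image library.
The threshold of 128 and the luminance weights are assumptions chosen
for the example; I have not measured whether this improves the OCR
result.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def binarize(gray, threshold=128):
    """Map every luminance value to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


def preprocess(pixels, threshold=128):
    """Grayscale then threshold: the image that would be handed to the OCR."""
    return binarize(to_grayscale(pixels), threshold)
```

A single fixed threshold is the simplest possible choice; comics with
coloured speech bubbles would probably need something adaptive.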


Icon for OCR Tool
-------------------
Currently I use the scanner icon from Oxygen [10], but if there is a
better option we can use that instead.


Document Language
-------------------
To give the OCR software the correct parameters we must know the
document's language. For now Okular can't determine the language of
opened documents [11]. Until this feature is implemented, we can add a
new section for the OCR tool to Okular's configuration, where users
can select the language for the OCR process as well as which OCR
software to use.
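
As a rough illustration of what such a setting would feed into, the
Python sketch below turns a stored engine/language choice into
command-line arguments. The `OcrSettings` class and the function name
are hypothetical; only Tesseract's `-l <lang>` option and its language
codes such as "eng" and "tur" come from Tesseract itself. Other
engines are deliberately not handled because I have not checked their
options.

```python
from dataclasses import dataclass


@dataclass
class OcrSettings:
    """Hypothetical user settings for the OCR tool."""
    engine: str = "tesseract"
    language: str = "eng"  # Tesseract language code, e.g. "eng", "tur"


def language_arguments(settings):
    """Return the command-line arguments that select the OCR language."""
    if settings.engine == "tesseract":
        return ["-l", settings.language]
    # Only Tesseract is sketched here; other engines would need their own mapping.
    raise ValueError(f"unsupported OCR engine: {settings.engine}")
```

For example, `language_arguments(OcrSettings(language="tur"))` gives
the arguments for a Turkish document.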


Links
-------
[1] http://wklej.org/id/995982/
[2] http://www.youtube.com/watch?v=duSTyByIPLc
[3] https://plus.google.com/113435503145887565355/posts/RqzC3hMcGcd
[4] https://code.google.com/p/tesseract-ocr/
[5] https://launchpad.net/cuneiform-linux
[6] https://raw.github.com/ruediger/VobSub2SRT/master/CMakeModules/FindTesseract.cmake
[7] https://raw.github.com/ck1125/sikuli/master/cmake_modules/FindTesseract.cmake
[8] https://projects.kde.org/projects/playground/libs/kolena/repository/revisions/master/entry/cmake/modules/FindTesseract.cmake
[9] https://raw.github.com/uliss/quneiform_tests/master/cmake/FindCuneiform.cmake
[10] http://i.imgur.com/xn8iyDw.png
[11] https://bugs.kde.org/show_bug.cgi?id=317486


Regards,
--
Anıl Özbek

