[kde-linux] kooka scanning software

Bruce Bales bbales at cox.net
Thu Nov 22 17:00:22 UTC 2007


On Thursday 22 November 2007 hh:mm:ss Emanoil Kotsev wrote:
> hello, everybody.
> I want to share my experience with, and opinion of, gocr.
>
> I tried it 2 years ago and still follow its mailing list. Although it has
> improved, very slowly, I was (and still am) disappointed that it only works
> with the Latin character set. Beyond that, it is not possible to add your
> own code for other character sets without rewriting parts of the program.
>
> Two years ago the developers said they planned to replace the algorithms
> with vector-based ones, which would make extending j/gocr (with plugins)
> easier, but as far as I know this has not been done yet.
>
> So I think discussing gocr is a waste of time... The C code is also very
> hard to read...
>
> In short: a disaster, and a pity that there is no Linux OCR program. I have
> never heard of a commercial one that works under Linux.
>
> I was using OmniPage Pro... and a few weeks ago I tried it in VMware (a
> year ago it worked with great success). Now I get an error saying that the
> license period has expired, so it does not work either. Well, the license
> was obtained in '97... but I don't remember reading anything about a
> 10-year period... Anyway, commercial OCR costs money and Linux OCR sucks.
>
> Let us hope that the future will be better, at least for the Linux world.
>
> regards

Emanoil,
Linux does have a great OCR program -- it is called tesseract.  It works 
beautifully with my English scans.  I do not know if it works with other 
languages and fonts, but I think you should investigate it.

Googling "tesseract russian" brings up 166,000 hits. One of them,
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
gives instructions on how to train tesseract for another language.
bruce


