OCR

vasa1 · Oct 3, 2011

This thread at the Ubuntu forum mentions tesseract.

Anyone here tried it? I'm bookmarking it for later in case the need arises.

linuxforall · Oct 3, 2011

Tesseract is good. I also use gscan2pdf to save the scans to PDF.

Mrkvonic · Oct 3, 2011

I tried it, it's quite good. But you need to pay attention to the image format and such.
Mrk

iceni60 · Oct 6, 2011

there's a talk about cli OCRs below if you're interested. he used it and some others. don't tell me i need to get out more, it was raining lol
http://www.youtube.com/watch?v=oRR3j0J7jH0

vasa1 · Oct 7, 2011

Does Tesseract have to be "trained"?
I took a screenshot of some text as a .png file, converted it to a .tif file using Shotwell and then tried to get it back to text using Tesseract.

Original:

Code:

/home/aes/.config/google-chrome/Default/User StyleSheets/Custom.css
/home/aes/.mozilla/firefox/7sw6w9a2.default/permissions.sqlite
/home/aes/.mozilla/firefox/7sw6w9a2.default/SimpleBlock.ini
/home/aes/.mozilla/firefox/7sw6w9a2.default/stylish.sqlite
/home/aes/.themes/

After OCR:

Code:

/hnme/aes/.cnnfiq/9¤¤9lerchrnme/Default/llser Style5heets/Eustnm.css
/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .default/Permissinns . sqlite
/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/5i1npleBlnck. ini
/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/stylish . sqlite
/hnme/aes/ .then»eS/
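
To put a rough number on how far the OCR output drifted from the original, here is a minimal sketch using Python's standard `difflib`. The function name `char_similarity` is made up for illustration, and the ratio is only a rough similarity score, not a formal character error rate. The sample strings are the first and last lines from the post above.

```python
import difflib

# (original, OCR output) pairs copied from the post above
pairs = [
    ("/home/aes/.config/google-chrome/Default/User StyleSheets/Custom.css",
     "/hnme/aes/.cnnfiq/9¤¤9lerchrnme/Default/llser Style5heets/Eustnm.css"),
    ("/home/aes/.themes/",
     "/hnme/aes/ .then»eS/"),
]

def char_similarity(expected: str, actual: str) -> float:
    """Return a 0..1 similarity ratio (1.0 means the strings are identical)."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()

for expected, actual in pairs:
    print(f"{char_similarity(expected, actual):.2f}  {actual}")
```

A score well below 1.0 on text this simple suggests the input image, rather than the engine alone, is worth checking first.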

vasa1 · Oct 7, 2011

Looks like training is needed. Plus there's a minimum font size.
http://code.google.com/p/tesseract-ocr/wiki/FAQ
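
The font-size point can be turned into a quick calculation: measure the height in pixels of a lowercase letter such as "x" in the screenshot, then work out how much to enlarge the image. The sketch below assumes a target x-height of about 20 pixels. That figure is an assumption used for illustration, so check the FAQ for the actual guidance. The name `upscale_factor` is hypothetical.

```python
import math

# Assumed target x-height in pixels; an illustrative value, not an official constant.
TARGET_X_HEIGHT_PX = 20

def upscale_factor(measured_x_height_px: float,
                   target_px: float = TARGET_X_HEIGHT_PX) -> int:
    """Smallest whole-number scale factor that lifts the x-height to target_px."""
    if measured_x_height_px <= 0:
        raise ValueError("x-height must be positive")
    # Never scale below 1: text that is already large enough is left alone.
    return max(1, math.ceil(target_px / measured_x_height_px))

# Example: letters about 7 pixels tall, as is common for small screen fonts.
print(upscale_factor(7))
```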

linuxforall · Oct 7, 2011

Yes, it takes quite a few scans before it gets going. In this case OmniPage is better, but Tesseract does well once trained.

Ocky · Oct 7, 2011

vasa1 said:

Does Tesseract have to be "trained"?
I took a screenshot of some text as a .png file, converted it to a .tif file using Shotwell and then tried to get it back to text using Tesseract.

See also review by Mrk - lots of useful pointers. http://www.dedoimedo.com/computers/linux-ocr.html

Ocky · Oct 14, 2011

You can get a GUI frontend for Tesseract called gImageReader. I think there are debs for Ubuntu.

Made a test: it seems to work pretty well, and if need be you can edit any errors. No major errors were encountered in the test, i.e. nothing was edited.
If you increase the resolution, most errors disappear.
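
For an existing screenshot, one way to approximate "more resolution" is to enlarge the image before handing it to Tesseract. The sketch below only builds the two command lines and prints them. It assumes ImageMagick's `convert` and the `tesseract` command are installed (newer ImageMagick releases may expect `magick` instead). The file names, the 300% factor and the helper `build_commands` are placeholders for illustration.

```python
from typing import List

def build_commands(src: str, scale_percent: int = 300,
                   lang: str = "eng") -> List[List[str]]:
    """Build (but do not run) the resize and OCR command lines."""
    enlarged = "enlarged.tif"  # placeholder name for the enlarged copy
    out_base = "ocr_result"    # tesseract appends .txt to this base name
    return [
        # ImageMagick: enlarge the image by the given percentage
        ["convert", src, "-resize", f"{scale_percent}%", enlarged],
        # Tesseract: tesseract <image> <output base> -l <language>
        ["tesseract", enlarged, out_base, "-l", lang],
    ]

for cmd in build_commands("screenshot.png"):
    print(" ".join(cmd))
```

To actually run the steps, each list can be passed to `subprocess.run(cmd, check=True)` once the input file exists.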
