This thread at the Ubuntu forum mentions tesseract. Anyone here tried it? I'm bookmarking it for later in case the need arises.
There's a talk about CLI OCR tools below if you're interested. The presenter used Tesseract and some others. Don't tell me I need to get out more, it was raining lol http://www.youtube.com/watch?v=oRR3j0J7jH0
Does Tesseract have to be "trained"? I took a screenshot of some text as a .png file, converted it to a .tif file using Shotwell and then tried to get it back to text using Tesseract.

Original:
Code:
/home/aes/.config/google-chrome/Default/User StyleSheets/Custom.css
/home/aes/.mozilla/firefox/7sw6w9a2.default/permissions.sqlite
/home/aes/.mozilla/firefox/7sw6w9a2.default/SimpleBlock.ini
/home/aes/.mozilla/firefox/7sw6w9a2.default/stylish.sqlite
/home/aes/.themes/

After OCR:
Code:
/hnme/aes/.cnnfiq/9¤¤9lerchrnme/Default/llser Style5heets/Eustnm.css
/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .default/Permissinns . sqlite
/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/5i1npleBlnck. ini
/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/stylish . sqlite
/hnme/aes/ .then»eS/
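For anyone who wants to script the step described above and put a number on how far off the result is, here is a rough Python sketch. It assumes the plain command-line form `tesseract IMAGE OUTPUTBASE -l LANG` with the English language data installed. The function names (`build_tesseract_command`, `run_ocr`, `similarity`) are made up for illustration and are not part of Tesseract.

```python
import difflib
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run tesseract and return the recognised text.

    Tesseract appends ".txt" to the output base name itself.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


def similarity(expected, actual):
    """Rough 0..1 similarity between the known text and the OCR output."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()


if __name__ == "__main__":
    # Last line of the post above, before and after OCR.
    original = "/home/aes/.themes/"
    after_ocr = "/hnme/aes/ .then»eS/"
    print(f"similarity: {similarity(original, after_ocr):.2f}")
```

The similarity score is only a quick sanity check from the standard library's `difflib`, not a proper OCR accuracy metric, but it is handy for comparing runs at different image sizes.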
Looks like training is needed. Plus there's a minimum font size. http://code.google.com/p/tesseract-ocr/wiki/FAQ
Yes, it takes quite a few scans before it gets going. In this case Omnipage is better, but Tesseract does well once trained.
You can get a frontend GUI for tesseract called gimagereader. I think there are debs for Ubuntu. I ran a test and it seems to work pretty well, and if need be you can edit any errors. No major errors were encountered in the test below, i.e. nothing was edited. You can increase the resolution and any errors magically disappear.
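To illustrate the resolution point, here is a small Python sketch that works out how much to enlarge a screenshot before feeding it to OCR. The 96 DPI source and 300 DPI target are assumptions on my part (a typical screen and a commonly quoted rule of thumb), not figures from this thread. The helper names are hypothetical, and `upscale_image` needs the Pillow library installed.

```python
def upscaled_size(width, height, source_dpi=96, target_dpi=300):
    """Return (width, height) scaled so the image reaches target_dpi.

    Both DPI defaults are assumptions; adjust them for your own setup.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    if target_dpi <= source_dpi:
        # Already at or above the target, so leave the size alone.
        return (width, height)
    return (
        round(width * target_dpi / source_dpi),
        round(height * target_dpi / source_dpi),
    )


def upscale_image(src_path, dst_path, source_dpi=96, target_dpi=300):
    """Enlarge an image file with Pillow and save it with the new DPI tag."""
    from PIL import Image  # imported here so upscaled_size works without Pillow

    with Image.open(src_path) as img:
        new_size = upscaled_size(img.width, img.height, source_dpi, target_dpi)
        resized = img.resize(new_size, Image.LANCZOS)
        resized.save(dst_path, dpi=(target_dpi, target_dpi))
    return new_size


if __name__ == "__main__":
    print(upscaled_size(800, 600))
```

Enlarging does not add real detail, it just gives the recogniser bigger glyphs to work with, so results will vary with the font and the original size.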