OCR

Discussion in 'all things UNIX' started by vasa1, Oct 3, 2011.

Thread Status:
Not open for further replies.
  1. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,152
    This thread on the Ubuntu forums mentions Tesseract.

    Anyone here tried it? I'm bookmarking it for later in case the need arises.
     
  2. linuxforall

    linuxforall Registered Member

    Joined:
    Feb 6, 2010
    Posts:
    2,136
    Tesseract is good. I also use gscan2pdf to save the scans as PDF.
     
  3. Mrkvonic

    Mrkvonic Linux Systems Expert

    Joined:
    May 9, 2005
    Posts:
    8,698
    I tried it, and it's quite good, but you need to pay attention to the image format and such.
    Mrk
     
  4. iceni60

    iceni60 ( ^o^)

    Joined:
    Jun 29, 2004
    Posts:
    5,116
  5. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,152
    Does Tesseract have to be "trained"?
    I took a screenshot of some text as a .png file, converted it to a .tif file using Shotwell and then tried to get it back to text using Tesseract.

    Original:
    Code:
    /home/aes/.config/google-chrome/Default/User StyleSheets/Custom.css
    /home/aes/.mozilla/firefox/7sw6w9a2.default/permissions.sqlite
    /home/aes/.mozilla/firefox/7sw6w9a2.default/SimpleBlock.ini
    /home/aes/.mozilla/firefox/7sw6w9a2.default/stylish.sqlite
    /home/aes/.themes/
    
    After OCR:
    Code:
    /hnme/aes/.cnnfiq/9¤¤9lerchrnme/Default/llser Style5heets/Eustnm.css
    /hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .default/Permissinns . sqlite
    /hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/5i1npleBlnck. ini
    /hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/stylish . sqlite
    /hnme/aes/ .then»eS/
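
    To put a rough number on how far off that output is, here is a minimal sketch (not from the original post) that compares each original line with its OCR'd counterpart using Python's standard difflib module. The names original, ocr_output and similarity are made up for illustration; the strings are copied from the two listings above.

    ```python
    from difflib import SequenceMatcher

    # Lines copied from the "Original" listing above.
    original = [
        "/home/aes/.config/google-chrome/Default/User StyleSheets/Custom.css",
        "/home/aes/.mozilla/firefox/7sw6w9a2.default/permissions.sqlite",
        "/home/aes/.mozilla/firefox/7sw6w9a2.default/SimpleBlock.ini",
        "/home/aes/.mozilla/firefox/7sw6w9a2.default/stylish.sqlite",
        "/home/aes/.themes/",
    ]

    # Lines copied from the "After OCR" listing above, errors included.
    ocr_output = [
        "/hnme/aes/.cnnfiq/9¤¤9lerchrnme/Default/llser Style5heets/Eustnm.css",
        "/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .default/Permissinns . sqlite",
        "/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/5i1npleBlnck. ini",
        "/hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/stylish . sqlite",
        "/hnme/aes/ .then»eS/",
    ]


    def similarity(expected, actual):
        """Return a ratio between 0 and 1; 1.0 means the strings are identical."""
        return SequenceMatcher(None, expected, actual).ratio()


    # One score per line pair, in the same order as the listings.
    scores = [similarity(o, r) for o, r in zip(original, ocr_output)]

    for line, score in zip(original, scores):
        print(f"{score:.2f}  {line}")
    ```

    The ratio is only a rough similarity measure, not a proper character error rate, but it is enough to compare runs, e.g. before and after changing the image resolution.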
    
     
  6. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,152
  7. linuxforall

    linuxforall Registered Member

    Joined:
    Feb 6, 2010
    Posts:
    2,136
    Yes, it takes quite a few scans before it gets going. In this case OmniPage is better, but Tesseract does well once trained.
     
  8. Ocky

    Ocky Registered Member

    Joined:
    May 6, 2006
    Posts:
    2,677
    Location:
    George, S.Africa
    See also the review by Mrk, which has lots of useful pointers: http://www.dedoimedo.com/computers/linux-ocr.html
     
  9. Ocky

    Ocky Registered Member

    Joined:
    May 6, 2006
    Posts:
    2,677
    Location:
    George, S.Africa
    You can get a GUI frontend for Tesseract called gImageReader. I think there are debs for Ubuntu.

    I ran a test. It seems to work pretty well, and if need be you can edit any errors. No major errors came up in the test below, i.e. nothing was edited. :)
    You can increase the resolution and any errors magically disappear. :D

    one.tif - gImageReader.png
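
    For anyone who prefers the command line, the resolution tip can be sketched roughly like this. This is an illustration under assumptions, not what was actually run for the test above: it assumes ImageMagick's convert and the tesseract binary are on the PATH, and that the English language data is installed. The function names, the 300% scale factor and the file name screenshot.png are all made up for the example. Screenshots are typically far below the roughly 300 DPI that Tesseract is usually said to work best with, hence the upscaling step.

    ```python
    import shutil
    import subprocess
    from pathlib import Path


    def build_commands(screenshot, scale_percent=300):
        """Return two commands: upscale to a grayscale TIFF, then OCR it."""
        src = Path(screenshot)
        tif = src.with_suffix(".tif")
        out_base = src.with_suffix("")  # tesseract appends ".txt" itself
        return [
            # ImageMagick: enlarge the image and drop colour information.
            ["convert", str(src), "-resize", f"{scale_percent}%",
             "-colorspace", "Gray", str(tif)],
            # Tesseract: tesseract <image> <output base> -l <language>
            ["tesseract", str(tif), str(out_base), "-l", "eng"],
        ]


    def run_ocr(screenshot, scale_percent=300):
        """Run both commands and return the recognised text."""
        commands = build_commands(screenshot, scale_percent)
        # Fail early with a clear message if either tool is missing.
        for cmd in commands:
            if shutil.which(cmd[0]) is None:
                raise RuntimeError(f"{cmd[0]} not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
        return Path(screenshot).with_suffix(".txt").read_text(encoding="utf-8")
    ```

    Calling print(run_ocr("screenshot.png")) would write screenshot.tif and screenshot.txt next to the original file. Newer ImageMagick releases prefer the magick command over convert, so the first command may need adjusting.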
     