OCR

Discussion in 'all things UNIX' started by vasa1, Oct 3, 2011.

Thread Status:
Not open for further replies.
  1. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,238
    This thread at the Ubuntu forum mentions Tesseract.

    Anyone here tried it? I'm bookmarking it for later in case the need arises.
     
  2. linuxforall

    linuxforall Registered Member

    Joined:
    Feb 6, 2010
    Posts:
    2,137
    Tesseract is good. I also use gscan2pdf to save the scans as PDF.
     
  3. Mrkvonic

    Mrkvonic Linux Systems Expert

    Joined:
    May 9, 2005
    Posts:
    9,322
    I tried it, and it's quite good, but you need to pay attention to the image format and similar details.
    Mrk
     
  4. iceni60

    iceni60 ( ^o^)

    Joined:
    Jun 29, 2004
    Posts:
    5,116
  5. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,238
    Does Tesseract have to be "trained"?
    I took a screenshot of some text as a .png file, converted it to a .tif file using Shotwell and then tried to get it back to text using Tesseract.

    Original:
    Code:
    /home/aes/.config/google-chrome/Default/User StyleSheets/Custom.css
    /home/aes/.mozilla/firefox/7sw6w9a2.default/permissions.sqlite
    /home/aes/.mozilla/firefox/7sw6w9a2.default/SimpleBlock.ini
    /home/aes/.mozilla/firefox/7sw6w9a2.default/stylish.sqlite
    /home/aes/.themes/
    
    After OCR:
    Code:
    /hnme/aes/.cnnfiq/9¤¤9lerchrnme/Default/llser Style5heets/Eustnm.css
    /hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .default/Permissinns . sqlite
    /hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/5i1npleBlnck. ini
    /hnme/aes/ .mnzilla/firefnx/7sw6u9a2 .defanlt/stylish . sqlite
    /hnme/aes/ .then»eS/
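
For anyone who wants to repeat this test from a script, the OCR step could be sketched roughly as below. This is only a sketch: it assumes the `tesseract` binary is on the PATH and that the image is already in a format it accepts (a .tif, as in the post). The function names and the default `eng` language are illustrative choices, not something from the thread.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognised text.

    Tesseract writes its result to <outputbase>.txt, so the text is read
    back from that file once the command has finished.
    """
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")  # e.g. shot.tif -> shot
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_image("screenshot.tif")` (a placeholder file name) would then return whatever text Tesseract recognised, which can be compared with the original by eye or with a diff tool.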
    
     
  6. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,238
  7. linuxforall

    linuxforall Registered Member

    Joined:
    Feb 6, 2010
    Posts:
    2,137
    Yes, it takes quite a few scans before it gets going. In this case Omnipage is better, but Tesseract does well once trained.
     
  8. Ocky

    Ocky Registered Member

    Joined:
    May 6, 2006
    Posts:
    2,710
    Location:
    George, S.Africa
    See also the review by Mrk, which has lots of useful pointers: http://www.dedoimedo.com/computers/linux-ocr.html
     
  9. Ocky

    Ocky Registered Member

    Joined:
    May 6, 2006
    Posts:
    2,710
    Location:
    George, S.Africa
    You can get a GUI frontend for Tesseract called gImageReader. I think there are debs for Ubuntu.

    I made a test. It seems to work pretty well, and if need be you can edit any errors. No major errors came up in the test below, i.e. nothing was edited. :)
    You can increase the resolution and any errors magically disappear. :D
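
One way to try the resolution tip on an existing screenshot is to enlarge the image before handing it to the OCR tool. The sketch below assumes ImageMagick's `convert` command is installed (newer ImageMagick releases call it `magick`). The function names and the 3x default are illustrative, and whether upscaling helps as much as scanning at a higher resolution is an assumption worth testing on your own images.

```python
import subprocess


def upscale_command(src, dst, factor=3):
    """Build an ImageMagick call that enlarges src by `factor` and writes dst."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    percent = f"{int(factor * 100)}%"  # e.g. factor 3 -> "300%"
    return ["convert", str(src), "-resize", percent, str(dst)]


def upscale(src, dst, factor=3):
    """Run the resize; raises CalledProcessError if convert fails."""
    subprocess.run(upscale_command(src, dst, factor), check=True)
```

For example, `upscale("shot.png", "shot.tif")` (placeholder file names) would write a 3x enlarged TIFF that can then be passed to Tesseract or opened in gImageReader.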

    [Attached screenshot: one.tif - gImageReader.png]
     