How to make a "searchable" PDF (text images)
Jean Marc
September 19th, 2008, 05:42 AM
Hi,
Do you know of a tool that can create a "searchable" PDF from scanned PDF images?
Tia for your help ;)
P.S.: I use Foxit Reader 2.3
GlobalForce
September 19th, 2008, 07:10 AM
Hi Jean Marc,
I presume you're after OCR (http://www.investintech.com/resources/articles/pdftypes/) software. Unless Bob D. or someone else makes a suggestion, you'll find more on Wikipedia.
If there's a copy of MS Office onboard, you can probably use its document imaging (http://weblogs.asp.net/jgalloway/archive/2006/10/01/Free-OCR-software_3F00_-You-may-already-have-it_2E002E002E00_.aspx) feature (the linked article dates from 2006).
S
raakii
September 19th, 2008, 09:56 AM
ABBYY FineReader and ScanSoft OmniPage are the ones leading in this arena.
Bob D
September 19th, 2008, 10:07 AM
Hi Jean Marc
As you know, "true" PDFs are text searchable.
However, when you scan a document to PDF, the scanner typically turns it into a raster image (vs. a vector image, as created by Acrobat), so it loses its native PDF "intelligence".
There are 3 types of PDFs (described briefly here):
http://www.pdftocad.com/
Other than GlobalForce's OCR software recommendation, I cannot offer any suggestions.
Cheers
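The difference between a text-based PDF and a scanned (raster) one can be checked roughly before reaching for OCR. The sketch below is only a heuristic and not how any of the tools named in this thread work: it looks in the raw file bytes for font and image markers. The function name `guess_pdf_kind` and the labels it returns are made up for illustration, and a file that stores its object dictionaries in compressed streams may be misclassified.

```python
import re


def guess_pdf_kind(data: bytes) -> str:
    """Roughly classify raw PDF bytes by the markers they contain.

    Returns one of: "text", "image-only", "mixed", "unknown".
    This is a byte-level guess, not a real PDF parse.
    """
    # A font resource usually shows up as "/Font" or "/Type /Font".
    has_font = re.search(rb"/Font\b", data) is not None
    # A scanned page is usually stored as an image XObject.
    has_image = re.search(rb"/Subtype\s*/Image\b", data) is not None

    if has_font and has_image:
        return "mixed"       # e.g. a scan with a text layer, or text plus pictures
    if has_font:
        return "text"        # likely searchable as-is
    if has_image:
        return "image-only"  # likely needs OCR to become searchable
    return "unknown"         # markers may be hidden inside compressed streams
```

To try it on a file, read the bytes first, for example `guess_pdf_kind(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. An "image-only" result suggests the file needs OCR before a search will find anything.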
Jean Marc
September 20th, 2008, 06:19 AM
Thank you all for your advice ;)
Apart from PDF made from scanned images, I've got a "weird" PDF...
At first glance, it seems to be a "true" PDF: for instance, I can highlight and use the copy command (the document is not encrypted) but when I try to put the copied text in the clipboard, the characters are unreadable!
http://img519.imageshack.us/img519/8576/img3261909jp8.jpg
And above all, I can't perform a search... ???
Any solution?
Tia
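A PDF whose text can be selected but pastes as unreadable characters is often one whose embedded fonts lack a usable mapping back to Unicode, which would also explain why search finds nothing. Whether that is the cause here cannot be told from the screenshot alone. As a rough way to flag such a file, the sketch below measures how much of a pasted sample consists of letters, digits, and whitespace. The names `readable_ratio` and `looks_garbled` and the 0.7 threshold are assumptions chosen for illustration, not part of Foxit or any other tool.

```python
def readable_ratio(text: str) -> float:
    """Share of characters that are letters, digits, or whitespace."""
    if not text:
        return 0.0
    readable = sum(1 for ch in text if ch.isalnum() or ch.isspace())
    return readable / len(text)


def looks_garbled(text: str, threshold: float = 0.7) -> bool:
    """Guess whether pasted text is unreadable.

    Ordinary prose is mostly letters and spaces, so a low ratio hints
    that the copy produced control characters or stray symbols instead.
    The threshold is an arbitrary assumption; tune it on real samples.
    """
    return readable_ratio(text) < threshold
```

If a sample copied from the PDF is flagged this way, running the pages through OCR, as suggested earlier in the thread, sidesteps the broken text layer.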
GlobalForce
September 20th, 2008, 06:37 AM
Give this (http://www.jsware.net/jsware/pdfconv.php5) a shot if it's a commonly available file. If it's from a friend, ask how it was generated. In the event you still can't get it working, search "making fair use of cut and paste restricted PDF files" and browse within the top ten results; it may well be DRM related.
S
Bob D
September 20th, 2008, 10:19 AM
-{ Quote: "...when I try to put the copied text in the clipboard, the characters are unreadable!" }-
I'm not familiar with Foxit, but did you copy the characters as Plain Text, Formatted Text, or Rich Content?
Again, the PDF may not be a "true" PDF, but a hybrid.
If the file contains nothing proprietary or personal, maybe you can post it so that members can possibly manipulate it.
Copyright ©2002 - 2012, Wilders Security Forums