Convert text images in a PDF to text

Hi,

I have a 16 MB PDF file in which someone scanned pages from a book and 
stored them as images. Is there an easy way to run OCR on this file and 
get plain text from the images of text in the PDF? A standard PDF-to-text 
converter won't work because the file contains images of the text as it 
appears in the book.

Thanks,
Ben

Ben
1/25/2004 5:02:02 AM
comp.text.pdf

3 Replies

Try Acrobat / Web Capture. This plug-in OCRs PDFs, and you can choose the 
compression level, etc.
JR
1/25/2004 1:14:58 PM
Ben wrote:
> Hi,
> 
> I have a pdf file where someone scanned in pages from a book and stored 
> them as images in a pdf file which is 16 meg. Is there an easy way I can 
> do some kind of OCR on this file to just get plain text from the images 
> of text in the pdf?  Using a standard pdf to txt converter won't work 
> because the file contains images of text as it appears in the book.

I use Ghostscript in combination with jocr. Ghostscript can convert the 
PDF into a PBM image, and then you use jocr to OCR the PBM. The result is 
a text file :)  Works like a champ.

www.ghostscript.com (free)
jocr.sourceforge.net (free)

James
James
1/27/2004 12:07:02 AM
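The Ghostscript-plus-jocr pipeline James describes can be sketched as a small Python script. This is a hedged sketch, not James's actual setup: it assumes Ghostscript is installed as `gs` and that jocr is available as `gocr` (the project at jocr.sourceforge.net is also distributed as GOCR). The `pbmraw` device, the 300 DPI default, the temporary working directory, and the helper names `ghostscript_cmd`, `gocr_cmd`, and `pdf_to_text` are illustrative choices, not anything from the thread.

```python
import glob
import os
import subprocess
import sys
import tempfile


def ghostscript_cmd(pdf_path, out_pattern, dpi=300):
    """Build (but do not run) the Ghostscript argv that renders every
    page of pdf_path to a 1-bit PBM file. out_pattern needs a printf-style
    page counter such as %04d so each page gets its own file."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=pbmraw",              # binary (raw) PBM, 1 bit per pixel
        f"-r{dpi}",                     # rendering resolution in DPI
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def gocr_cmd(pbm_path, txt_path):
    """Build the GOCR argv: -i names the input image, -o the text output.
    Assumes the binary is installed as 'gocr' (older releases: jocr)."""
    return ["gocr", "-i", pbm_path, "-o", txt_path]


def pdf_to_text(pdf_path, dpi=300):
    """Rasterize pdf_path with Ghostscript, OCR each page with GOCR, and
    return the pages' text joined by form feeds."""
    with tempfile.TemporaryDirectory() as work_dir:
        pattern = os.path.join(work_dir, "page-%04d.pbm")
        subprocess.run(ghostscript_cmd(pdf_path, pattern, dpi), check=True)

        pages = []
        # Zero-padded names sort in page order (up to 9999 pages).
        for pbm in sorted(glob.glob(os.path.join(work_dir, "page-*.pbm"))):
            txt = pbm[:-len(".pbm")] + ".txt"
            subprocess.run(gocr_cmd(pbm, txt), check=True)
            # Decode as UTF-8, replacing any bytes that are not valid UTF-8.
            with open(txt, encoding="utf-8", errors="replace") as f:
                pages.append(f.read())
    return "\f".join(pages)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python pdf_ocr.py input.pdf output.txt
    with open(sys.argv[2], "w", encoding="utf-8") as out:
        out.write(pdf_to_text(sys.argv[1]))
```

Saved under a hypothetical name such as `pdf_ocr.py`, it would be run as `python pdf_ocr.py scanned.pdf book.txt`. Rendering at a higher resolution often helps OCR on small print, at the cost of larger temporary images; the temporary directory keeps stale pages from an earlier run from leaking into the output.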
Actually, you need Paper Capture.  If you have Acrobat 4 or 6, you
should be able to access it under the Document menu.  If you have
Acrobat 5, you may need to download it from the Adobe.com site (it
should be a free download).


JR Boulay <jrboulay@free.rf> wrote in message news:<20040125141434444+0100@nntpserver.tele2.fr>...
> Try Acrobat / Web Capture. This plug-in OCRize PDFs, you can choose the 
> compression level, etc.
burnsed
1/29/2004 6:04:44 PM