Searchable PDF from pre-existing OCR output, with or without a pre-existing image-only PDF

I'm new to the OCR/PDF world, as everyone will see from the naivety
of my question (sorry!)

I understand that a searchable PDF contains a bitmapped image layer
plus a separate, hidden "text" layer.

I also understand that Acrobat itself and a number of third-party
tools can take a pre-existing image-only PDF and run OCR on it to create
the searchable PDF with the hidden text layer.

But suppose that I have an existing OCR output file in some XML format
and no existing image-only PDF for the image. The OCR was run on
a pre-existing TIFF file.

Am I out of luck, with nothing for it but to run the image through a tool
that BOTH converts to PDF AND does OCR (again!) to produce the
searchable PDF with the text layer? Or is there some tool that can merge
an existing OCR output file, in a suitable format, into an existing
image-only PDF?

I can always get an image-only PDF by converting my TIFF image to PDF
with some standard tool, so that's not the issue.

The trick is to avoid needing to re-OCR the image when I already have an 
OCR output, but no PDF, not even an image-only one.
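
To make the question concrete, here is a rough Python sketch of what I
imagine such a merge step would do with each OCR word: convert its pixel
bounding box to PDF points and emit it as invisible text (text rendering
mode 3) to sit over the page image. The input layout (text plus pixel
box), the function names and the font resource name F1 are my own
assumptions, not taken from any real OCR format or tool, and a real tool
would still have to embed the image, declare the font and write the
surrounding PDF file structure.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(words, image_height_px, dpi, font_name="F1"):
    """Build PDF content-stream operators that place OCR words invisibly.

    words: list of (text, left, top, right, bottom) tuples in image pixels,
           origin at the top-left corner (an assumed, hypothetical layout).
    font_name: name of a font resource assumed to be declared on the page.
    Only plain ASCII text is handled; other encodings need extra work.
    """
    scale = 72.0 / dpi  # pixels to PDF points (72 points per inch)
    ops = ["BT", "3 Tr"]  # begin text object; render mode 3 = invisible
    for text, left, top, right, bottom in words:
        font_size = (bottom - top) * scale      # box height as font size
        x = left * scale
        y = (image_height_px - bottom) * scale  # flip to bottom-left origin
        ops.append(f"/{font_name} {font_size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")  # absolute position
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")  # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    # One word from a 300 dpi scan that is 3300 pixels tall.
    print(invisible_text_ops([("Hello", 300, 300, 600, 360)], 3300, 300))
```

This ignores word width (no horizontal scaling to match the box), so
text selection would only line up roughly with the image.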

So. Can anybody sort me out? Are there tools for this or is the whole 
concept just not right?

Thanks in advance.

2/23/2006 10:01:13 PM
comp.text.pdf
