I'm new to the OCR/PDF world, as everyone will see from the naivety
of my question (sorry!).
I understand that searchable PDFs contain a bitmapped image layer
and a separate, hidden "text" layer.
I also understand that Acrobat itself and a number of other third-party
tools can take a pre-existing image-only PDF and run OCR on it to create
the searchable PDF with the hidden text layer.
But suppose that I have an existing OCR output file in some XML format
and no existing image-only PDF for the image. The OCR was run on
a pre-existing TIFF file.
Am I out of luck, with nothing for it but to run the image through a tool
which BOTH converts to PDF AND does OCR (again!) to produce the
searchable PDF with the text layer? Or is there some tool that can merge
an existing OCR output file in a suitable format into an existing
image-only PDF?
I can always get an image-only PDF by converting my TIFF image to PDF
with some standard tool, so that's not the issue.
The trick is to avoid needing to re-OCR the image when I already have an
OCR output, but no PDF, not even an image-only one.
So. Can anybody sort me out? Are there tools for this or is the whole
concept just not right?
Thanks in advance.