Best way to touch up PDF files?



I have been scanning a bunch of paper documents into high
quality (600 dpi TIFF) files.  The next step is to convert them
into text+image PDF files.  The OCR is done by OmniPage and
ABBYY FineReader.  We all know that the OCR processing has
become very good, but it's not quite perfect yet, and the
resultant PDF files need to be touched up.  Using Acrobat
Professional I can select an object and click on "Edit" which
takes me into either Illustrator or Photoshop, and that's the
step in which I would like to see some (a lot! of) improvement.

Now the questions:

  - I don't have Adobe PageMaker; should I get it?  Will it
    do a better job (than Illustrator) when editing text?

  - In general, is it possible to have full access to every
    minute detail of an OCR-created (by FineReader) PDF file and
    touch it up until it _really_ looks like the original document?


In my experience, the only way to make a perfect clone of an
existing paper form is to forget the usual procedures:

      scanner -> OCR -> Acrobat PRO -> Illustrator or Photoshop

and do some low level programming instead.  I used PDFlib to write
a program which creates one single document.  It ended up being
better than the original.

Thanks for sharing your experience on this topic.

-Ramon F. Herrera
Posted by ramon, 5/20/2004 7:55:17 AM

ramon@conexus.net (Ramon F Herrera) wrote:

>I have been scanning a bunch of paper documents into high
>quality (600 dpi TIFF) files.  The next step is to convert them
>into text+image PDF files.  The OCR is done by OmniPage and
>ABBYY FineReader.  We all know that the OCR processing has
>become very good, but it's not quite perfect yet, and the
>resultant PDF files need to be touched up.  Using Acrobat
>Professional I can select an object and click on "Edit" which
>takes me into either Illustrator or Photoshop, and that's the
>step in which I would like to see some (a lot! of) improvement.
>
>Now the questions:
>
>  - I don't have Adobe PageMaker; should I get it?  Will it
>    do a better job (than Illustrator) when editing text?

No, it won't.

Why aren't you just using the text touch-up tool in Acrobat...?
----------------------------------------
Aandi Inston  quite@dial.pipex.com http://www.quite.com
Please support usenet! Post replies and follow-ups, don't e-mail them.

Posted by quite, 5/20/2004 8:35:24 AM


ramon@conexus.net (Ramon F Herrera) wrote in 
news:c9bc36ff.0405192355.655a8bdd@posting.google.com:
> I have been scanning a bunch of paper documents into high
> quality (600 dpi TIFF) files.  The next step is to convert them
> into text+image PDF files.  

Why didn't you scan them directly into OmniPage?

>   - In general, is it possible to have full access to every
>     minute detail of an OCR-created (by FineReader) PDF file and
>     touch it up until it _really_ looks like the original document?

Why not? But PDF is not the most natural format for editing.

Using both products, you could output the files as Word documents, tweak 
them in Word any way you like -- perhaps adding styles, links, etc. and 
after that generate a PDF (using Acrobat).

> I used PDFlib to write
> a program which creates one single document.  It ended up being
> better than the original.

In that case, it is not a perfect _clone_.


-- 
Matti Vuori, <http://sivut.koti.soon.fi/mvuori/index-e.htm>

Posted by Matti, 5/20/2004 9:21:11 AM

Matti Vuori <mvuori@koti.soon.fi> wrote in message news:<Xns94EF7DAC9B3B5mvuorikotisoonfi@193.229.0.31>...
> ramon@conexus.net (Ramon F Herrera) wrote in 
> news:c9bc36ff.0405192355.655a8bdd@posting.google.com:
> > I have been scanning a bunch of paper documents into high
> > quality (600 dpi TIFF) files.  The next step is to convert them
> > into text+image PDF files.  
> 
> Why didn't you scan them directly into OmniPage?
>

I only have access to the documents for a limited period of
time, since they are currently processed by hand and used
all the time.  As a principle, I always keep a copy of the original
document in high quality (i.e. a huge file), and from there I can always
derive thinner documents such as PDF.  The TIFF files have handwritten
stuff and "Received" stamps that I need to clean up with Photoshop
before I perform the OCR.  BTW: I have found that ABBYY FineReader
is definitely much better than OmniPage.  This seems to be a case
in which Russian technology is way ahead of its American counterparts,
at least in the commercially available products.


> >   - In general, is it possible to have full access to every
> >     minute detail of an OCR-created (by FineReader) PDF file and
> >     touch it up until it _really_ looks like the original document?
> 
> Why not? But PDF is not the most natural format for editing.
>

Perhaps my problem is that I have to learn Illustrator better.
I keep on finding some defects in the texts (or lines) that I
can't figure out how to fix.  If I see a wrong character, I highlight
it and type a new one, expecting to replace only the character itself
without changing its position.  Sometimes the replaced character jumps
to the previous line and there is no way to place it in the correct spot.

> Using both products, you could output the files as Word documents, tweak 
> them in Word any way you like -- perhaps adding styles, links, etc. and 
> after that generate a PDF (using Acrobat).
> 

I don't use Word for religious reasons.  Word is written by a company
which would be very happy if they could destroy the PDF format and
Adobe altogether, and that company has been shown in court to be
capable of doing illegal (can the word "criminal" be used to refer to
Microsoft? I wonder) acts to destroy their competitor.

The more pragmatic reason is that MS has never been able to figure out
how to do real WYSIWYG stuff.  In fact, no company in the world has
been able to do this except Adobe, and therefore I use Illustrator and
Photoshop.  On the other hand, Adobe doesn't know that much about OCR,
that's why I go with the ABBYY folks.  You see, I always select the very
best tool for the task at hand.

> > I used PDFlib to write
> > a program which creates one single document.  It ended up being
> > better than the original.
> 
> In that case, it is not a perfect _clone_.

Well, granted, it is not a perfect clone at the molecular level.

It is a perfect clone of the original, ideal form.  The form had a
field with green dollar and cent signs in the background and the
user is required to write the amounts in black ink above the green
"$" signs.  The green was chosen in order to be invisible to the
scanner.  The forms that I had were not of the best quality and you
could barely see the dollar signs.
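The redraw-instead-of-retouch approach described above can be illustrated with a minimal sketch. This is not Ramon's PDFlib program (its code isn't shown in the thread). Instead it assumes only Python's standard library and writes a one-page PDF by hand: a black label and writing line, plus green "$" and cent guide glyphs behind them. The function names, coordinates, the Helvetica font, and the green value (RGB 0, 0.6, 0) are hypothetical placeholders, not measurements of the real form.

```python
# Hypothetical sketch: redraw one "amount" field of a paper form as a
# one-page PDF, written by hand with the standard library only (no PDFlib).

GREEN = "0 0.6 0"  # assumed RGB green; match it to the real form's ink


def build_content() -> bytes:
    """Page content stream: black label and rule, green $ and cent guides."""
    ops = [
        "0 0 0 rg 0 0 0 RG",                         # black fill and stroke
        "BT /F1 12 Tf 72 700 Td (Amount:) Tj ET",    # black field label
        "0.5 w 140 696 m 400 696 l S",               # black line to write on
        f"{GREEN} rg",                               # switch fill colour to green
        "BT /F1 18 Tf 150 700 Td ($) Tj ET",         # green dollar sign
        "BT /F1 18 Tf 360 700 Td (\\242) Tj ET",     # green cent sign (0xA2 in WinAnsi)
    ]
    return "\n".join(ops).encode("latin-1")


def build_pdf() -> bytes:
    """Assemble catalog, page tree, page, font and content into a PDF file."""
    content = build_content()
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

Writing the result to disk, e.g. `open("form.pdf", "wb").write(build_pdf())`, should give a page any PDF viewer can open. Building the page object by object keeps every colour and coordinate under your control, which is the point of the approach; a real library such as PDFlib also handles details this sketch skips, such as font embedding and stream compression.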

-Ramon
Posted by ramon, 5/20/2004 4:56:22 PM

quite@dial.pipex.con (Aandi Inston) wrote in message 
> 
> Why aren't you just using the text touch-up tool in Acrobat...?
> ----------------------------------------

The Acrobat TouchUp Text Tool has many limitations.  I need to
be able to change font sizes, colors, italicness, etc.
Sometimes the original form has a sticker which covers
part of the text and I would like to redo that whole
paragraph by hand.

Another annoying problem that I keep on finding: when I select
the TouchUp Object Tool and get into Illustrator it says that
"font such and such wasn't found, using the default font" which
destroys the look of my document.

-Ramon
Posted by ramon, 5/20/2004 5:23:57 PM
