read plain text from PDF

Hi all,
I'm a newbie to iText. I've written a few small programs to generate PDF files,
and all worked well.
Now I need to read plain text from PDF files to build a PDF search
engine, but it seems, from tutorials and postings in this newsgroup, that it is
impossible to get the text back out of a PDF file. Is this the definitive
answer?
In the tutorial, chapter 13, one phrase gave me some hope: "there are ways to
retrieve text from an existing PDF". Which "ways"?

Thank you in advance.


Pier
4/26/2004 1:25:37 PM
comp.text.pdf

"Pier" <plz@virgilio.it> wrote:

>Now I need to read plain text from PDF files to build a PDF search
>engine, but it seems, from tutorials and postings in this newsgroup, that it is
>impossible to get the text back out of a PDF file. Is this the definitive
>answer?

No, it's not impossible. It's just difficult. The text is not stored as
plain text, but typically as compressed graphical commands. If you
really want to extract it yourself, then the PDF Reference should be the
next thing to read.
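
To make "compressed graphical commands" concrete, here is a minimal Python sketch. It assumes a hand-written content stream (not taken from a real file) that is Flate-compressed, which is the common case, and it only handles simple literal strings shown with the Tj operator. The helper name extract_literal_strings is made up for illustration. Real PDFs also use font-specific encodings, hex strings, escape sequences and TJ arrays, so this is nowhere near a general extractor.

```python
import re
import zlib

# A hand-written page content stream (illustrative, not from a real PDF):
# select a font, move the text cursor, show one string.
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET"

# PDF writers usually store such streams Flate-compressed (FlateDecode filter).
compressed = zlib.compress(raw_stream)

def extract_literal_strings(stream_bytes):
    """Decompress a Flate stream and return the literal strings shown with Tj."""
    content = zlib.decompress(stream_bytes)
    # Match "(...)" followed by the Tj operator; escapes and nested
    # parentheses are deliberately not handled in this sketch.
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", content)
    # Assumption: the bytes map directly to Latin-1 characters, which is
    # not true for many real fonts.
    return [m.decode("latin-1") for m in matches]

print(extract_literal_strings(compressed))
```

Even in this toy case the string only becomes visible after decompression and operator parsing, which is why reading the PDF Reference comes first if you want to do it by hand.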
----------------------------------------
Aandi Inston  quite@dial.pipex.com http://www.quite.com
Please support usenet! Post replies and follow-ups, don't e-mail them.

quite
4/26/2004 2:15:45 PM
Pier wrote:
> Hi all,
> I'm a newbie to iText. I've written a few small programs to generate PDF files,
> and all worked well.
> Now I need to read plain text from PDF files to build a PDF search
> engine, but it seems, from tutorials and postings in this newsgroup, that it is
> impossible to get the text back out of a PDF file. Is this the definitive
> answer?
> In the tutorial, chapter 13, one phrase gave me some hope: "there are ways to
> retrieve text from an existing PDF". Which "ways"?
> 
> Thank you in advance.

I use pdftotext for dumping plain text from a PDF.  It is part of xpdf:

http://www.foolabs.com/xpdf/download.html
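
For a search engine you would probably call pdftotext from a program rather than by hand. A minimal Python sketch follows, assuming pdftotext is installed and on the PATH. The function names build_pdftotext_command and pdf_to_text are made up for illustration. Passing "-" as the output file sends the text to stdout, and "-enc UTF-8" selects the output encoding.

```python
import shutil
import subprocess
import sys

def build_pdftotext_command(pdf_path):
    """Return the argument list for a pdftotext run that writes to stdout."""
    # "-enc UTF-8" sets the output encoding; "-" as the text file means stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]

def pdf_to_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH; install xpdf first")
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")

if __name__ == "__main__":
    # Usage: python extract.py some.pdf
    if len(sys.argv) > 1:
        print(pdf_to_text(sys.argv[1]))
```

The returned string can then be fed to whatever indexer you use. Scanned PDFs that contain only images will yield little or no text this way.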

Sid Steward
http://www.AccessPDF.com/pdftk/
Sid
4/26/2004 4:32:35 PM