On Apr 16, 2:55 pm, Lutrin <elic...@olympo.it> wrote:
> On Thu, 16 Apr 2009 02:11:25 -0700, sinbad ci disse:
> > Is there any free pdf to plain-text converter available
> included in *pdfutils* package (from xpdf site)
Yes, pdftotext from the Xpdf package is quite good. I've used it in a
project for a client, and it worked fairly well. I used it on Windows with
MS VC++, but it works on Linux and UNIX as well (you might need to build
from source and do some tweaking for less mainstream UNIX flavors).
Glyph and Cog, the company behind Xpdf, has been around for a while,
and they gave me good technical support when I was using Xpdf in that
project, including quickly fixing a bug that I found while using it,
and releasing a new version with the bug fix.
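For what it's worth, here's a minimal sketch of calling pdftotext from a
Python script. It assumes the pdftotext binary is on your PATH. The helper
function names are just made up for illustration. -enc and -layout are
standard pdftotext options, and passing "-" as the output file makes
pdftotext write the text to stdout instead of to a .txt file.

```python
import subprocess
from typing import List


def build_pdftotext_cmd(pdf_path: str, layout: bool = False,
                        encoding: str = "UTF-8") -> List[str]:
    """Build a pdftotext command line (hypothetical helper)."""
    # -enc selects the encoding of the output text
    cmd = ["pdftotext", "-enc", encoding]
    if layout:
        # -layout tries to preserve the physical layout of the page
        cmd.append("-layout")
    # "-" as the output file tells pdftotext to write to stdout
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path: str, layout: bool = False) -> str:
    """Run pdftotext and return the extracted text (hypothetical helper).

    Raises subprocess.CalledProcessError if pdftotext exits non-zero.
    """
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout=layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would be something like print(pdf_to_text("report.pdf", layout=True)),
where report.pdf is whatever file you have. Building the argument list
separately keeps it easy to check, and passing a list (not a shell string)
avoids quoting problems with odd file names.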
Also, I just came across this: TET from PDFlib:
I've not tried TET yet, but it may also be good, since it's from the
same company, PDFlib GmbH, as the PDFlib product; both that company
and that product have been around for many years, too. I've tried out
PDFlib and it's good. It also has language bindings for many
programming languages apart from C or C++, in which it is written.
A point to note: from working on PDF text extraction, I learned
that it may not be possible to extract text with 100% accuracy from
certain PDF files. This is reportedly due to the nature of the PDF
format itself (for reasons too complicated to go into right now), not
necessarily due to a flaw in the text extraction tool (although, of
course, a specific tool can also be the cause of inaccurate results in
some cases). If you're extracting text from such PDFs for any critical
application, keep this in mind.
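If you do have to rely on the output, one cheap precaution is to flag
suspicious results for a manual check. Below is a rough heuristic in Python.
The function name and thresholds are my own assumptions, not anything from
Xpdf or PDFlib, and it will certainly miss some bad extractions (e.g. text
that comes out in the wrong reading order).

```python
import unicodedata


def looks_suspect(text: str, min_chars: int = 20,
                  max_bad_ratio: float = 0.05) -> bool:
    """Rough heuristic (an assumption, not a standard test) that flags
    extracted text which may need a human look."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        # Little or no text: the PDF may contain only scanned images, or
        # its fonts may lack a usable character mapping.
        return True
    bad = 0
    for ch in stripped:
        if ch == "\ufffd":
            # Unicode replacement character: bytes that could not be decoded
            bad += 1
        elif unicodedata.category(ch).startswith("C") and ch not in "\n\r\t\f":
            # Control, format, private-use or unassigned characters usually
            # indicate garbled output (\f is pdftotext's page separator)
            bad += 1
    return bad / len(stripped) > max_bad_ratio
```

You'd call it on the string returned by whatever extraction tool you use,
and route anything it flags to a person rather than trusting it blindly.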
Dancing Bison Enterprises
Software consulting and training
xtopdf: fast and easy PDF creation from other file formats:
Blog (on software innovation): http://jugad2.blogspot.com