pdf to plain-text converter

Hi,
Is there any free PDF to plain-text converter available?
If so, please point me to it.

Thanks in advance
Sinbad
sinbad
4/16/2009 9:11:25 AM
comp.text.pdf

On Thu, 16 Apr 2009 02:11:25 -0700, sinbad wrote:

> Is there any free pdf to plain-text converter available
[...]

*pdftotext*

included in *pdfutils* package (from xpdf site)

- http://www.foolabs.com/xpdf/

pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -layout           : maintain original physical layout
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information
  -enc <string>     : output text encoding name
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -nopgbrk          : don't insert page breaks between pages
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
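
For anyone who wants to call the tool from a script, here is a minimal
sketch in Python. It only uses the -f, -l and -layout options from the
usage text above. The function names and the file names input.pdf and
output.txt are placeholders I made up for illustration, and it assumes
pdftotext is installed and on your PATH.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path, first=None, last=None, layout=False):
    """Build a pdftotext argument list using options from the usage text."""
    cmd = ["pdftotext"]
    if first is not None:
        cmd += ["-f", str(first)]   # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]    # last page to convert
    if layout:
        cmd.append("-layout")       # maintain original physical layout
    cmd += [pdf_path, txt_path]
    return cmd


def pdf_to_text(pdf_path, txt_path, **options):
    """Run pdftotext; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path, **options), check=True)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command, it does not run it.
    print(build_pdftotext_cmd("input.pdf", "output.txt", first=1, last=3, layout=True))
```

Calling pdf_to_text("input.pdf", "output.txt", layout=True) would then
write the extracted text to output.txt, provided the PDF exists.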
-- 
Puppy Linux wiki:  http://puppylover.netsons.org/dokupuppy
Puppy Linux Forum: http://puppylinux.ilbello.com
Windows me genuit, Ubuntu rapuere / tenet nunc Puppy Linux...
Lutrin
4/16/2009 9:55:36 AM
On Apr 16, 2:55 pm, Lutrin <elic...@olympo.it> wrote:
> On Thu, 16 Apr 2009 02:11:25 -0700, sinbad wrote:
>
> > Is there any free pdf to plain-text converter available
>
> [...]
>
> *pdftotext*
>
> included in *pdfutils* package (from xpdf site)
>
> -http://www.foolabs.com/xpdf/
>

Yes, pdftotext from the Xpdf package is quite good. I've used it in a
project for a client, and it worked fairly well. I used it on Windows
with MS VC++, but it works on Linux and UNIX as well (you might need to
build from source and do some tweaking for less mainstream UNIX
flavors). Glyph & Cog, the company behind Xpdf, has been around for a
while, and they gave me good technical support when I was using Xpdf in
that project, including quickly fixing a bug that I found and releasing
a new version with the fix.

Also, I just came across TET from PDFlib:

http://www.pdflib.com/products/tet/

I've not tried TET yet, but it may also be good, since it's from the
same company, PDFlib GmbH, as the PDFlib product. Both that company
and that product have been around for many years, too. I've tried out
PDFlib and it's good. It also has language bindings for many
programming languages besides C and C++, in which it is written.

A point to note: from working on PDF text extraction, I learned that
it may not be possible to extract text with 100% accuracy from
certain PDF files. This is supposed to be due to the nature of the PDF
format itself (for reasons too complicated to go into right now), not
necessarily due to a flaw in the text extraction tool (although, of
course, inaccurate results can also be due to a specific tool in some
cases). If you're extracting text from such PDFs for any critical
application, keep this in mind.
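
To give a rough feel for one commonly cited reason (not the whole
story): a PDF page draws text fragments at coordinates, and the order
in which they are stored need not match reading order. The toy sketch
below is not a PDF parser. The fragment list and both function names
are invented for illustration, and real extractors use far more
involved heuristics.

```python
# Each tuple is (x, y, text): a fragment and where it is drawn on the page.
# The list order stands in for content stream order; y grows downward here.
fragments = [
    (300, 100, "world"),
    (100, 120, "Second line"),
    (100, 100, "Hello"),
]


def stream_order(frags):
    """Join fragments in stored order, ignoring their positions."""
    return " ".join(text for _, _, text in frags)


def reading_order(frags):
    """Sort top to bottom, then left to right, and group by exact y value."""
    lines = {}
    for x, y, text in sorted(frags, key=lambda f: (f[1], f[0])):
        lines.setdefault(y, []).append(text)
    return "\n".join(" ".join(words) for words in lines.values())


if __name__ == "__main__":
    print(stream_order(fragments))
    print(reading_order(fragments))
```

Grouping by exact y value works only in this toy case; with real
documents (columns, tables, rotated text, slightly offset baselines)
any such guess can go wrong, which is where the inaccuracy comes from.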

- Vasudev
---
Vasudev Ram
Dancing Bison Enterprises
Software consulting and training
Biz: http://www.dancingbison.com
xtopdf: fast and easy PDF creation from other file formats:
http://www.dancingbison.com/products.html
Blog (on software innovation): http://jugad2.blogspot.com
vasudevram
4/20/2009 12:59:04 AM