OT: convert pdf to text

is there any good (and preferably free) tool for converting pdfs to plain 
text?
i know its OT, but i dont know where else to turn...

-- 
You're never too young to have a Vietnam flashback 


1/19/2007 8:44:59 PM
comp.text.tex

filox wrote:
> is there any good (and preferably free) tool for converting pdfs to plain 
> text?
> i know its OT, but i dont know where else to turn...
> 

Look at "pdftotext"

==> http://www.foolabs.com/xpdf/
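
For anyone who wants to call pdftotext from a script, here is a minimal Python sketch. It assumes pdftotext is installed and on the PATH; the function names are made up for illustration. It uses the -f/-l page-range options, the -enc option, and "-" as the output file, which sends the text to stdout.

```python
import subprocess


def pdftotext_command(pdf_path, first_page=None, last_page=None):
    """Build the pdftotext argument list (hypothetical helper name)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    # "-" as the text file makes pdftotext write to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def pdf_to_text(pdf_path, first_page=None, last_page=None):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        check=True,           # raise if pdftotext exits with an error
        capture_output=True,  # collect stdout instead of printing it
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like pdf_to_text("paper.pdf", first_page=1, last_page=3) would then return the text of the first three pages, where "paper.pdf" is just a placeholder file name.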

....Rolf
1/19/2007 8:59:46 PM
filox wrote:
> is there any good (and preferably free) tool for converting pdfs to plain
> text?
> i know its OT, but i dont know where else to turn...
>
> --
> You're never too young to have a Vietnam flashback

Here's a freeware pdf reader that converts to text (among other
things).

http://www.pdf2exe.com/reader.html

I think Foxit pdf reader might be able to do this also.

-Dustin

1/20/2007 2:50:12 PM
On 20 Jan 2007 06:50:12 -0800, Dustin wrote:
 >
 > Here's a freeware pdf reader that converts to text (among other
 > things).
 >
 > http://www.pdf2exe.com/reader.html

Ugliest PDF rendering I've ever encountered (at least when run under
wine on Linux).  But I suppose that's irrelevant if one only wants
conversion to text.

Bob T.
BobT (1484)
1/20/2007 3:24:14 PM
>>>>> "Rolf" == Rolf Niepraschk <Rolf.Niepraschk@gmx.de> writes:

    Rolf> filox wrote:

    >> is there any good (and preferably free) tool for converting
    >> pdfs to plain text?

    Rolf> Look at "pdftotext"

I use pdftotext (and pdftohtml) fairly often, for the purpose of
converting documents into a form I can read on my PDA.  My problem
with it is that it hardcodes all the line breaks.  I find even pretty
simple-minded fiction pretty hard to read if I can't tell the
difference between something that was a new line in the original pdf
and something that was a new paragraph.  Does anyone know a *smart*
pdf to text converter that makes that distinction?


-- 
Laura (mailto:lconrad@laymusic.org , http://www.laymusic.org/ )
(617) 661-8097	fax: (501) 641-5011
233 Broadway, Cambridge, MA 02139
lconrad (31)
1/21/2007 6:29:07 PM
Laura Conrad <lconrad@laymusic.org> writes:

>>>>>> "Rolf" == Rolf Niepraschk <Rolf.Niepraschk@gmx.de> writes:
>
>     Rolf> filox wrote:
>
>     >> is there any good (and preferably free) tool for converting
>     >> pdfs to plain text?
>
>     Rolf> Look at "pdftotext"
>
> I use pdftotext (and pdftohtml) fairly often, for the purpose of
> converting documents into a form I can read on my PDA.  My problem
> with it is that it hardcodes all the line breaks.  I find even pretty
> simple-minded fiction pretty hard to read if I can't tell the
> difference between something that was a new line in the original pdf
> and something that was a new paragraph.  Does anyone know a *smart*
> pdf to text converter that makes that distinction?
>
>
> -- 

I do PDF to Text conversion with pdftotext quite a lot and also find the line
breaking an issue. What I normally do is run the output through some other
shell utilities like sed and fold. Essentially,

First, add an additional newline to any existing newline in the pdftotext
output. This gives you paragraph breaks.

Then, run the output through fold to create line breaks at word boundaries near
a folding column, e.g. column 80.
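
The two steps above can be sketched in Python roughly as follows. This is only an approximation of the sed and fold approach: textwrap stands in for fold, the function names are invented for the example, and the default width of 80 is taken from the column mentioned above. In the shell, the same idea would probably look something like `sed G | fold -s -w 80`, but that exact pipeline is a guess.

```python
import textwrap


def double_newlines(text: str) -> str:
    """Step 1: turn every newline into a blank-line paragraph break."""
    return text.replace("\n", "\n\n")


def fold_paragraphs(text: str, width: int = 80) -> str:
    """Step 2: wrap each paragraph at word boundaries near `width`."""
    paragraphs = text.split("\n\n")
    wrapped = [
        textwrap.fill(p, width=width) if p.strip() else ""
        for p in paragraphs
    ]
    return "\n\n".join(wrapped)


def reflow(text: str, width: int = 80) -> str:
    """Apply both steps to raw pdftotext output."""
    return fold_paragraphs(double_newlines(text), width=width)
```

Note that this treats every original line as its own paragraph, so it works best when the extracted text already has one paragraph per line.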

I've also found that some pdf output includes a "number" at the end of some
lines. It seems this number indicates lines that were titles, subtitles, etc. Not
all pdf to text output produces this (I'm not sure which pdf files do), but I use a
small perl script that uses this information as a heuristic, and I then put a
"*" (or maybe two or three, depending) to indicate sections. Combining this
with emacs outloud mode, I get quite nice folded output.
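
A rough Python sketch of that kind of heuristic is below. It is not the perl script, and several details are assumptions made for illustration: that the trailing number is separated from the title by whitespace, that the number is dropped, and that the stars go at the start of the line. How many stars to use is left to the caller.

```python
import re

# A line that ends in whitespace followed by a bare number,
# e.g. "Introduction 1". A line that is only a number does not match.
TRAILING_NUMBER = re.compile(r"^(?P<title>.*\S)\s+(?P<num>\d+)\s*$")


def mark_headings(lines, stars=1):
    """Prefix lines that end in a number with `stars` asterisks."""
    marked = []
    for line in lines:
        match = TRAILING_NUMBER.match(line)
        if match:
            # Assumption: drop the number and put the stars in front.
            marked.append("*" * stars + " " + match.group("title"))
        else:
            marked.append(line)
    return marked
```

Being a heuristic, this will also flag ordinary sentences that happen to end in a number, so the result needs a quick check by eye.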

Tim


-- 
tcross (at) rapttech dot com dot au
timx2 (502)
1/22/2007 9:46:56 PM
Laura Conrad wrote:
> I use pdftotext (and pdftohtml) fairly often, for the purpose of
> converting documents into a form I can read on my PDA.  My problem
> with it is that it hardcodes all the line breaks.  I find even pretty
> simple-minded fiction pretty hard to read if I can't tell the
> difference between something that was a new line in the original pdf
> and something that was a new paragraph.  Does anyone know a *smart*
> pdf to text converter that makes that distinction?

It's a bit old and is no longer being maintained but give PreScript a shot:

     http://www.nzdl.org/html/prescript.html

The program outputs blank lines (two newline characters) between paragraphs.

I believe that PreScript accepts only PostScript so you'll have to convert
from PDF to PostScript first (e.g., using pdftops or by printing to a file
from your PDF reader).
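
If the conversion step needs scripting, a minimal Python sketch for it might look like this. It assumes pdftops is on the PATH and only covers the PDF to PostScript step, not PreScript itself; the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pdftops_command(pdf_path, ps_path=None):
    """Build the pdftops argument list; default output is <name>.ps."""
    pdf = Path(pdf_path)
    ps = Path(ps_path) if ps_path else pdf.with_suffix(".ps")
    return ["pdftops", str(pdf), str(ps)]


def pdf_to_postscript(pdf_path, ps_path=None):
    """Convert a PDF to PostScript and return the output path."""
    if shutil.which("pdftops") is None:
        raise FileNotFoundError("pdftops not found on PATH")
    cmd = pdftops_command(pdf_path, ps_path)
    subprocess.run(cmd, check=True)  # raise if pdftops fails
    return Path(cmd[-1])
```

The resulting .ps file is what would then be handed to PreScript.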

-- Scott
ctt (1173)
1/23/2007 3:23:57 AM