pdf to text

I am looking for a way to convert PDF files into text content. I don't 
care about layout or formatting, just the plain text that I can use to 
search against in a database.

I've looked into the pdftotext tool from:
http://www.foolabs.com/xpdf/download.html

However, when I use it via the command line, it works fine. If I issue 
the same command via a system() call, there are major problems that 
cause the server to crash. (Don't know why, there aren't any error 
messages being generated anywhere.)

I am looking to use this when a PDF file is uploaded via a form and 
store the text in a database for a search function.

TIA

-- Justin
justin4335 (310)
3/8/2006 5:07:57 PM
comp.lang.php

Justin Koivisto wrote:
>
> I am looking for a way to convert PDF files into text content.

I vaguely remember using Ghostscript for that... 

Cheers, 
NC

nc (1051)
3/8/2006 5:50:06 PM
I have not used pdftotext via system(). However, you could try different
versions of pdftotext. In my experience, behavior can differ quite a bit
from one version to the next, and other versions should be easy to find.
It's also possible to use ascii2txt, which depends on Ghostscript, I
think. When I tried it, though, I got into a muddle of versions, and
pdftotext was much easier.

3/8/2006 6:47:06 PM
Justin Koivisto <justin@koivi.com> wrote:
> I am looking for a way to convert PDF files into text content. I don't 
> care about layout or formatting, just the plain text that I can use to 
> search against in a database.
> 
> I've looked into the pdftotext tool from:
> http://www.foolabs.com/xpdf/download.html
> 
> However, when I use it via the command line, it works fine. If I issue 
> the same command via a system() call, there are major problems that 
> cause the server to crash. (Don't know why, there aren't any error 
> messages being generated anywhere.)

What do you mean when you say the server crashes? The Apache process 
dies? The entire machine locks up? The server physically falls off the 
rack and lands on the floor?

How about doing an experiment where you use system() to call a shell 
script that sets up some debugging and dumps the environment, and see 
what you come up with?
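
A minimal sketch of such a debugging wrapper, assuming a POSIX shell. The function name debug_run and the log path used below are made up for illustration and are not part of pdftotext or PHP. It records who is running the command, from where, and with which environment, then runs the command with its output and exit status appended to the same log:

```shell
#!/bin/sh
# debug_run LOGFILE COMMAND [ARGS...]
# Hypothetical helper: logs the environment the command sees, runs the
# command with stdout and stderr appended to LOGFILE, and returns the
# command's own exit status.
debug_run() {
    log=$1
    shift
    {
        echo "=== $(date) ==="
        echo "user: $(id)"
        echo "cwd:  $(pwd)"
        echo "PATH: $PATH"
        echo "--- environment ---"
        env
        echo "--- running: $* ---"
    } >>"$log" 2>&1

    # Run the real command; keep its exit status even if it fails.
    status=0
    "$@" >>"$log" 2>&1 || status=$?

    echo "--- exit status: $status ---" >>"$log"
    return "$status"
}
```

To try it, the function could be saved in a script that ends with the line `debug_run "$@"`, and system() could call that script instead of calling pdftotext directly, for example with the arguments `/tmp/pdftotext-debug.log pdftotext /path/to/file.pdf /path/to/file.txt`. Comparing the logged user, working directory and PATH with what an interactive shell shows is often enough to see why a command behaves differently under the web server.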

miguel
-- 
Photos from 38 countries on 5 continents: http://travel.u.nu
Latest photos: Australia; Malaysia; Burma; Thailand; Hong Kong
Airports of the world: http://airport.u.nu
Miguel
3/16/2006 6:10:12 AM