extract text from mac pdf


I find that the text extracted from a PDF generated by PageMaker 6.5 (Mac
version) comes out as garbage characters.

Is there a way to extract it correctly?  Thanks a lot.

tony



Reply Tony 10/27/2005 8:24:01 AM

comp.text.pdf 5537 articles. 36 followers. Post

Similar Articles

PDF::API2
Hello All, I am new to PDF files so I don't really know if what I want to do is possible and how to use the PDF::API2 modules. I need to extract information from columns in a table ( I assume that PDF does not know anything about tables). What I was thinking of doing was finding the horizontal location of the header (I know what it should be), then extract all text that starts at that location. I have played around with the PDF::API2 module and read the 'Using PDF::API2 - The code' help page, however it doesn't show me how to extract information from an existing file. ...
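
The idea in the post above (find the horizontal position of the header, then collect the text that starts at that position) could look roughly like the Python sketch below. It is not PDF::API2 code. It assumes the words and their page coordinates have already been obtained from some extraction tool, and `Word`, `column_under_header`, `tolerance`, and the sample data are all hypothetical names and values made up for illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A word plus the position of its left edge (hypothetical layout)."""
    x: float
    y: float
    text: str


def column_under_header(words, header, tolerance=2.0):
    """Return the texts of words whose left edge lines up with the header's."""
    # Locate the header word; its x position defines the column.
    header_word = next((w for w in words if w.text == header), None)
    if header_word is None:
        return []
    column = [
        w for w in words
        if w is not header_word and abs(w.x - header_word.x) <= tolerance
    ]
    # Assumes PDF-style coordinates where y grows upward, so sort by
    # descending y to read the column from top to bottom.
    column.sort(key=lambda w: -w.y)
    return [w.text for w in column]


# Made-up sample standing in for real extractor output.
sample = [
    Word(50.0, 700.0, "Name"), Word(200.0, 700.0, "Price"),
    Word(50.5, 680.0, "Apple"), Word(200.2, 680.0, "1.20"),
    Word(49.8, 660.0, "Pear"), Word(199.9, 660.0, "0.95"),
]
print(column_under_header(sample, "Price"))
```

The tolerance is there because left edges in a real table rarely match to the exact point; right-aligned or centred columns would need a different matching rule.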

ghostscript PDF page extraction, leaving text as text
Ghostscript may be used to extract pages from a PDF file with a command like this: gs -sDEVICE=pdfwrite \ -dNOPAUSE -dBATCH -dSAFER \ -dFirstPage=48 -dLastPage=48 \ -sOutputFile=onepage.pdf input.pdf The problem is, while that page looks the same as the original in a PDF reader, it seems to be an image rather than an "object" representation. That is, open the extracted PDF in something like Acrobat or PDF XChange Viewer and "search" and "text selection" work, whereas in the extracted one neither function works. Presumably this is because the text has been r...
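
For scripting the command quoted above, a small Python wrapper might look like this. It only rebuilds the same Ghostscript flags from the post as an argument list and does nothing about the text-selection problem the poster describes. The function names are hypothetical, and `gs` is assumed to be on PATH.

```python
import shutil
import subprocess


def gs_extract_command(input_pdf, output_pdf, first_page, last_page):
    """Build the Ghostscript argument list quoted in the post."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        f"-dFirstPage={first_page}",
        f"-dLastPage={last_page}",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


def extract_pages(input_pdf, output_pdf, first_page, last_page):
    """Run Ghostscript if it is installed; raise otherwise."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    cmd = gs_extract_command(input_pdf, output_pdf, first_page, last_page)
    # Passing a list avoids shell quoting problems with odd file names.
    subprocess.run(cmd, check=True)


print(" ".join(gs_extract_command("input.pdf", "onepage.pdf", 48, 48)))
```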

extract text layer from searchable pdf and merge with another pdf
Dear comp.text.pdfians, I have a searchable PDF consisting of scanned book pages that were passed through OCR, which added a text layer hidden under the images. The PDF uses JBIG2 compression (135 pages in A5 format, scanned at 300 dpi, about 1928 KB). After OCR, I noticed that the scans had been degraded in quality, so I want to extract the text layer and merge it with another copy of the same PDF that contains the scans in high quality. Is it possible to extract a text layer from a PDF and then merge it with the raster layer of another PDF? -- Puppy Linux...

PDF Converter Mac
PDF Converter for Mac can help you: Convert PDF to Word Document (*.doc); Convert PDF to Excel spreadsheets (*.xls); Convert PDF to PowerPoint presentations (*.pptx); Convert PDF to EPUB eBooks (*.epub); Convert PDF to Text files (*.txt); Convert PDF to HTML pages (*.html); Print restricted PDF files; Resize images of PDF files; Edit PDF files in Microsoft Office; Share PDF files on internet easily; Change contents in any existing PDF files; Modify typos and misspellings in PDF files; Calculate and edit data in Microsoft Excel; Read PDF eBooks on mobile devices, such as iPad, iP...

extract Text from PDF
Hello NG! We would like to extract address details from PDF letters, placed at certain coordinates defined by the German DIN standard for letters. For this purpose we're looking for a solution to extract text from a PDF document at certain pixel coordinates. Does somebody know a possible solution for this problem? We've tried a great deal to achieve this task, unfortunately without any success yet. Thank you very much in advance. Markus aparasta@epitop.com wrote: > Hello NG! > > We Would like to extract addressdetails from PDF Letters placed on > certain coordinates defin...
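
One possible way to pull text from a fixed rectangle, sketched in Python: convert the region from millimetres to points and pass it to a text extractor as a crop box. The sketch assumes Poppler's `pdftotext` and its `-f`, `-l`, `-r`, `-x`, `-y`, `-W`, `-H`, and `-layout` options. The function names are hypothetical, and the rectangle in the example is a made-up placeholder, not the address field from the DIN standard.

```python
def mm_to_pt(mm):
    """Convert millimetres to PDF points (72 points per inch, 25.4 mm per inch)."""
    return mm * 72.0 / 25.4


def crop_command(pdf_path, left_mm, top_mm, width_mm, height_mm, page=1):
    """Build a pdftotext command that extracts one rectangle from one page."""
    # The crop options take whole numbers; at 72 dpi one pixel equals one point.
    x, y, w, h = (round(mm_to_pt(v))
                  for v in (left_mm, top_mm, width_mm, height_mm))
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),   # restrict to a single page
        "-r", "72",                         # resolution used for the crop box
        "-x", str(x), "-y", str(y),         # top-left corner of the region
        "-W", str(w), "-H", str(h),         # width and height of the region
        "-layout",
        pdf_path,
        "-",                                # write the text to stdout
    ]


# Placeholder rectangle: 25.4 mm from the left, 50.8 mm from the top.
print(" ".join(crop_command("letter.pdf", 25.4, 50.8, 76.2, 25.4)))
```

The command is only built here, not run; it can be handed to `subprocess.run` once the real field position is known.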

Extract text from .pdf
I have Acrobat Pro, is it possible to extract text from a .pdf? I see the "save as" options including Word Doc but it still seems to be an image? The OCR software with my Canon LiDE 200 scanner is as useless as tits on a boar hog.......... In article <C61B79B5.40FEF%elvisp@compuserve.com>, The Wolf <elvisp@compuserve.com> wrote: > I have Acrobat Pro, is it possible to extract text from a .pdf? I see the > "save as" options including Word Doc but it still seems to be an image? Acrobat has its own OCR built-in. I've found it to be very accurate, eve...

Extracting text from pdf
Hi, I have to index the text of a pdf document. Does any of you know of a PHP script/extension or a binary that is able to extract the text ? The pdf extension mentioned in the php.net docs seem to indicate that it's for _creation_ of documents only, is that so? Same with all the PHP classes i have found. Regards, Johnny -- Never express yourself more clearly than you are able to think. - Niels Bohr *** JustinCase wrote/escribió (25 Oct 2004 16:09:36 GMT): > Does any of you know of a PHP script/extension or a binary that is able > to extract the text ? There's a Unix pro...

Extract Text from PDF
Hi, Does anyone know a way to extract plain text from a PDF using Ruby? Many Thanks, ~ Mark -- Posted via http://www.ruby-forum.com/. On 13.04.2007 14:06, Mark Dodwell wrote: > Does anyone know a way to extract plain text from a PDF using Ruby? IIRC there is a project under way to extend PDFWriter with reading capabilities. I don't know the current status of that. HTH robert Robert Klemme wrote: > On 13.04.2007 14:06, Mark Dodwell wrote: >> Does anyone know a way to extract plain text from a PDF using Ruby? > > IIRC there is a project under way to extend PD...

PDF extract text
Hello, how can I extract text, images and other structures can be ignored, with PHP from a PDF file? We have a lot of LaTeX PDFs and Powerpoint PDFs and would like to extract only the text content to create a text analysis of the content eg for LaTeX scripts we would like the chapter structure as well. Is there any solution to do this with build-in PHP functions? Thanks Phil Philipp Kraus wrote: > how can I extract text, images and other structures can be ignored, > with PHP from a PDF file? For example with “PDF Parser”. You cannot have searched before po...

EXTRACT TEXT FROM PDF
Hello. I urgently need a C++/C# library to extract TEXT FROM PDF! Please somebody help! ...

Extract Text Coordinates from PDF
Hi, I was wondering if anyone could recommend a program which can extract the starting (top left) coordinates (x,y) of each word in a PDF file (and the end if possible). Ideally output would be in a format that could be easily inserted into a database. Hi, We did that here for an internal parsing requirement but did not make it a commercial product. That would take additional funding to bring it up to a marketable product. For a one time function, it would not be worth the cost. As an OEM or volume product, of course the picture changes. BTW our output was designed to take the informati...
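
As a rough illustration of the "word coordinates into a database" idea, the Python sketch below parses per-word bounding boxes from an XHTML-like listing and stores them in SQLite. The sample string is hand-written to resemble what Poppler's `pdftotext -bbox` produces as far as I know; treat its exact shape as an assumption, since real output may carry extra headers. The table layout and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample; assumed to resemble `pdftotext -bbox` output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><doc>
<page width="595.0" height="842.0">
<word xMin="56.8" yMin="57.2" xMax="96.0" yMax="70.5">Hello</word>
<word xMin="100.1" yMin="57.2" xMax="140.3" yMax="70.5">world</word>
</page></doc></body></html>"""


def parse_words(xhtml):
    """Return (text, x_min, y_min, x_max, y_max) tuples for every word element."""
    root = ET.fromstring(xhtml)
    rows = []
    for elem in root.iter():
        # Tags look like "{namespace}word"; compare only the local name.
        if elem.tag.rsplit("}", 1)[-1] == "word":
            rows.append((
                elem.text or "",
                float(elem.get("xMin")), float(elem.get("yMin")),
                float(elem.get("xMax")), float(elem.get("yMax")),
            ))
    return rows


def store_words(conn, rows):
    """Insert the word boxes into a simple table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words "
        "(text TEXT, x_min REAL, y_min REAL, x_max REAL, y_max REAL)"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_words(conn, parse_words(SAMPLE))
print(conn.execute("SELECT text, x_min, y_min FROM words").fetchall())
```

Storing both corners of each box keeps the start and end coordinates the poster asked for in one row per word.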

extracting text from pdf files
Can anyone help me with how to extract text from pdf files using PHP or ColdFusion? Thanks for any help. Hi, Try the Xpdf project. Run the pdftotext command in the shell to produce the text. http://www.foolabs.com/xpdf/download.html There's more tips at php.net/pdf. runner7@fastmail.fm wrote: > Can anyone help me with how to extract text from pdf files using PHP or > ColdFusion? Thanks for any help. petersprc@gmail.com wrote: > Hi, > > Try the Xpdf project. Run the pdftotext command in the shell to produce > the text. > > http://www.foolabs.com/xpd...
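
The reply above suggests running `pdftotext` from the shell. The thread is about PHP and ColdFusion, but the same call can be sketched in Python as follows. It assumes the `pdftotext` binary (from Xpdf or Poppler) is installed and on PATH, and the function names are hypothetical.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the pdftotext argument list; "-" sends the text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, keep_layout=False):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        check=True,
        capture_output=True,
    )
    # Assumes UTF-8 output; undecodable bytes are replaced rather than fatal.
    return result.stdout.decode("utf-8", errors="replace")


print(" ".join(pdftotext_command("input.pdf", keep_layout=True)))
```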

extract Text from PDF #2
Hello NG! We would like to extract address details from PDF letters, placed at certain coordinates defined by the German DIN standard for letters. For this purpose we're looking for a solution to extract text from a PDF document at certain pixel coordinates. Does somebody know a possible solution for this problem? We've tried a great deal to achieve this task, unfortunately without any success yet. Thank you very much in advance. Markus ...

Colored Text extraction from PDF
Hi All is it possible to extract the colored text from pdf. for example: There are 3 color texts in a pdf -- RED, GREEN and BLACK. is it possible to extract text which are red and green in color? - Regards Azodious Azodious wrote: > Hi All > is it possible to extract the colored text from pdf. > > for example: > There are 3 color texts in a pdf -- RED, GREEN and BLACK. > is it possible to extract text which are red and green in color? > Yes. It is possible, but I know of no method I'd actually want to use. Just my �0.02 worth. -- RGB On Wed, 3 Jun 2009 07:4...

Extract Text from PDF programmatically
Hi all, I need to extract the text from a PDF programmatically. I have an application in C# and have written a Ghostscript wrapper, but I still cannot work it out. I have tried the pstotext script, but I can only get gs to output to its console, which doesn't help me; I also don't want to run an external exe. Any ideas will be greatly appreciated. Mark Mark Redm...

How to extract text from a PDF document
Hello, How can I extract text from a (MS Word) PDF file? I've tried pdftotext but it only produces crap, not one readable cleartext sentence. :) Are there other (free) utilities to convert PDF to a text file or extract text? I think it must be possible, because I can also copy and paste text from PDF documents. greetings Fabian Hello Fabian: You can try our product Chief-Win PDF Converter Personal Edition V1.1, which converts PDF to word/text. You can download it through : http://www.chief-win.com/setup.exe, it allows a 21-day free trial with full function. Or you can try Easy PDF To Text...

specific text extraction from pdf
I've researched a lot, but still not found the solution. Let me explain: A pdf file is uploaded. The file can look in a million of manner, right? Im talking about its disposition. What I need to do is to fetch each odd row of the text (but only the paragraph text. Extracting text from pdf often means you also get that text that for example is inside an image) and cover that line with black color, so the text line is not readable anymore. Or maybe I want to do the same but for each odd word in the paragraphs. As you understand, it is about: 1) Extract text from pdf 2)Analyse it. What te...

Parse pdf to extract text???????
Is there anyway to use php to parse a pdf file and extract text from the document? I have been looking around for a few days now and still really havent found much..... If anyone could help it would be greatly appreciated. Thanks, Nick On Nov 29, 5:46 pm, "Nicholas.B.Car...@gmail.com" <Nicholas.B.Car...@gmail.com> wrote: > Is there anyway to use php to parse a pdf file and extract text from > the document? I have been looking around for a few days now and still > really havent found much..... > > If anyone could help it would be greatly appreciated. > >...

Read and extract text from pdf
Hi, I have a problem :), I just want to extract text from a PDF file with Python. There are different libraries for that, but they don't work... pyPdf and pdfTools, I don't know why, but they don't work with some PDFs... For example, space characters are deleted from the text.. Pdf playground : I don't understand how it works. If you have an idea, a tutorial, a library or anything that can help me do that. Julien ARNOUX: >I have a problem :), I just want to extract text from pdf file with >python. There is differents libraries for that but it doesn't work... > >pyPdf a...

extract text from PDF file
Hello, How can I extract text from a (MS Word) PDF file? I've tried pdftotext but it only produces crap, not one readable cleartext sentence. :) Are there other utilities to convert PDF to a text file or extract text? I think it must be possible, because I can also copy and paste text from PDF documents. greetings Fabian In article <44cdb91b$0$7874$6e1ede2f@read.cnntp.org>, fho@mailinator.com says... > Hello, > > How can I extract text from a (MS Word) PDF file? This isn't really a PostScript question.... > I've tryed pdftotext but it only produce crap, not o...

Extract Text from PDF
Does anyone know of a FileMaker Plug-in that can extract the text from a PDF stored in a container field? Could this be done with a Custom Function? Thanks, Sean On Mar 12, 3:40 pm, "s...@codexdata.com" <s...@codexdata.com> wrote: > Does anyone know of a FileMaker Plug-in that can extract the text from > a PDF stored in a container field? Not offhand, but that's not to say their isn't one. There are however, any number of command line text extractors that could be controlled by filemaker, set to create a file containing the extracted text, and then have that...

extracting pure text from pdf
Hi, is there a way (e.g. sample code) to extract pure text from pdf with realbasic? Thanks. Frank In article <1i9fu85.1h0rx461hw2ikrN%spam@ghostlink.de>, spam@ghostlink.de (Frank Esselbach) wrote: > Hi, > > is there a way (e.g. sample code) to extract pure text from pdf with > realbasic? Thanks. > > Frank I do it on the mac with the free version of the pdf2txt unix command and you use it from rb with the command shell works nice for me. -- Jean-Yves. Frank Esselbach <spam@ghostlink.de> wrote: > Hi, > > is there a way...