ghostscript PDF page extraction, leaving text as text

Ghostscript may be used to extract pages from a PDF file with a
command like this:

gs -sDEVICE=pdfwrite \
 -dNOPAUSE -dBATCH -dSAFER \
 -dFirstPage=48 -dLastPage=48 \
 -sOutputFile=onepage.pdf  input.pdf

The problem is that, while the extracted page looks the same as the
original in a PDF reader, it seems to be an image rather than an
"object" representation.  That is, open the original PDF in something
like Acrobat or PDF XChange Viewer and "search" and "text selection"
work, whereas in the extracted one neither function works. Presumably
this is because the text has been rasterized.

Is it possible to use gs to extract ranges of pages, preferably also
reducing the resolution of the embedded images, but leaving the text
as text?  I frequently need to reduce the size of PDF files, but it
should all come out of the resolution of the images, and the text
should remain as accessible as it was in the original.

If ghostscript cannot do this, is there another linux tool that can?

Thanks,

David Mathog
dmathog (174)
5/6/2010 9:19:15 PM
comp.lang.postscript


>>>>> "David" == David Mathog <dmathog@gmail.com> writes:

    David> gs -sDEVICE=pdfwrite \ -dNOPAUSE -dBATCH -dSAFER \
    David> -dFirstPage=48 -dLastPage=48 \ -sOutputFile=onepage.pdf
    David> input.pdf

I've just  tried this with a PDF  file, and it works:  search and select
works on both onepage.pdf and input.pdf.


    David> The problem is, while that page looks the same as the
    David> original in a PDF reader, it seems to be an image rather than
    David> an "object" representation.  That is, open the extracted PDF
    David> in something like Acrobat or PDF XChange Viewer and "search"
    David> and "text selection" work, whereas in the extracted one
    David> neither function works. Presumably this is because the text
    David> has been rasterized.

Maybe, your PDF file has used some special features (e.g. transparency),
so that GS has decided that  the most faithful way of converting it into
PDF is to rasterize the page?


    David> Is it possible to use gs to extract ranges of pages,
    David> preferably also reducing the resolution of the embedded
    David> images, but leaving the text as text?  I frequently need to
    David> reduce the size of PDF files, but it should all come out of
    David> the resolution of the images, and the text should remain as
    David> accessible as it was in the original.

For page selection, try 'pdftk' or 'pdfjam'.
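
A rough sketch of the pdftk route, in case it helps. The file names and the page range are placeholders, and the PDFTK variable is my own addition so the command can be dry-run with PDFTK=echo. This is only one way to call it, not the only one.

```shell
# Sketch only: copy a page range out of a PDF with pdftk.
# PDFTK can be overridden (e.g. PDFTK=echo) to print the arguments
# instead of running the tool.
PDFTK=${PDFTK:-pdftk}

extract_pages() {
    # usage: extract_pages FIRST LAST INPUT OUTPUT
    "$PDFTK" "$3" cat "$1-$2" output "$4"
}

# Example (hypothetical file names):
#   extract_pages 48 48 input.pdf onepage.pdf
```

As far as I know pdftk copies the page objects rather than re-interpreting them, so the text should stay as it was in the original, but for the same reason it will not reduce the image resolution.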


    David> If ghostscript cannot do this, is there another linux tool
    David> that can?

GS can do it, but maybe not in your special case.
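
For what it is worth, here is a minimal sketch of how page extraction and image downsampling might be combined in one pdfwrite run. The 150 dpi target, the /Average and /Subsample downsample types and the file names are assumptions to adjust, and the GS variable is only there so the command can be dry-run with GS=echo.

```shell
# Sketch only: extract a page range and downsample embedded images,
# leaving text as text where pdfwrite is able to preserve it.
# GS can be overridden (e.g. GS=echo) to print the arguments instead.
GS=${GS:-gs}

extract_and_downsample() {
    # usage: extract_and_downsample FIRST LAST DPI INPUT OUTPUT
    "$GS" -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
        -dFirstPage="$1" -dLastPage="$2" \
        -dDownsampleColorImages=true -dColorImageDownsampleType=/Average \
        -dColorImageResolution="$3" \
        -dDownsampleGrayImages=true -dGrayImageDownsampleType=/Average \
        -dGrayImageResolution="$3" \
        -dDownsampleMonoImages=true -dMonoImageDownsampleType=/Subsample \
        -dMonoImageResolution="$3" \
        -sOutputFile="$5" "$4"
}

# Example (hypothetical file names):
#   extract_and_downsample 48 48 150 input.pdf onepage.pdf
```

How much this shrinks a given file depends on the resolution of the images already in it, so treat the numbers as a starting point.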


-- 
Lee Sau Dan

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

danlee (1495)
5/7/2010 1:58:58 AM
In some cases such as bitmap or Type3 fonts -dEmbedAllFonts=true has
worked.

Ed


abeddie (96)
5/7/2010 12:24:51 PM
On May 7, 5:24 am, "abed...@hotmail.com" <abed...@hotmail.com> wrote:
> In some cases such as bitmap or Type3 fonts -dEmbedAllFonts=true has
> worked.

Tried that and it didn't work.

On 07 May 2010 09:58:58 +0800 LEE Sau Dan <dan...@informatik.uni-
freiburg.de> wrote

>Maybe, your PDF file has used some special features (e.g. transparency),
>so that GS has decided that  the most faithful way of converting it into
>PDF is to rasterize the page?

Yes, I think there is something wrong with the input file.  When it
runs through ghostscript this is emitted:

GPL Ghostscript 8.64 (2009-02-03)
Copyright (C) 2009 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
   **** Warning:  File has an invalid xref entry:  22.  Rebuilding
xref table.
Processing pages 1 through 1.
Page 1

   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> Mac OS X 10.5.8 Quartz PDFContext <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

This run was on Linux (but the problem originally surfaced with
ghostscript on Windows). As you can see the PDF was generated on a
Mac, so it could be some Mac vs. other OS incompatibility.  Plus
whatever this xref corruption is.

Here is the problem input file (25MB - that's why I want to reduce
it.)

http://saf.bio.caltech.edu/pub/pickup/problem.pdf

It is a set of lecture notes from a class.  The unselectable text
problem surfaces even when only the first page is extracted.
dmathog (174)
5/7/2010 4:16:58 PM
David Mathog wrote:

> On May 7, 5:24 am, "abed...@hotmail.com" <abed...@hotmail.com> wrote:
>> In some cases such as bitmap or Type3 fonts -dEmbedAllFonts=true has
>> worked.
> 
> Tried that and it didn't work.
> 
> On 07 May 2010 09:58:58 +0800 LEE Sau Dan <dan...@informatik.uni-
> freiburg.de> wrote
> 
>>Maybe, your PDF file has used some special features (e.g. transparency),
>>so that GS has decided that  the most faithful way of converting it into
>>PDF is to rasterize the page?
> 
> Yes, I think there is something wrong with the input file.  When it
> runs through ghostscript this is emitted:
> 
> GPL Ghostscript 8.64 (2009-02-03)
> Copyright (C) 2009 Artifex Software, Inc.  All rights reserved.
> This software comes with NO WARRANTY: see the file PUBLIC for details.
>    **** Warning:  File has an invalid xref entry:  22.  Rebuilding
> xref table.
> Processing pages 1 through 1.
> Page 1
> 
>    **** This file had errors that were repaired or ignored.
>    **** The file was produced by:
>    **** >>>> Mac OS X 10.5.8 Quartz PDFContext <<<<
>    **** Please notify the author of the software that produced this
>    **** file that it does not conform to Adobe's published PDF
>    **** specification.
> 
> This run was on Linux (but the problem originally surfaced with
> ghostscript on Windows). As you can see the PDF was generated on a
> Mac, so it could be some Mac vs. other OS incompatibility.  Plus
> whatever this xref corruption is.
> 
> Here is the problem input file (25MB - that's why I want to reduce
> it.)
> 
> http://saf.bio.caltech.edu/pub/pickup/problem.pdf
> 
> It is a set of lecture notes from a class.  The unselectable text
> problem surfaces even when only the first page is extracted.

The "text" on the first two pages indeed is contained in images.
Apart from that, the PDF as downloaded from your site is OK. The xref 
complaints you got must be due to a transfer error (probably some end of line 
conversion during transfer?).

Helge

5/7/2010 4:57:46 PM
On May 7, 9:57 am, Helge Blischke <h.blisc...@acm.org> wrote:
> The "text" on the first two pages indeed is contained in images.
> Apart from that, the PDF as downloaded from your site is OK. The xref
> complaints you got must be due to a transfer error (probably some end of
> line conversion during transfer?).

It isn't working for me still, and the transfer was OK, md5sum didn't
change. Ghostscript is 8.64.
On the linux system extract a single page with this command:

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=48 \
 -dLastPage=48 -sOutputFile=foo48.pdf -dEmbedAllFonts=true \
 BMB170c_2010_LECTURE11.pdf

Open that up with PDF Xchange viewer on Windows XP.  It looks OK.
HOWEVER, search doesn't work.  To see why, select any text, copy and
paste into a word processor.  Garbage.  Search on the original page
did work.  So it looks like ghostscript is remapping the characters
to the font entries during the extraction, possibly at the

   **** Warning:  File has an invalid xref entry:  22.  Rebuilding
xref table.

step.  Specifically, the first text on page 48 of the original is
"Polysaccharide A" and it copies that way to another application.
However, in the extracted page 48 copy/paste of that text reads
"=%∀#:3∃∃635∋79)Β" (some of those are unprintable characters, not sure how
they will post) even though when displayed in the PDF viewer it still
displays as "Polysaccharide A".  Here is the extracted file:

  http://saf.bio.caltech.edu/pub/pickup/foo48.pdf

The properties for the "Polysaccharide A" text were:

Font: Calibri (Embedded Subset)
  Type: TrueType
  Encoding: Custom
  Object Number: [X]
  Global Object ID: 0
Font Size 43.5 pt
Horizontal Scaling: 100%
Baseline Offset: 0.0 pt

in both the original and the extracted page, with [X]=10 in the
extracted file, [X]=23 in the original.  Not sure if this relates to
the error message, could be a coincidence, but as all programmers
know, 22 is 23 if you count from 0 instead of 1.

Thanks.

David Mathog
dmathog (174)
5/7/2010 6:19:57 PM
David Mathog <dmathog@gmail.com> 2010-05-07:

> Here is the problem input file (25MB - that's why I want to reduce
> it.)
> 
> http://saf.bio.caltech.edu/pub/pickup/problem.pdf

Go to page 47 and search for ">". Note that the two "ti" strings also
match!

5/7/2010 8:37:09 PM
On May 7, 1:37 pm, uhhu <M8R-kwn...@mailinator.com> wrote:
> David Mathog <dmat...@gmail.com> 2010-05-07:
>
> > Here is the problem input file (25MB - that's why I want to reduce
> > it.)
>
> >http://saf.bio.caltech.edu/pub/pickup/problem.pdf
>
> Go to page 47 and search for ">". Note that the two "ti" strings also
> match!

Yup. Also "tt" as seen in the display is represented by "K" in the
encoding. (Search for "zwiker" and it finds "zwitter".)  Feels like
the PDF generator had a few too many characters to fit into a finite
sized lookup table and so used an alternative encoding.
dmathog (174)
5/7/2010 8:49:18 PM
In article <bc631d67-d6e4-4be1-9e35-
35bd78e5756f@s4g2000prh.googlegroups.com>, dmathog@gmail.com says...

> On the linux system extract a single page with this command:
> 
> gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=48 \
>  -dLastPage=48 -sOutputFile=foo48.pdf -dEmbedAllFonts=true \
>  BMB170c_2010_LECTURE11.pdf
> 
> Open that up with PDF Xchange viewer on Windows XP.  It looks OK.
> HOWEVER, search doesn't work.
> To see why, select any text, copy and paste into a word processor.
> Garbage.  Search on the original
> page did work. 

This is an example of why it's important to carefully describe the 
problem. Your original post was quite firm that the problem was 
conversion to a bitmap image.

This is actually quite a different problem. In order to 
search/copy/paste text, Acrobat wants a ToUnicode CMap in the output PDF 
file; this allows it to 'know' what the Unicode code point is for a 
given glyph on the page.

Without that, Acrobat will fall back to other approaches; if the 
Encoding for the font is one of the standards then it will use that to 
work out the Unicode values. If the Encoding is non-standard, but the 
glyph names are recognisable, it will try to use those. 

If none of the above is true, then it is forced to give up. In this case 
it copies the character indices directly from the PDF file, as a stream 
of bytes. From your later description it seems to me that this is what 
is happening.
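
One way to check this from the command line, as a sketch: the pdffonts and pdftotext utilities from xpdf/poppler (not mentioned so far in this thread, so treat their use here as an assumption) can list the fonts on a page and show what text actually comes out. The "uni" column in the pdffonts listing says whether a font has a ToUnicode map. The PDFFONTS and PDFTOTEXT variables are only there so the commands can be dry-run.

```shell
# Sketch only: inspect fonts and extracted text for a single page.
# Both tool names can be overridden (e.g. with echo) for a dry run.
PDFFONTS=${PDFFONTS:-pdffonts}
PDFTOTEXT=${PDFTOTEXT:-pdftotext}

check_page_text() {
    # usage: check_page_text PAGE FILE
    # List the fonts used on the page; look at the "uni" column.
    "$PDFFONTS" -f "$1" -l "$1" "$2"
    # Dump the page's text to stdout to see whether it is readable.
    "$PDFTOTEXT" -f "$1" -l "$1" "$2" -
}

# Example (file names from this thread):
#   check_page_text 48 BMB170c_2010_LECTURE11.pdf
#   check_page_text 1 foo48.pdf
```

Comparing the two outputs for the original and the extracted file should show whether the ToUnicode information was lost in the conversion.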


> So it looks like ghostscript is remapping the
> characters to the font entries during
> the extraction, possibly at the
> 
>    **** Warning:  File has an invalid xref entry:  22.  Rebuilding
> xref table.
> 
> step.

No, that is Ghostscript telling you that it thinks there is something 
wrong with the original file. The xref table simply tells the PDF 
consumer where to find all the objects in the file so that it can 
interpret them. If the index is damaged then Ghostscript will scan the 
entire file looking for object declarations (eg 1 0 obj) and build a 
completely new index from that information. The presence or absence of a 
ToUnicode CMap, and the encoding of the fonts, is not affected by this.
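
To give a feel for what such a scan looks for, here is a very rough sketch using grep. It is not what Ghostscript actually runs, it relies on the -a, -b and -o options (available in GNU grep), and it cannot see objects stored inside compressed object streams.

```shell
# Sketch only: list byte offsets of "N G obj" declarations in a PDF,
# roughly the information an xref table is supposed to hold.
list_objects() {
    # usage: list_objects FILE
    # -a: treat binary data as text, -b: print byte offset,
    # -o: print only the matching part of each line
    grep -a -b -o '[0-9][0-9]* [0-9][0-9]* obj' "$1"
}

# Example (hypothetical file name):
#   list_objects input.pdf | head
```

Each output line has the form offset:declaration, which can be compared by hand against the entries in the file's own xref table.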

> The properties for the "Polysaccharide A" text was:
> 
> Font: Calibri (Embedded Subset)
>   Type: TrueType
>   Encoding: Custom

So it's not a standard Encoding, as I suspected.
 
Without seeing the original file (and no, I'm sorry but I'm not going to 
download and examine a 25MB file) I can't really say for sure what is 
going on. However I would suggest that you try a more up to date version 
of Ghostscript. 8.64 is a year old now, (the current version is 8.71) 
and there have been a number of changes to pdfwrite over the last year.


			Ken
ken161 (742)
5/8/2010 8:02:02 AM