ghostscript PDF page extraction, leaving text as text
Ghostscript may be used to extract pages from a PDF file with a
command like this:
gs -sDEVICE=pdfwrite \
-dNOPAUSE -dBATCH -dSAFER \
-dFirstPage=48 -dLastPage=48 \
-sOutputFile=onepage.pdf input.pdf
The problem is that, while the extracted page looks the same as the
original in a PDF reader, it seems to be an image rather than an "object"
representation. That is, open the original PDF in something like
Acrobat or PDF XChange Viewer and "search" and "text selection" work,
whereas in the extracted one neither function works. Presumably this
is because the text has been rasterized.
Is it possible to use gs to extract ranges of pages, preferably also
reducing the resolution of the embedded images, but leaving the text
as text? I frequently need to reduce the size of PDF files, but it
should all come out of the resolution of the images, and the text
should remain as accessible as it was in the original.
If ghostscript cannot do this, is there another linux tool that can?
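One way to approach the question above, sketched in Python so the argument list is easy to inspect before running it. This is only a sketch under assumptions: the helper name gs_extract_args, the file names and the 150 dpi target are illustrative and not from the original post. The downsampling switches are standard pdfwrite (distiller) parameters that apply to images only; whether text stays selectable still depends on the input file and the Ghostscript version.

```python
import shlex


def gs_extract_args(src, dst, first, last, image_dpi=150):
    """Build a Ghostscript argument list (hypothetical helper, not part of gs)."""
    return [
        "gs", "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        # Page range to keep, as in the original command.
        f"-dFirstPage={first}", f"-dLastPage={last}",
        # Ask pdfwrite to downsample embedded images to the target resolution.
        "-dDownsampleColorImages=true", f"-dColorImageResolution={image_dpi}",
        "-dDownsampleGrayImages=true", f"-dGrayImageResolution={image_dpi}",
        "-dDownsampleMonoImages=true", f"-dMonoImageResolution={image_dpi}",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(gs_extract_args("input.pdf", "pages.pdf", 48, 52)))
```

Printing the command first makes it easy to compare against the hand-written one before running it on real files.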
>>>>> "David" == David Mathog <firstname.lastname@example.org> writes:
David> gs -sDEVICE=pdfwrite \ -dNOPAUSE -dBATCH -dSAFER \
David> -dFirstPage=48 -dLastPage=48 \ -sOutputFile=onepage.pdf
I've just tried this with a PDF file, and it works: search and select
works on both onepage.pdf and input.pdf.
David> The problem is, while that page looks the same as the
David> ori...
pdf \ text (get rid of text in pdf)
Is there a way to remove all text from PDF?
Will extract images work for you?
If so, PDF-Tools by Tracker Software will do it.
"MarosV" <email@example.com> wrote in message
> Is there a way to remove all text from PDF?
...
ANN: Fly Text to PDF
Fly Text to PDF 1.3 is a powerful tool which can convert your text files
into PDF. It is a converter that runs on the Microsoft
Windows operating system. You can use this tool to convert your text
reports, text documents and other text files into PDF quickly and
easily. You can also set the PDF properties in each text file by using
special tags, or set the default properties for every output PDF file.
Please visit our website for more information:
For the output sample, please click on:
Key fea...
Generating multiple pdf files with generated names using ods pdf.
Is it possible to generate MULTIPLE, individually named (on the fly)
pdf files using the ods pdf facility?
For example, an input file has data for Alfred, Betty, Charles, Debra
etc and I want a separate pdf file generated for each person with the
file names generated on the fly. For example, I would like to generate
files named as
so that the separate reports can be "distributed" individually. The
"names" above are not known in advance.
I know about the newfile option, but this gener...
Convert postscript to PDF or text in OSX
I understand that you can convert postscript files to pdf using some
command line utility built into OSX.
(When i was reading about PStill
an Adobe Distill type of program for OSX $70. But the command is free.)
What command do i use?
% man pdf
I actually want to convert postscript files to the simple text format.
They are all text, columns and rows of names and numbers. The idea is
to get test scores with associated ID#s into a spreadsheet. The text
returned will have spaces instead of tabs so it will be clumb...
Ghostscript and copying text from a pdf gets junk
I have downloaded gs 8.14 and redmon 1.7 to make pdfs in win98se.
I use the apple color laserwriter 12/600 driver which outputs
to redmon and makes a pdf.
The pdfs look great, the only problem is that when i try to copy
the text in acrobat and paste to notepad, it comes out like this:
GSview and text extract give similar results.
It doesn't always do this, that is, some pdfs don't have this problem,
but if i ask the printer driver to use truetype fonts only (no subst)
it seems to do it every time.
It appears that a "shifting" process is occurring:
copy text: 5HVXOWV
orig text: Results
s->V 3 character shift (not counting that it's lower case to upper case)
t->W 3 chars
e->H 3 chars
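The "3 character shift" plus the change of case amounts to a constant difference of 29 in character code (32 for the case change, minus 3). A minimal Python sketch, assuming the offset is the same for every character of the sample; the function name unshift is illustrative only. A constant offset like this would be consistent with the copied text being glyph indices rather than character codes, although the post does not confirm that.

```python
def unshift(garbled, offset=29):
    """Undo a constant character-code offset (hypothetical helper)."""
    # Each copied character sits `offset` code points below the original one.
    return "".join(chr(ord(ch) + offset) for ch in garbled)


if __name__ == "__main__":
    print(unshift("5HVXOWV"))  # prints "Results"
```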
When the print driver uses "standard substitutions" like arial ->
helvetica it copies fine.
I don't think it's mozilla (which is the app i'm printing from)
...
PDF PDF PDF
For anyone struggling to figure out how to create a PDF in SWX it's pretty
simple but you may have to have the Bluebeam version of swx
FIRST go to Tools>Options and check "Save as PDF"
Then you can save them right from the save as dialog.
Maybe I'm the only dumbass that couldn't figure that out! ;0) But it was in
help under "PDF"
An easier way is to download a program from www.pdf995.com that "prints" your
files to a PDF format. This program works with SW and any other program
you use to print with.
"3d" <jmiller at marvelindustri...
Problems generating small PDF files in Ghostscript
I use Ghostscript for generating PDF files -- it seems to be one of
the best free general-purpose tools for distilling PS, and the
resulting PDF files are often smaller than those generated by other
programs. But there are a few hang-ups with generating the smallest
possible PDF files using Ghostscript:
1) There are "Tr" modes in the PDF specification for automatically
showing a text string, then stroking the path of the same text string
in the same position, but Ghostscript doesn't generate such PDF code --
instead "(XX) dup gsave show grestore false charpath stroke"...
PDF file generated from Ghostscript and Acrobat Distiller
I have 2 PDF files generated from the same Microsoft Word DOC file (2 pages of
my resume) on two computers. One computer generated its PDF file with
Ghostscript 8.53, and the other with Acrobat Distiller. I
notice the PDF file generated by Ghostscript is 13KB, and the one generated by
Acrobat Distiller is 130KB. I was wondering why the file sizes are so
different. Any ideas?
Thanks in advance.
In article <%70ag.2338$a23.79@trndny01>, firstname.lastname@example.org
> I have 2 PDF files generated from a same Microsoft Word DOC file (2 pages of
> my resume) from two computers. One computer generated PDF file by
> Ghostscript 8.53, and the other computer generated by Acrobat Distiller. I
> notice the PDF file generated by Ghostscript is 13KB, and generated by
> Acrobat Distiller is 130KB. I was wondering that why the file size is so big
> difference? Any Ideas?
Embedded fonts, different compression settings, and different image
compression/downsampling (if images are present) are the most likely causes.
Acrobat 7 (Professional) can do 'audit space usage'.
These PDF files do not have embedded images. For the smaller PDF file, does
that mean this PDF file has a higher compression setting, or that it has
fewer embedded components (such as fonts) in it? If it has fewer components,
will this smaller PDF file display its content correctly on any other computer?
"Ken Sharp"...
How to generate correct postscript from pdf on a SPARC Solaris
Users have begun reporting they are receiving PDFs created on Macs and
Windows using Adobe Acrobat 6.x. Our platform is SPARC Solaris 8, and
Adobe only has made Acrobat reader 5.0.8 available for download.
The problem is that the users are reporting certain special characters are
being printed differently - in one case, a less than or equal sign is
being printed as a greater than or equal to sign!
I just checked adobe.com and no newer version of acrobat reader is
available for Solaris (or, for that matter, Linux).
What legal alternatives do we have on Solaris for generating correct
printed...
Text from PDF files generated from a LaTeX source
I'm dealing with trying to get a company that sells a file parsing
system to schools and universities to combat plagiarism. They claim
that they can handle PDF files, but none of the versions I have tried so far
(from a LaTeX source, both by way of pdflatex and dvips -Ppdf ;
ps2pdf14) is handled properly by their system (they are slow to
respond to questions, so I have had mucho frustration getting them to
actually test files for me; all my tests have used reports mailed
to my professors :-).
What I have gotten from them is that they need to extract the text from
the fil...
Generating a PDF with page dimensions only as large as the text demands
I'm interested in using pdflatex to generate good-looking equations
that I will paste into PowerPoint slides. (Yes, yes, rotten tomatoes
coming my way...) What I would like to happen is for the generated
PDF to have the dimensions demanded by the equation contained in the
document. Is there any way to do this? I have tried generating a
page-size PDF with pdflatex and then converting with ImageMagick with
convert -trim file.pdf trimmed_file.pdf
but trimmed_file.pdf no longer looks good -- the background isn't
transparent, and the text looks bad upon rescaling.
Any suggestions would be much appreciated!
On Oct 19, 5:12 pm, "sinos...@gmail.com" <sinos...@gmail.com> wrote:
> I'm interested in using pdflatex to generate good-looking equations
> that I will paste into PowerPoint slides. (Yes, yes, rotten tomatoes
> coming my way...) What I would like to happen is for the generated
> PDF to have the dimensions demanded by the equation contained in the
> document. Is there any way to do this? I have tried generating a
> page-size PDF with pdflatex and then converting with ImageMagick with
> the command
>   convert -trim file.pdf trimmed_file.pdf
> but trimmed_file.pdf no longer looks good -- the background isn't
> transparent, and the text looks bad upon rescaling.
> Any suggestions would be much appreciated!
Ah, I've managed to answer my own questio...
problem converting the postscript file to pdf using ghostscript
I have a PostScript file which has four pages.
Three pages are in portrait and the fourth is landscape.
When I run this command "gswin32c -q -dLOCALFONTS -dSAFER
-dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dAutoRotatePages=/None
-sOutputFile=c:\S100834431_GScript_6.pdf -dCompatibilityLevel=1.4 -c
.setpdfwrite -f C:\S100834431.ps"
it generates a PDF file. I get the first three pages correct, but the
fourth page, which was landscape, is not displayed properly.
Please help me.
...
Q: Latex special to make pdf text unextractable
Can somebody advise on how to block text extraction from pdf's generated
through latex -> dvips -> ps2pdf (a \special probably)?
Among other things, VK@club.org saw fit to write:
> Can somebody advise on how to block text extraction from pdf's generated
> through latex -> dvips -> ps2pdf (a \special probably)?
I don't know, but you could try using an external app *after* generating the
pdf, something like pdftk: http://www.accesspdf.com/pdftk/
Ignacio __ Fernández Galván
Linux user / / \
#289967 / / /\ \ PGP Pub Key
/ / /\ \ \ 0x01A95F99
/ /_/__\ \ \
/________\ \ \
jellby \___________\/ yahoo.com
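A rough sketch of the pdftk suggestion above, written in Python so the argument list can be checked before it is run. The helper name, file names and password are hypothetical; the keywords (output, encrypt_128bit, allow, owner_pw) follow pdftk's documented command syntax. Permissions set this way are only honoured by viewers that choose to respect them, so this discourages copying rather than preventing it.

```python
import shlex


def pdftk_restrict_args(src, dst, owner_password):
    """Build a pdftk argument list (hypothetical helper; names are placeholders)."""
    return [
        "pdftk", src,
        "output", dst,
        "encrypt_128bit",
        # Grant printing only; copying text is not granted because
        # CopyContents is left out of the allow list.
        "allow", "Printing",
        "owner_pw", owner_password,
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(pdftk_restrict_args("paper.pdf", "paper-locked.pdf", "secret")))
```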
On 29-11-2004 20:22, VK@club.org wrote:
> Can somebody advise on how to block text extraction from pdf's generated
> through latex -> dvips -> ps2pdf (a \special probably)?
For that purpose (among others), I use Multivalent:
Jose Carlos Santos
...
How to make ASCII text of scanned PDF image files searchable by Google
I am a professional writer, and have a number of published booklets,
brochures, etc. scanned into Adobe Acrobat PDF format as images,
including all of the original printed artwork. They are linked to
from the resume and portfolio page of my business website at:
The problem is that while Google and other search engines now index
the contents of PDF files for web searches, these pages and their text
are not searchable. Is there an easy way I can make the text of these
files searchable? I could easily re-scan each of the documents with my
OCR software to g...
ADV: EventStudio 2.0: Generate PDF Documents from Text Files
We are pleased to announce the release of EventStudio 2.0. This
is a major upgrade to the product.
What's New in EventStudio 2.0:
o Four new types of documents can be generated:
- Collaboration Diagram (PDF)
- Interface Collaboration Diagram (PDF)
- Interaction Collaboration Diagram (PDF)
- Message Filter Collaboration Diagram (PDF)
o Choose between collaboration diagram or context
diagrams for any of the above document types.
o Use the power of regular expressions to generate
an infinite variety of documents...
Re: How do I make the text bigger using "ODS PDF" in this case?
You need use proc template to create your style with bigger
filename ascii temp;
proc printto print=ascii;
proc print data=sashelp.class;
proc printto print=print;
edit styles.default as ymini;
style fonts /
ods pdf file="c:\temp\junk.pdf" style=ymini;
put @1 _infile_ $char70.;
ods pdf close;
On Wed, 29 Apr 2009 02:40:34 -0700, RolandRB <rolandberry@HOTMAIL.COM>
>...
Problems converting PDF to PDF/A with ghostscript
I am running Ghostscript 8.71 on Windows. I have tried just about
every combination of switches and ICC profiles. The command that seems
to work the best is as follows:
gswin32 -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor
-sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -
This yields a seemingly valid PDF/A-1b (CMYK) for most files. I am
able to open the files in acrobat and successfully validate the
conformity of the document to the aforementioned PDF/A standard.
Certain files do not pass the PDF/A conformity validation in Acrobat,
despite the note in Acrobat stating the file is being viewed in PDF/A
mode. The preflight test reports the following error:
CIDset in subset font is incomplete
ArialBoldItalicMT 14.112 pt TrueType (CID) embedded (as a subset)
If I convert the offending input_pdf.pdf file using Acrobat by
printing to the Adobe PDF printer and selecting the PDF/A-1b (CMYK)
standard, the output PDF passes the preflight without any errors.
Does anyone have any suggestions for consistently producing valid
PDF/A-1b (CMYK) documents with Ghostscript?
In article <6cfcdbba-47de-4be1-8e93-
> I am running Ghostscript 8.71 on Windows. I have tried just about
> every combination of switches and ICC profiles. The command that seems
> to work the best is as follows:
> gswin32 -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSA...
make text flow around other text ?
This text starts +---------------+
on the left but | |
then when the box | some boxed |
ends we see that | text |
the text on the | |
left is allowed +---------------+
to flow across all the way to the
right margin. What HTML tags convey
that this flowing should occur ?
Note: The boxed text is preferably
as text and not as an image.
>From: email@example.com (Eric Osman)
>What HTML tags convey
>that this flowing should occur ?
A simple Tables Layout can accomplish that!
Web...
PDF image of text to readable text ?
Seems there are web based tools and software. My son needs text to
have it read for him. He has a PC. Found PDF reader $50 ,
http://thurly.net/11ia and http://thurly.net/11i4 the last being google.
Wondering what you folks found useful or use ?
Bill S. Jersey USA zone 5 shade garden
http://uppitywis.org/ live WI
...
generating text in unbound text books
I want some unbound text boxes in my form which lookup information about a
student (such as their address, etc.) from a "Students" table when the
Student's ID # is entered in this "Student Programs" table that I have. Say
one such text box is labeled "Add1" and is Unbound.
Why doesn't this piece of code, put in the afterupdate portion of Student
ID, give me the Address of the Student? (it instead gives me a blank form)
Private Sub Student_ID_AfterUpdate()
Me.Add1 = DLookup("[Home address 1]", "Students", "[Student ID] = "...
Making 2 lines of text in text()
I would like to add some text in a plot using text(). I am wondering
what symbol should I add in the text so that it shows 2 lines of text
in the textbox?
Use the "\newline" tag for TeX/LaTeX
t = text(1,1,'my\newlinetext');
"Alan" <firstname.lastname@example.org> wrote in message
> I would like to add some text in a plot using text(). I am wondering
> what symbol should I add in the text so that it shows 2 lines of text
> in the textbox?
>...
Need a tool which makes a Text-PDF and a separate TXT-File from a windows print stream
I know some tools which make Text-PDFs and other tools which make
Picture-PDFs and separate TXT-Files with a bad structure (values of
tables are as long strings without spaces in the textfile, so it is not
possible to extract the values) from a windows print stream (virtual
Is there a tool which can both things or is it necessary to take two
different tools (the first tool make a Text-PDF and the second tool
makes a textfile from the PDF).
Can you capture the Windows print stream as Windows Metafiles (WMF,
If so, our wmf2vector SDK migh...
Re: How do I make the text bigger using "ODS PDF" in this case? #2
Sorry, there is a '(' missing from this line:
On Wed, 29 Apr 2009 19:25:54 -0400, Ya Huang <ya.huang@AMYLIN.COM> wrote:
>You need use proc template to create your style with bigger
>filename ascii temp;
>proc printto print=ascii;
>proc print data=sashelp.class;
>proc printto print=print;
> edit styles.default as ymini;
> style fonts /
> 'BatchFixedFont' ="Courier",16pt);