
#### PDF::API2 - Extracting text and position from PDF file

Hello All,

I am new to PDF files, so I don't really know whether what I want to do
is possible or how to use the PDF::API2 module for it.

I need to extract information from columns in a table (I assume that
PDF does not know anything about tables). What I was thinking of doing
was finding the horizontal location of the header (I know what it should
be), then extracting all text that starts at that location.
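
To make the idea concrete, here is a minimal sketch in plain Perl. It assumes that some extraction step (not shown, and as far as I can tell not something PDF::API2 offers directly) has already produced text fragments with page coordinates. The `@fragments` data, the `column_under_header` name, and the one-point tolerance are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: text fragments with page coordinates in points.
# As in PDF user space, x grows to the right and y grows upward.
my @fragments = (
    { text => 'Name',   x => 72,    y => 700 },
    { text => 'Amount', x => 300,   y => 700 },
    { text => 'Widget', x => 72,    y => 680 },
    { text => '12.50',  x => 300.4, y => 680 },
    { text => 'Gadget', x => 72,    y => 660 },
    { text => '7.00',   x => 299.8, y => 660 },
);

# Return the text of every fragment that sits below the named header and
# whose left edge is within $tolerance points of the header's left edge.
sub column_under_header {
    my ($header, $tolerance, @frags) = @_;

    # Locate the header fragment; give up if it is not on the page.
    my ($head) = grep { $_->{text} eq $header } @frags;
    return () unless $head;

    return map  { $_->{text} }
           sort { $b->{y} <=> $a->{y} }    # top of the page first
           grep { $_->{y} < $head->{y}
                  && abs($_->{x} - $head->{x}) <= $tolerance } @frags;
}

my @amounts = column_under_header('Amount', 1, @fragments);
print join(', ', @amounts), "\n";    # prints: 12.50, 7.00
```

Matching on the left edge only works for left-aligned columns; a right-aligned column of numbers would need the comparison done on the right edge instead.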

I have played around with the PDF::API2 module and read the 'Using
PDF::API2 - The code' help page, but it doesn't show how to extract
information from an existing file. Could someone point me in the right
direction for documentation or examples of how this might be done, or
tell me whether it can be done at all?

Brian

4/17/2008 1:20:05 PM
comp.lang.perl.modules

0 Replies
717 Views
