Converting a PDF document image to text

I programmatically save a PDF document as text using automation. I then
process the document in VB.

However, my client now has one of his clients sending an image of a
document in a PDF file.

What is the best way to convert the PDF image file to text using VB?

Thanks.

Greg
3/27/2007 3:25:59 PM
comp.lang.basic.visual.misc

5 Replies

In article <jldi03dlmqcb5mr1fifrg5ok89mso45rok@4ax.com>,
bookreader_1955@yahoo.com says...
> I programmatically save a PDF document as text using automation. I then
> process the document in VB.
> 
> However, my client now has one of his clients sending an image of a
> document in a PDF file.
> 
> What is the best way to convert the PDF image file to text using VB?
> 
> Thanks.

AFAIK, your only choice will be to save it as an image file and use OCR 
on the file.
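Here is a minimal sketch of that route (save each page as an image, then OCR it). It is written in Python rather than VB. It assumes two free command-line tools that the thread does not mention:

- Poppler's `pdftoppm`, which renders every page to `<prefix>-N.png`.
- The Tesseract OCR engine's `tesseract` command, which writes the recognized text to `<base>.txt`.

Both must be installed and on the PATH. The function names are illustrative only. Because the real work is done by external programs, the same two commands could in principle also be launched from a VB program.

```python
import glob
import os
import subprocess
import tempfile


def rasterize_command(pdf_path, out_prefix, dpi=300):
    """Command for pdftoppm to render every page to out_prefix-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_command(image_path, out_base):
    """Command for tesseract to OCR image_path and write out_base.txt."""
    return ["tesseract", image_path, out_base]


def ocr_pdf(pdf_path, dpi=300):
    """Render each page of pdf_path to PNG, OCR every image, return all text."""
    with tempfile.TemporaryDirectory() as work_dir:
        prefix = os.path.join(work_dir, "page")
        subprocess.run(rasterize_command(pdf_path, prefix, dpi), check=True)
        pages = []
        # pdftoppm pads page numbers to equal width, so a plain sort keeps page order.
        for png in sorted(glob.glob(glob.escape(prefix) + "-*.png")):
            base = os.path.splitext(png)[0]
            subprocess.run(ocr_command(png, base), check=True)
            with open(base + ".txt", encoding="utf-8") as f:
                pages.append(f.read())
        return "\n".join(pages)
```

Usage would look like `text = ocr_pdf("scan.pdf")`. 300 dpi is a common resolution for OCR input. Lower settings render faster but tend to make small print harder to recognize.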

-- 
Remove the ns_ from my address if replying by e-mail (but keep posts in the 
newsgroups if possible).
ns_dkerber (39)
3/27/2007 3:36:55 PM
If a PDF file is in ASCII format (as opposed to binary), it should be 
possible to extract the text.

PostScript is a well-defined language, and parsing the page is a possibility.

I don't think the task is simple, but it should be possible.
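To make the parsing idea concrete: the visible text on a PDF page is drawn by operators inside a content stream, between `BT` and `ET`. The main ones are:

- `Tj`, which shows one string.
- `TJ`, which shows an array of strings mixed with spacing adjustments.

The Python sketch below pulls those strings out of one content stream that is already uncompressed, which is roughly the "ASCII" case described above. It assumes a simple single-byte font encoding and decodes bytes as Latin-1. Fonts with custom or two-byte encodings, inline images, and compressed streams are outside its scope, and `shown_text` is a hypothetical helper name.

```python
import re

# Escapes allowed in PDF literal strings, keyed by the byte after the backslash.
_SIMPLE_ESCAPES = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
                   ord("b"): b"\b", ord("f"): b"\f", ord("("): b"(",
                   ord(")"): b")", ord("\\"): b"\\"}
_WHITESPACE = b" \t\r\n\f\x00"
_DELIMITERS = b"()<>[]{}/%"
_NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")
_ARRAY_START = object()  # marks a '[' on the operand stack


def _read_literal(data, i):
    """Parse the literal string whose '(' is at data[i]; return (bytes, next index)."""
    out, depth, i = bytearray(), 1, i + 1
    while i < len(data):
        c = data[i]
        if c == 0x5C:  # backslash escape
            i += 1
            if i >= len(data):
                break
            e = data[i]
            if e in _SIMPLE_ESCAPES:
                out += _SIMPLE_ESCAPES[e]
                i += 1
            elif 0x30 <= e <= 0x37:  # \ddd: one to three octal digits
                j = i
                while j < len(data) and j < i + 3 and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i:j], 8) & 0xFF)
                i = j
            elif e in (0x0D, 0x0A):  # backslash + end-of-line: line continuation
                i += 1
                if e == 0x0D and i < len(data) and data[i] == 0x0A:
                    i += 1
            else:  # undefined escape: the backslash is ignored
                out.append(e)
                i += 1
            continue
        if c == 0x28:  # balanced '(' nests
            depth += 1
        elif c == 0x29:  # ')' closes one level
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        out.append(c)
        i += 1
    return bytes(out), i


def shown_text(content):
    """Return, in order, the strings shown by Tj, TJ, ' and \" in one content stream."""
    stack, shown, i, n = [], [], 0, len(content)
    while i < n:
        c = content[i]
        if c in _WHITESPACE:
            i += 1
        elif c == 0x25:  # '%' comment: skip to end of line
            while i < n and content[i] not in b"\r\n":
                i += 1
        elif c == 0x28:  # '(' literal string
            s, i = _read_literal(content, i)
            stack.append(s.decode("latin-1"))
        elif content[i:i + 2] in (b"<<", b">>"):  # dictionary delimiters
            stack.append(content[i:i + 2])
            i += 2
        elif c == 0x3C:  # '<' hex string
            j = content.index(b">", i)
            digits = re.sub(rb"\s", b"", content[i + 1:j])
            if len(digits) % 2:
                digits += b"0"  # an odd final digit is padded with 0
            stack.append(bytes.fromhex(digits.decode("ascii")).decode("latin-1"))
            i = j + 1
        elif c == 0x5B:  # '[' starts an array
            stack.append(_ARRAY_START)
            i += 1
        elif c == 0x5D:  # ']' collapses everything since the matching '['
            k = len(stack) - 1 - stack[::-1].index(_ARRAY_START)
            items = stack[k + 1:]
            del stack[k:]
            stack.append(items)
            i += 1
        else:  # name, number, keyword or operator
            j = i + 1 if c in _DELIMITERS else i
            while j < n and content[j] not in _WHITESPACE and content[j] not in _DELIMITERS:
                j += 1
            token, i = content[i:j], j
            if _NUMBER.fullmatch(token):
                stack.append(float(token.decode("ascii")))
            elif token.startswith(b"/") or token in (b"true", b"false", b"null"):
                stack.append(token)
            else:  # an operator consumes its operands
                if token in (b"Tj", b"'", b'"') and stack and isinstance(stack[-1], str):
                    shown.append(stack[-1])
                elif token == b"TJ" and stack and isinstance(stack[-1], list):
                    shown.append("".join(x for x in stack[-1] if isinstance(x, str)))
                stack.clear()
    return shown
```

The numbers inside a `TJ` array are spacing adjustments. This sketch discards them, so words separated only by a large adjustment can run together. Full extractors also have to map each font's encoding back to Unicode (for example through a font's `/ToUnicode` map), which this sketch does not attempt.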


CharlesW


"Greg" <bookreader_1955@yahoo.com> wrote in message 
news:jldi03dlmqcb5mr1fifrg5ok89mso45rok@4ax.com...
>I programmatically save a PDF document as text using automation. I then
> process the document in VB.
>
> However, my client now has one of his clients sending an image of a
> document in a PDF file.
>
> What is the best way to convert the PDF image file to text using VB?
>
> Thanks.
>
> Greg 


charles8756 (164)
3/27/2007 3:58:25 PM
"charles@home.com" <Charles@home.com> wrote in message
news:46093f03$0$8747$ed2619ec@ptn-nntp-reader02.plus.net 
> If a PDF file is in ASCII format (As opposed to Binary) it should be
> possible to extract the text.
> 
> Postscript is a well defined language and parsing the page is a
> possibility. 

PostScript <> PDF

-- 
Reply to the group so all can participate
VB.Net: "Fool me once..."

3/27/2007 4:14:11 PM
>> If a PDF file is in ASCII format (As opposed to Binary) it should be
>> possible to extract the text.
>>
>> Postscript is a well defined language and parsing the page is a
>> possibility.
>
> Postscript <> PDF

True, but PDF uses PostScript internally... see the "Technology" section 
here:

http://en.wikipedia.org/wiki/Portable_Document_Format

And the text can be stored internally in a PDF... as text... or, of course, 
as an image of text.
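A PDF may hold real text, an image of text, or both, so it can help to check which case applies before choosing between text extraction and OCR. The heuristic Python sketch below does this; `classify_pdf` is a hypothetical helper, not a library function. It walks the objects of an unencrypted PDF with regular expressions and inflates `/FlateDecode` streams with `zlib`. It then reports:

- `"text"` if any stream shows strings with `Tj`/`TJ` inside a `BT` text object.
- `"image-only"` if it found image XObjects (`/Subtype /Image`) but no text.
- `"unknown"` if it found neither.

It is not a real PDF parser, so encrypted files, other stream filters, or unusual file layouts may produce `"unknown"` or a wrong answer.

```python
import re
import zlib

# One indirect object "N G obj ... endobj" (heuristic: assumes "endobj" never occurs in stream data).
_OBJECT = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.S)
# End of a stream dictionary, then the "stream" keyword and its end-of-line.
_STREAM_START = re.compile(rb">>\s*stream\r?\n")
_IMAGE_DICT = re.compile(rb"/Subtype\s*/Image\b")
# A text object that shows at least one string.
_TEXT_OPS = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.S)


def classify_pdf(pdf_bytes):
    """Return "text", "image-only" or "unknown" for the raw bytes of a PDF file."""
    has_text = has_image = False
    for obj in _OBJECT.finditer(pdf_bytes):
        body = obj.group(1)
        start = _STREAM_START.search(body)
        if start is None:
            continue  # not a stream object
        header = body[:start.start()]  # the stream's dictionary
        data = body[start.end():]
        end = data.rfind(b"endstream")
        if end != -1:
            data = data[:end]
        if _IMAGE_DICT.search(header):
            has_image = True
            continue
        if b"/FlateDecode" in header:
            try:
                # decompressobj ignores the EOL bytes left before "endstream".
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue
        if _TEXT_OPS.search(data):
            has_text = True
    if has_text:
        return "text"
    return "image-only" if has_image else "unknown"
```

A scanned file that already carries an invisible OCR text layer is reported as `"text"`. That is the useful answer here, since ordinary text extraction will work on it.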

Rick 


3/27/2007 4:37:25 PM
Rick

If the text is in the form of an image, then OCR might be considered.
Although I have never looked at using that technology in VB, it is
nowadays considered well established and widely used.


CharlesW




"Rick Rothstein (MVP - VB)" <rickNOSPAMnews@NOSPAMcomcast.net> wrote in 
message news:Ogwb04IcHHA.2088@TK2MSFTNGP05.phx.gbl...
>>> If a PDF file is in ASCII format (As opposed to Binary) it should be
>>> possible to extract the text.
>>>
>>> Postscript is a well defined language and parsing the page is a
>>> possibility.
>>
>> Postscript <> PDF
>
> True, but PDF uses Postscript internally...  see the "Technology" section 
> here
>
> http://en.wikipedia.org/wiki/Portable_Document_Format
>
> And the text can be stored internally in a PDF... as text... or, of 
> course, as an image of text.
>
> Rick
> 


charles8756 (164)
3/29/2007 10:06:07 AM