How to get text from a PDF?

Hi all,

I have a Linux-based web server. I am working on a project for
which I need to extract text from PDF files, and I also need to know
which PDF page each piece of text belongs to.

Is there a utility or tool I can install on Linux and call from the
command line in PHP through exec(), system(), etc. for this purpose?

Please reply urgently.

Thanks in advance.
Shahid
12/22/2008 3:06:58 PM
comp.lang.perl.misc


What you want to do may not be possible. comp.text.pdf would be a better
place to ask.
Scott
12/22/2008 3:31:26 PM
There is a module on CPAN called PDF::OCR::Thorough which attempts
to extract text from PDF documents. I've never used it, and it looks like
a fair amount of work to set up. If the PDF file has a known, simple
structure, there may be easier ways.
smallpond
12/22/2008 4:01:27 PM
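For PDFs that contain a real text layer (scanned page images have no text to extract and would need OCR instead), one widely packaged Linux option is the `pdftotext` command-line tool from poppler-utils or xpdf. By default it writes a form feed character after each page, which is enough to recover page numbers. The sketch below is in Python purely for illustration; it assumes `pdftotext` is installed and on the PATH, and `split_pages` and `pdf_text_by_page` are hypothetical helper names, not part of any library.

```python
import subprocess
from typing import Dict


def split_pages(text: str) -> Dict[int, str]:
    """Map 1-based page numbers to the text of each page.

    Assumes pages are separated by form feed characters (U+000C),
    as pdftotext writes them by default.
    """
    pages = text.split("\f")
    # A form feed after the last page leaves an empty final element;
    # drop it so it is not counted as an extra page.
    if pages and pages[-1] == "":
        pages.pop()
    return {number: page for number, page in enumerate(pages, start=1)}


def pdf_text_by_page(path: str) -> Dict[int, str]:
    """Run pdftotext on `path` and return its text keyed by page number."""
    # "-" as the output file makes pdftotext write to stdout;
    # -layout asks it to keep the physical layout of each page.
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_pages(result.stdout)
```

The same command line could be run from PHP with shell_exec() or exec() and the output split on the form feed character in the same way; adding `-f N -l N` limits extraction to page N alone. How well this works depends on how the PDF was produced, so the text order within a page may not always match the visual reading order.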