Colored Text extraction from PDF

Hi all,
Is it possible to extract colored text from a PDF?

For example:
Suppose a PDF contains text in three colors: RED, GREEN and BLACK.
Is it possible to extract only the text that is red or green?

- Regards
Azodious
6/3/2009 2:49:18 PM
comp.lang.java.programmer


Azodious wrote:
> Hi all,
> Is it possible to extract colored text from a PDF?
> 
> For example:
> Suppose a PDF contains text in three colors: RED, GREEN and BLACK.
> Is it possible to extract only the text that is red or green?
> 

Yes.

It is possible, but I know of no method I'd actually want to use.

Just my £0.02 worth.

-- 
RGB
6/3/2009 4:10:00 PM
On Wed, 3 Jun 2009 07:49:18 -0700 (PDT), Azodious
<nehilparashar@gmail.com> wrote, quoted or indirectly quoted someone
who said :

>Hi all,
>Is it possible to extract colored text from a PDF?
>
>For example:
>Suppose a PDF contains text in three colors: RED, GREEN and BLACK.
>Is it possible to extract only the text that is red or green?

There are all kinds of tools for manipulating PDF files.
Unfortunately, I don't have personal experience with them.
See http://mindprod.com/jgloss/pdf.html

PostScript is similar to using Java's drawString and brothers in
paintComponent.  In PostScript, you use setrgbcolor or sethsbcolor to
load up your paintbrush with a colour. The problem is similar to
trying to extract text painted in a particular colour from Java source
that uses drawString.  You would most likely do it by substituting the
paint methods and capturing their parameters when you run the code. Trying
to do it statically would be extremely difficult.
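
A rough sketch of the "substitute the paint methods" idea, in plain
Java.  Everything named here is hypothetical: Painter stands in for
whatever drawing interface the page code calls, RecordingPainter is
the substitute that remembers the colour in force at each drawString,
and paintPage is a made-up page.  It is not a PDF parser and not the
API of any PDF library; it only shows the capture-at-run-time
technique under those assumptions.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.List;

public class ColorCapture {

    /** Hypothetical stand-in for the drawing calls a page would make. */
    interface Painter {
        void setColor(Color c);
        void drawString(String text, int x, int y);
    }

    /** Substitute painter: draws nothing, records text with its colour. */
    static class RecordingPainter implements Painter {
        private Color current = Color.BLACK;   // assumed default colour
        private final List<String> texts = new ArrayList<>();
        private final List<Color> colors = new ArrayList<>();

        @Override
        public void setColor(Color c) {
            current = c;
        }

        @Override
        public void drawString(String text, int x, int y) {
            // Capture the parameters instead of painting them.
            texts.add(text);
            colors.add(current);
        }

        /** Returns, in drawing order, the strings painted in the given colour. */
        List<String> textIn(Color wanted) {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < texts.size(); i++) {
                if (colors.get(i).equals(wanted)) {
                    out.add(texts.get(i));
                }
            }
            return out;
        }
    }

    /** Hypothetical page content, written the way paint code usually is. */
    static void paintPage(Painter g) {
        g.setColor(Color.RED);
        g.drawString("Warning:", 10, 20);
        g.setColor(Color.BLACK);
        g.drawString("engine", 80, 20);
        g.setColor(Color.GREEN);
        g.drawString("status ok", 10, 40);
        g.setColor(Color.RED);
        g.drawString("overheated", 140, 20);
    }

    public static void main(String[] args) {
        RecordingPainter recorder = new RecordingPainter();
        paintPage(recorder);   // run the page code against the substitute
        System.out.println("red:   " + recorder.textIn(Color.RED));
        System.out.println("green: " + recorder.textIn(Color.GREEN));
    }
}
```

The colour is state set earlier and not a parameter of drawString, so
the recorder has to track it between calls.  That is the same reason
a static scan for coloured text is hard.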

-- 
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80
6/3/2009 6:05:41 PM