Copy text from PDF, no system font available

Hi,

I have a PDF document that was originally created in QuarkXPress Passport 4.1 
(K): LaserWriter 8.8.7
and then converted to PDF with Acrobat Distiller 5 for Macintosh.

I don't have contact details for the original author, nor the Quark file, 
just the PDF.

I just want to copy certain text from the PDF to Word
so that I can add it to a document that I am writing.

In Document Properties/Fonts, all the fonts used in the document are listed 
as Embedded Subset, Type: Type 1, Encoding: Custom.

The text I want to copy is set in the font MgHelveticaLight-Normal.

When I use the TouchUp Text tool to edit the PDF in Acrobat Pro 6, the 
Permissions field says "No system font available" and the Embed and 
Subset options are both grayed out.

I have installed fonts similar to the one mentioned above on my system, such as:

MgHelvetica,Bold
MgHelveticaExta
Helvetica-Light

But whenever I try to change the font from the embedded subset to any of the 
ones in the list above, which are now installed on my system, I get the 
following error message:

"The change to a different font was not done because the chosen font and the 
font encodings in the document differ and could not be resolved."

So besides not being able to edit the PDF, when I copy text to Word the 
pasted text looks nothing like the original. It looks something like:


A?OAAEO?O>EUEUO? ?UO�UUE�IO? UE?EUO-?U?IO?, UUO UEIA>O UUO O?O>O ?E$U?AU�E 
�??UO I$OUUO UE~ ?i<O�~. ??U? i� A>O�E U�?U??UOO� UO UEI�OUEI?UAUO $UAO 
?U�U>OO?

This is not readable, although the same text displays fine in the PDF itself.

Any advice would be more than welcome, since I urgently need to copy that 
text and edit it.

Regards,
Thanos. 


5/11/2006 4:43:42 PM
comp.text.pdf

The 'encoding' mentioned is the mapping from the byte values in a text
string to the characters of the font that get drawn. More precisely, each
value is an index into a large array (the font) of special PostScript
commands which draw *something*. ASCII is just a convention, and the
Distiller (or any other program producing a PDF) may choose to disregard it
and start all over at 'first character used = 1', counting onwards from
there.
The text in the PDF still looks fine because encoded character #1 points to
a valid character (say, the image for an 'F'), #2 points to an 'o', and so
on. Copy and paste, however, only sees the code values, which is why the
pasted text comes out as garbage.
Even if you extract the precise text string (or rather, a list of binary
values), you would have to compare each character's drawing commands in the
original font to the ones used in the PDF to re-match them with ASCII
characters.
IOW, well, maybe not 'impossible', but you might as well think so.
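
To make the point concrete, here is a toy model in Python. It is only a sketch of the idea, not how Distiller or any real PDF library works: the function names are made up for this illustration, and the 'font' is just a dictionary from code numbers to characters. It numbers characters in order of first use, 'draws' the stored codes back through that table, and shows what a naive copy gets when the same codes are read as ordinary Latin-1 text.

```python
# Toy model of a subset font with a custom encoding (illustration only;
# all names here are hypothetical, not part of any PDF tool or library).

def build_subset_encoding(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use."""
    encoding = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = len(encoding) + 1
    return encoding

def encode(text, encoding):
    """Return the byte values that would be stored in the text string."""
    return bytes(encoding[ch] for ch in text)

def render(codes, encoding):
    """What the viewer draws: each code selects a glyph in the embedded font."""
    glyph_for_code = {code: ch for ch, code in encoding.items()}
    return "".join(glyph_for_code[c] for c in codes)

def naive_copy(codes):
    """What copy and paste yields if the codes are read as Latin-1 text."""
    return codes.decode("latin-1")

original = "Font encoding"
enc = build_subset_encoding(original)   # 'F' gets code 1, 'o' gets code 2, ...
stored = encode(original, enc)

print(render(stored, enc))        # drawn through the font's own table
print(repr(naive_copy(stored)))   # the same codes read as plain text
```

Recovering readable text amounts to rebuilding the `glyph_for_code` table, which in a real PDF means identifying every glyph by its shape or name.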

[Jongware]


jongware
5/11/2006 7:39:12 PM

Hi,

First of all, thanks for the reply.

From what you explained, I guess it is not possible to simply buy this font 
from a web site:

MgHelveticaLight-Normal
Type: Type 1
Encoding: Custom

Ideally I would buy and install this font so that I could copy and paste the 
text into Word.

Regarding the custom encoding applied by the Distiller, which ignores the 
ASCII convention: how is it possible to reproduce the same encoding in a font 
that I can then install on my system?

Does it mean that what the Distiller did with the custom encoding can no 
longer be reproduced, in a reverse-engineering sense?

Or is it possible somehow to extract the font embedded in the PDF so that I 
can install it as a system font too?

Regards,
Thanos. 


Thanasis
5/11/2006 10:12:42 PM