copying text from pdf

Morning!

Why does text from different .pdf files get copied differently (if selected with 'Touch up text' and copied to a text editor)? I have
observed the following behaviors:
- single characters are marked in the original; copied, there are no spaces in between;
- a text gets underlined in the original; copied, there are no spaces;
- the whole text gets marked; copied, it's usually fine;
- only a single line can be copied at a time; usually without extra problems...

What does this depend on? When I convert a Word document to .pdf, can I choose which of the above behaviors the
resulting file will have?

Many thanks in advance!

bj


11/28/2005 5:10:58 PM
comp.text.pdf

In article <dmfd52$3f6$1@achot.icm.edu.pl>, SpillOut99@yahoo.com says...
> Morning!
> 
> why do texts from different .pdf files get copied differently (if selected with 'Touch up text' and copied to a text editor)? I have
> observed the following behaviors:

Because the 'text' is laid out differently in each case. When you 
consider the plethora of ways of drawing text, this is not surprising.
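To illustrate, here is a minimal Python sketch (a toy, not how any particular viewer works) of three ways a PDF content stream might draw the same two words, and what a naive extractor that simply concatenates the string operands of the `Tj`/`TJ` operators would copy. The stream fragments and the `naive_extract` helper are hypothetical, and the tokenizer handles only a tiny subset of PDF syntax (no nested parentheses, no hex strings).

```python
import re

# Toy tokenizer for a tiny subset of PDF content-stream syntax:
# literal strings (no nested parentheses), array brackets, numbers, operators.
# Anything else (e.g. the "/" of a font name) is simply skipped.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|\[|\]|-?(?:\d+\.?\d*|\.\d+)|[A-Za-z*]+")


def tokens(stream):
    return TOKEN.findall(stream)


def naive_extract(stream):
    """Concatenate the string operands of Tj/TJ, ignoring all positioning."""
    return "".join(tok[1:-1] for tok in tokens(stream) if tok.startswith("("))


# Three hypothetical ways of drawing "Hello world" at the same spot.
ONE_STRING = "BT /F1 12 Tf 72 700 Td (Hello world) Tj ET"             # space is a character
KERNED = "BT /F1 12 Tf 72 700 Td [(Hello) -250 (world)] TJ ET"        # gap is a TJ adjustment
PER_WORD = "BT /F1 12 Tf 72 700 Td (Hello) Tj 33 0 Td (world) Tj ET"  # gap is a Td move

for name, stream in [("ONE_STRING", ONE_STRING), ("KERNED", KERNED), ("PER_WORD", PER_WORD)]:
    print(name, repr(naive_extract(stream)))
# ONE_STRING 'Hello world'
# KERNED 'Helloworld'
# PER_WORD 'Helloworld'
```

All three look identical on the page, but in the second and third the gap between the words comes from positioning rather than from a space character, so a copy that ignores positioning runs the words together.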

> - single characters are marked in the original; copied, there are no spaces in between;

If you copy a single character, why would you expect spaces? Between 
what are you expecting to see spaces?

> - a text gets underlined in the original; copied, there are no spaces;

What is 'a text'? A glyph, a word, more? Why do you expect to see 
spaces?
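Part of the answer is that in many PDFs the gap between words is not a space character at all but a positioning adjustment, so a copy tool has to guess where spaces belong. A minimal sketch of one such guess, assuming the gap is expressed as a `TJ` array adjustment (in thousandths of a text-space unit, where negative values move the next glyph to the right): insert a space when the adjustment is at least as negative as some threshold. The `tj_extract` name and the -200 threshold are arbitrary assumptions; real extractors use their own heuristics (often based on the font's space width).

```python
import re

# Toy tokenizer for a tiny subset of PDF content-stream syntax:
# literal strings (no nested parentheses), array brackets, numbers, operators.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|\[|\]|-?(?:\d+\.?\d*|\.\d+)|[A-Za-z*]+")
NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")


def tj_extract(stream, space_threshold=-200):
    """Extract text, inserting a space wherever a TJ adjustment shifts the next
    glyph right by at least |space_threshold| thousandths of a text-space unit.
    (Hypothetical heuristic; the threshold value is an arbitrary assumption.)"""
    out = []
    in_array = False
    for tok in TOKEN.findall(stream):
        if tok == "[":
            in_array = True
        elif tok == "]":
            in_array = False
        elif tok.startswith("("):
            out.append(tok[1:-1])
        elif in_array and NUMBER.fullmatch(tok) and float(tok) <= space_threshold:
            # Negative TJ numbers shift the next glyph to the right; a large
            # enough shift is treated as a word gap.
            out.append(" ")
    return "".join(out)


print(repr(tj_extract("BT [(Hello) -250 (world)] TJ ET")))                        # 'Hello world'
print(repr(tj_extract("BT [(W) 80 (ave)] TJ ET")))                                # 'Wave' (tight kerning)
print(repr(tj_extract("BT [(Hello) -250 (world)] TJ ET", space_threshold=-300)))  # 'Helloworld'
```

With a stricter threshold the same stream copies as `Helloworld`, so what ends up in the text editor depends as much on the copying tool's guess as on the file itself.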

> - whole text gets marked; copied, it's usually fine;
> - only a single line can be copied at a time; usually without extra problems...

These two statements seem contradictory: if you can copy the whole of the 
text and it's fine, why not do so?

 
> What does this depend on? When I convert a Word document to .pdf, can I choose which of the above behaviors I will prefer for the
> resulting file?

Not really. The Word file is 'printed' to PDF:

Word decides how to describe the text to the GDI (Graphics Device 
Interface). That depends on how the text was created in Word, which you 
do have control over, but figuring out how Word does this conversion 
is not trivial. Also, some elements (text boxes, WordArt, headers, 
footers, etc.) must be handled in specific ways, which you can't change.

The GDI commands are fed to the PostScript driver. The PostScript driver 
then decides how to represent them in the PostScript language. You 
have little or no control here.

Finally, the PostScript to PDF conversion utility will decide how best 
to represent the PostScript program in PDF format. I suspect that you 
have no control over this.

Some PDF creators do not convert to PostScript but convert the 
GDI commands directly to PDF, which eliminates some steps (usually at 
the cost of quality and flexibility). However, you still have two 
applications making decisions, so much the same arguments apply.


			Ken
11/28/2005 5:12:59 PM