


Q.: Only one font in use, but 100 fonts embedded ... ?

The Subject line gives it in a nutshell -- my question(s) is (are) why? (and how to fix?)

We've got a PDF containing the fully laid out, formatted pages of a book that uses one font (T.N.R.) in normal, bold, and italic, in several point-sizes. This PDF's Properties sheet shows about a hundred randomly-alphanumerically-named Type 1 embedded fonts. Why should that be? And is there anything we can do to reduce that number?

The PDF is produced by Acrobat Distiller (v. 4.05) from a PS file itself obtained by printing a MS Word .DOC file to a PostScript printer "on FILE", with all three -- the .DOC file, the .PS file, and the Distiller run -- having been requested to Embed any font used.
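A quick way to see what the Properties sheet is counting is to pull the /BaseFont names out of the PDF directly. This is a rough diagnostic sketch, not a PDF parser. It assumes the font dictionaries sit in uncompressed objects, which holds for files from Distiller 4 because it predates the compressed object streams of PDF 1.5. The font names used in the test are made up.

```python
import re
import sys
from collections import Counter

# A /BaseFont entry in an uncompressed PDF object, e.g.
#   /BaseFont /ABCDEF+TimesNewRoman,Bold
# PDF names end at whitespace or a delimiter character.
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/\[\]<>(){}%]+)")

# Subset fonts carry a six-capital-letter tag plus '+' in front of the name.
SUBSET_TAG_RE = re.compile(r"^[A-Z]{6}\+")


def list_base_fonts(pdf_bytes: bytes) -> Counter:
    """Count how often each /BaseFont name occurs in the raw PDF bytes."""
    return Counter(
        m.group(1).decode("latin-1") for m in BASEFONT_RE.finditer(pdf_bytes)
    )


def strip_subset_tag(name: str) -> str:
    """Remove a leading 'ABCDEF+' subset tag, if there is one."""
    return SUBSET_TAG_RE.sub("", name)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts = list_base_fonts(f.read())
    print(f"{len(counts)} distinct /BaseFont names")
    for name, n in sorted(counts.items()):
        print(f"{n:4d}  {name}  (without subset tag: {strip_subset_tag(name)})")
```

If most of the hundred names turn out to be driver-generated names rather than variants of Times New Roman, that points at the PostScript generation step rather than at the document itself.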

All insights gratefully accepted. Cheers, --- tlvp
-- 
Before replying, throw out the trash (the "POUBELLE"), please
tlvp
2/12/2011 7:11:57 PM
comp.text.pdf

In article <op.vqs0x7bsitl47o@acer250.gateway.2wire.net>, 
tPlOvUpBErLeLsEs@hotmail.com says...

> We've got a PDF containing the fully laid out, formatted pages of a
> book that uses one font (T.N.R.) in normal, bold, and italic,

That's three fonts then, one for each style in the family.


> in several point-sizes. This PDF's Properties sheet shows about a
> hundred randomly-alphanumerically-named Type 1 embedded fonts. Why
> should that be?

It's really hard to say anything much without seeing the PostScript file
at the very least, but most likely it's down to the way the
PostScript program has been generated.

It may be that each time the font is used the program assigns it a new
name. Possibly it loads the fonts in local VM and save/restores around
each page, resulting in the font on each page being re-emitted because
Distiller thinks it's a new font.
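The effect described here can be illustrated with a toy counting model. This sketches only the arithmetic, not any real driver. It assumes a driver either downloads each face once for the whole document, or re-downloads every face used on a page, under a freshly generated name, inside that page's save/restore. The generated names and the 40-page book are hypothetical.

```python
from itertools import count


def emitted_font_names(pages, per_page_download):
    """Return the set of font names a converter would see.

    pages: list of lists; each inner list holds the faces used on one page.
    per_page_download: True if every face is re-sent on every page under a
    new name; False if each face is sent once for the whole document.
    """
    serial = count(1)   # source of fresh, hypothetical font names
    seen = {}           # face -> name, used only in whole-document mode
    names = set()
    for faces_on_page in pages:
        for face in sorted(set(faces_on_page)):
            if per_page_download:
                # Re-emitted inside this page's save/restore: new name each time.
                names.add(f"F{next(serial)}")
            elif face not in seen:
                # Downloaded once and reused for the rest of the document.
                seen[face] = f"F{next(serial)}"
                names.add(seen[face])
    return names


faces = ["TimesNewRoman", "TimesNewRoman-Bold", "TimesNewRoman-Italic"]
book = [faces] * 40  # hypothetical: 40 pages, each using all three faces

print(len(emitted_font_names(book, per_page_download=False)))  # 3
print(len(emitted_font_names(book, per_page_download=True)))   # 120
```

Three faces spread over a few dozen pages are already enough to reach "about a hundred" names, which is why per-page re-emission is a plausible suspect.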


> And is there anything we can do to reduce that number?
> 
> The PDF is produced by Acrobat Distiller (v. 4.05)

That's very old; the current version of Acrobat is 10. I think 4.05 is
probably around 8 or 9 years old.


> from a PS file itself obtained by printing a MS Word .DOC file to a
> PostScript printer "on FILE", with all three

On which OS, using which office software, and which PostScript driver?


			Ken
ken
2/12/2011 9:17:23 PM
At Sat, 12 Feb 2011 14:11:57 -0500 tlvp <tPlOvUpBErLeLsEs@hotmail.com> wrote:

> 
> The Subject line gives it in a nutshell -- my question(s) is (are)
> why? (and how to fix?)
> 
> We've got a PDF containing the fully laid out, formatted pages of a
> book that uses one font (T.N.R.) in normal, bold, and italic, in
> several point-sizes. This PDF's Properties sheet shows about a hundred
> randomly-alphanumerically-named Type 1 embedded fonts. Why should that
> be? And is there anything we can do to reduce that number?

First of all, 'one font (T.N.R.) in normal, bold, and italic, in
several point-sizes' is not 'one font'.  Each face type (roman
[normal], bold, and italic) is a *separate* font.  Each *point-size*
is a separate font (yes, one can use scalable fonts, but things look far
better with fonts cut for the right size -- a 6pt font is non-trivially
*different* from a 12pt font of the same face).  The book probably also
has one (or more) 'symbol' fonts (for all of those pesky symbols, like
copyright, trademark, list bullets, etc.).  And if there is math, there
will be font(s) for the math symbols. One can *very quickly* wind up
with 'a hundred' (or more!) fonts.


> 
> The PDF is produced by Acrobat Distiller (v. 4.05) from a PS file
> itself obtained by printing a MS Word .DOC file to a PostScript printer
> "on FILE", with all three -- the .DOC file, the .PS file, and the
> Distiller run -- having been requested to Embed any font used.

Some questions / thoughts:

I understand that MS-Word will maintain a 'revision history'.  Did the
author do the 'magic' to flush this history before 'printing' the
document? It is possible that there is deleted text using additional
fonts and MS-Word is 'embedding' these otherwise unused fonts.

Why did you go through the two-step process (MS-Word => PostScript, then
PostScript => PDF)?  I understand that Acrobat Distiller has the option
of 'Print to PDF'.

Additional thought:

OpenOffice has the ability to 'Export to PDF' and can open MS-Word
documents.  It should be possible to use OO to do a direct conversion
from MS-Word to PDF; you might get better results that way.
OpenOffice is a *free* download and runs on all major operating systems.

> 
> All insights gratefully accepted. Cheers, --- tlvp

-- 
Robert Heller             -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


                                                                                                                         
Robert
2/12/2011 9:25:50 PM
tlvp <tPlOvUpBErLeLsEs@hotmail.com> wrote in
news:op.vqs0x7bsitl47o@acer250.gateway.2wire.net: 
> We've got a PDF containing the fully laid out, formatted pages of a
> book that uses one font (T.N.R.) in normal, bold, and italic, in
> several point-sizes. This PDF's Properties sheet shows about a hundred
> randomly-alphanumerically-named Type 1 embedded fonts. Why should that
> be? 

One reason for that could be that the printer driver has been instructed to 
download the fonts to the "printer" (which is the Distiller in this case) 
as a raster font. The driver will create a font for each size and other 
variations. Back in the days of Distiller 4 there may have been reasons for 
using that setting for small text, but in your case the adjustable threshold
for raster font generation may be set improperly.
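The raster-font explanation comes down to a size threshold measured in device pixels. One point is 1/72 inch, so pixels per em = point size × dpi / 72. The sketch below assumes a driver that sends any (face, size) pair rendering below a pixel threshold as its own bitmap font and everything else as one outline font per face. The threshold, resolution and point sizes are hypothetical, and real drivers may differ in detail.

```python
POINTS_PER_INCH = 72  # PostScript/PDF points


def pixels_per_em(point_size: float, dpi: int) -> float:
    """Height of the em square in device pixels at the given resolution."""
    return point_size * dpi / POINTS_PER_INCH


def count_downloaded_fonts(uses, dpi, bitmap_threshold_px):
    """Count fonts a driver would send for an iterable of (face, point_size).

    Hypothetical rule: below the threshold, each (face, size) becomes its own
    bitmap font; at or above it, a face is sent once as a scalable outline.
    """
    fonts = set()
    for face, size in uses:
        if pixels_per_em(size, dpi) < bitmap_threshold_px:
            fonts.add(("bitmap", face, size))
        else:
            fonts.add(("outline", face))
    return len(fonts)


faces = ["Times New Roman", "Times New Roman Bold", "Times New Roman Italic"]
uses = [(face, size) for face in faces for size in (10, 12, 24)]

print(count_downloaded_fonts(uses, dpi=300, bitmap_threshold_px=100))  # 9
print(count_downloaded_fonts(uses, dpi=300, bitmap_threshold_px=0))    # 3
```

Each extra point size below the threshold adds one bitmap font per face, while the outline count stays at one per face.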

> And is there anything we can do to reduce that number? 

....Recreate the file using correct driver settings...

(Possibly some PDF editor could fix it. OpenOffice Draw can import PDF 
files and could be used to recreate the PDF.)
Matti
2/12/2011 10:12:35 PM
On 12/02/11 19:11, tlvp wrote:
> The Subject line gives it in a nutshell -- my question(s) is (are) why?
> (and how to fix?)
>
> We've got a PDF containing the fully laid out, formatted pages of a book
> that uses one font (T.N.R.) in normal, bold, and italic, in several
> point-sizes. This PDF's Properties sheet shows about a hundred
> randomly-alphanumerically-named Type 1 embedded fonts. Why should that
> be? And is there anything we can do to reduce that number?
>
> The PDF is produced by Acrobat Distiller (v. 4.05) from a PS file itself
> obtained by printing a MS Word .DOC file to a PostScript printer "on
> FILE", with all three -- the .DOC file, the .PS file, and the Distiller
> run -- having been requested to Embed any font used.

Hardly surprising with a workflow like that. You don't give details of 
whose TNR fonts you used but at a wild random guess, they weren't PS 
Type 1 fonts, so the PS driver failed to recognise them as coming from 
the same source when they were embedded, resulting in each combination 
of size, weight, width, and style becoming a separate font.
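Whether the Times New Roman in use is TrueType or PostScript Type 1 can be checked from the first few bytes of the font file. This is a small sketch relying on the standard signatures: sfnt version 0x00010000 or 'true' for TrueType, 'OTTO' for CFF-based OpenType, 'ttcf' for collections, 0x80 0x01 for a binary .pfb, and a '%!PS-AdobeFont' or '%!FontType1' comment for an ASCII .pfa.

```python
def sniff_font_format(header: bytes) -> str:
    """Guess a font file's format from its first few bytes."""
    if header[:4] in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType"
    if header[:4] == b"OTTO":
        return "OpenType (CFF outlines)"
    if header[:4] == b"ttcf":
        return "TrueType collection"
    if header[:2] == b"\x80\x01":
        return "Type 1 (binary .pfb)"
    if header.startswith((b"%!PS-AdobeFont", b"%!FontType1")):
        return "Type 1 (ASCII .pfa)"
    return "unknown"


def sniff_font_file(path: str) -> str:
    """Read just enough of a font file to identify its format."""
    with open(path, "rb") as f:
        return sniff_font_format(f.read(32))
```

On a default Windows XP install, Times New Roman is usually C:\WINDOWS\Fonts\times.ttf (an assumption about the install), so sniff_font_file on that path would be expected to report TrueType.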

I'm afraid I have no idea how to fix this, except to reiterate that
Microsoft Word is not a typesetting system, and should not be used for 
final-form book production; and Microsoft's default PS print-to-file 
driver has been broken for years if not decades, and should be replaced 
by the Adobe one if you really need to generate PS. It would be far 
simpler, I suspect, just to use a PDF driver, such as the ones that come 
built into other wordprocessors.

///Peter
Peter
2/13/2011 12:47:25 AM
On Sat, 12 Feb 2011 14:11:57 -0500, tlvp <tPlOvUpBErLeLsEs@hotmail.com> wrote:

> The Subject line gives it in a nutshell -- my question(s) is (are) why? (and how to fix?)
>
> We've got a PDF containing the fully laid out, formatted pages of a book that uses one font (T.N.R.) in normal, bold, and italic, in several point-sizes. This PDF's Properties sheet shows about a hundred randomly-alphanumerically-named Type 1 embedded fonts. Why should that be? And is there anything we can do to reduce that number?
>
> The PDF is produced by Acrobat Distiller (v. 4.05) from a PS file itself obtained by printing a MS Word .DOC file to a PostScript printer "on FILE", with all three -- the .DOC file, the .PS file, and the Distiller run -- having been requested to Embed any font used.
>
> All insights gratefully accepted. Cheers, --- tlvp

Ken, Robert, Matti, Peter: many thanks for your comments, insights, suggestions, and requests for omitted data. FWIW:

the Word we're using is that from MS Office 2000, with option to Embed all Fonts;

our Times New Roman is whatever TTF that instance of MS Word is finding on the Windows XP system it's installed on;

the PS printer whose driver we're using to capture to FILE the PS print stream is the Windows XP HP Laser Jet IIIsi PostScript v52.3 driver, set to Download TrueType as SoftFont, Outline, and to Optimize PS Output for Portability; and

Acrobat Distiller is also set to embed all fonts used.

It may well be, as ken surmises, that

| ... each time the font is used the program assigns it a new
| name. Possibly it loads the fonts in local VM and save/restores around
| each page, resulting in the font on each page being re-emitted because
| Distiller thinks it's a new font.

In that case it's probably pointless to hope for any cure.

Robert points out, as well, that

| ... Each face type (roman
| [normal], bold, and italic), is a *separate* font.  Each *point-size*
| is a separate font (yes one can use scalable fonts, but things look far
| better to use fonts on the right size -- a 6pt font is non-trivially
| *different* from a 12pt font of the same face).  The book probably also
| has one (or more) 'symbol' fonts (for all of those pesky symbols, like
| copyright, trademark, list bullets, etc.).

Certainly if each transition from normal to bold or to italic, and each
transition back, results in a fresh download of the relevant font, that'll
quickly result in the hundred or so font embeddings we see. 'Symbols' are
all chosen from within T.N.R. itself (bullets, copyright, uniquely Polish
diacritically marked characters), but ... .

As to Word's 'revision history', we're of the impression that this was
turned OFF, whence nothing to 'flush'. As to whether "it is possible that
there is deleted text using additional fonts and MS-Word is 'embedding'
these otherwise unused fonts," I suppose it is possible -- certainly the
same DOC file when "saved as HTML" shows HTML code calling for MS Mincho,
Arial, Courier, and yet other fonts, for use with numerous void (0-length)
text strings whose etiology is a mystery to me.

In the HTML it's easy to prune that garbage away. Harder in the DOC file itself.
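The HTML pruning can be scripted. This sketch assumes the void strings appear as empty span elements whose style attribute names a font, such as `<span style='font-family:"MS Mincho"'></span>`. It removes only truly empty spans, so spacing spans that contain a blank are left alone. It also repeats until nothing changes, because removing an inner span can leave its parent empty, and reports which fonts the removed spans referenced. It is a regular-expression hack rather than an HTML parser, so run it on a copy.

```python
import re
from collections import Counter

# A <span ...></span> with nothing at all between the tags.
EMPTY_SPAN_RE = re.compile(r"<span\b([^>]*)></span>", re.IGNORECASE)

# The first family named in a font-family declaration, with or without quotes.
FONT_FAMILY_RE = re.compile(
    r"font-family:\s*(?:&quot;|[\"'])?([^;,\"'&>]+)", re.IGNORECASE
)


def prune_empty_spans(html: str):
    """Return (cleaned_html, Counter of fonts named by the removed spans)."""
    fonts = Counter()

    def drop(match):
        for family in FONT_FAMILY_RE.findall(match.group(1)):
            fonts[family.strip()] += 1
        return ""  # delete the empty span entirely

    while True:
        cleaned = EMPTY_SPAN_RE.sub(drop, html)
        if cleaned == html:
            return cleaned, fonts
        html = cleaned
```

Printing the returned Counter shows which stray fonts (MS Mincho, Arial, Courier and so on) were hiding behind the empty runs.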

As for why we didn't use Acrobat Distiller's "option of 'Print to PDF'", we
found we could not reliably get fonts or graphics to embed going that route.

The OpenOffice approach may work, thanks for the idea, *provided* it does
NOT completely reflow our document (reflowing, repaginating, etc., could
be an utter disaster -- MS Word itself is bad enough on that score :-) ).
(For this idea, voiced not only by Robert but seconded by Matti & Peter,
additional thanks.)

For the current project, I guess we'll just carry on with stiff upper lip. For
future projects, perhaps, the OpenOffice approach may be a better way to go.

Thanks again to all! And cheers, -- tlvp




tlvp
2/13/2011 3:11:29 AM
On 2011-02-13 04:11, "tlvp" wrote:

> [...]
> 
> the PS printer whose driver we're using to capture to FILE the PS print
> stream is the Windows XP HP Laser Jet IIIsi PostScript v52.3 driver, set
> to Download TrueType as SoftFont, Outline, and to Optimize PS Output for
> Portability; and

AFAIR, HP _never_ used a certified PostScript driver, but rather a home-grown
"PostScript Emulation". You might try another printer driver, e.g. Apple
or Tektronix. And perhaps a printer is considered a "page oriented
device", while font embedding into PDF should be done "document oriented".

> Acrobat Distiller is also set to embed all fonts used.
                                   ^^^^^^^^^^^^^^^^^^^^

Sure? See below ...

> [...]
> 
> As for why we didn't use Acrobat Distiller's "option of 'Print to PDF'", we
> found we could not reliably get fonts or graphics to embed going that route.

I can't comment on your rather old version of Acrobat -- but with my
"slightly newer" version (Version 6 Standard Edition) there are
Distiller settings controlling font embedding; there is definitely an
option "do embed all fonts _but_ the standard fonts" (or similar
phrasing, i.e., do _not_ embed Helvetica, Times and Courier).
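Distiller's font-embedding choices can also be requested from inside the PostScript file with the setdistillerparams operator, using Distiller parameters such as /EmbedAllFonts and /NeverEmbed. This hedged sketch inserts such a block after the DSC header comments, so the %!PS-Adobe line stays first. Whether Distiller honours it depends on the version and on whether its job options allow the file to override them.

```python
def add_distiller_params(ps_text: str, never_embed=()) -> str:
    """Insert a setdistillerparams block asking Distiller to embed all fonts.

    never_embed: PostScript font names to leave out (empty = embed everything).
    """
    names = " ".join(f"/{name}" for name in never_embed)
    block = (
        "<< /EmbedAllFonts true\n"
        f"   /NeverEmbed [ {names} ]\n"
        ">> setdistillerparams\n"
    )
    lines = ps_text.splitlines(keepends=True)
    # Keep the DSC header intact: insert after %%EndComments when present,
    # otherwise right after the %!PS-Adobe first line.
    cut = 1
    for i, line in enumerate(lines):
        if line.startswith("%%EndComments"):
            cut = i + 1
            break
    return "".join(lines[:cut]) + block + "".join(lines[cut:])
```

Passing never_embed=("Helvetica", "Times-Roman", "Courier") approximates the "embed all but the standard fonts" behaviour, while leaving it empty asks for everything to be embedded.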

I don't see your font problems with a similar "vintage environment":
Windows 2000 Professional, Word 2002 (from Office XP) and Acrobat 6. I'm
directly creating PDFs from Word using the "PDFMaker" option which
itself calls Distiller.

> [...]

Michael

-- 
Real names enhance the probability of getting real answers.
My e-mail account at DECUS Munich is no longer valid.
2/13/2011 9:20:28 AM
In article <op.vqtm5fc8itl47o@acer250.gateway.2wire.net>, 
tPlOvUpBErLeLsEs@hotmail.com says...

> the Word we're using is that from MS Office 2000, with option to Embed
> all Fonts;
> 
> our Times New Roman is whatever TTF that instance of MS Word is finding on the Windows XP system it's installed on;

That's probably where your problem starts. You can't use TrueType fonts
directly in PostScript; there's no support for them. So the printer driver
will have to convert the TrueType fonts into a PostScript font format, such as Type 42.
 
> the PS printer whose driver we're using to capture to FILE the PS print stream is the Windows XP HP Laser Jet IIIsi PostScript v52.3 driver,

That's actually the printer, not the driver. It's a difficult distinction
to make: PostScript is mostly device-independent, so the output only
needs a few tweaks to be compatible with a different printer. This is
done with PPD (PostScript Printer Description) files, which on Windows
are actually stored as .wpd files.

The driver generates the main body of the PostScript, and there are
several different Microsoft and Adobe ones, which work differently.


> set to Download TrueType as SoftFont, Outline, and to Optimize PS
> Output for Portability; and

The portability option probably means that unique fonts are downloaded
on every page, in order to prevent memory exhaustion on desktop printers,
and flushed at the end of every page. This will quickly create a large
number of fonts.

Newer versions of Acrobat Distiller using the Adobe PostScript driver
(shipped in newer versions of Windows) are able to recognise these
fragments and recombine them. This only works (as far as I can tell) if
you are using the Adobe PostScript driver and a relatively recent
version of Acrobat Distiller.
 

> It may well be, as ken surmises, that
> 
> | ... each time the font is used the program assigns it a new
> | name. Possibly it loads the fonts in local VM and save/restores around
> | each page, resulting in the font on each page being re-emitted because
> | Distiller thinks its a new font.
> 
> In that case it's probably pointless to hope for any cure.

It's probably impossible to 'fix' the document you already have, but you
might be able to remake the PDF file from the original Word document.
You could, for example, try altering the font-embedding settings in the
driver.

Better yet (as suggested by others) would be to use a recent version of
OpenOffice or even a recent version of Microsoft Office; both of these
are able to produce PDF files directly from the source document.


> For the current project, I guess we'll just carry on with stiff upper lip. For
> future projects, perhaps, the OpenOffice approach may be a better way to go.

If you are going to do more books, I would suggest you invest in 
something which is better suited to the task. Office is good for 
letters, not so good for books. InDesign or even Adobe FrameMaker (do
they still sell that?) would do a better job.


		Ken
ken161
2/13/2011 9:36:37 AM
At Sat, 12 Feb 2011 22:11:29 -0500 tlvp <tPlOvUpBErLeLsEs@hotmail.com> wrote:

> [...]
> As to Word's 'revision history', we're of the impression that this was
> turned OFF, whence nothing to 'flush'. As to whether "it is possible that
> there is deleted text using additional fonts and MS-Word is 'embedding'
> these otherwise unused fonts," I suppose it is possible -- certainly the
> same DOC file when "saved as HTML" shows HTML code calling for MS Mincho,
> Arial, Courier, and yet other fonts, for use with numerous void (0-length)
> text strings whose etiology is a mystery to me.

No real mystery here:  These 0-length text strings are an *artifact* of
how most/all *word processors* operate. What happens is you type in some
text, and then erase it. Since the erasing does not always *exactly*
match the original typing (and in fact there might be various bits of
editing at different times), this more often than not results in 'orphaned'
0-length text snippets, sometimes in other fonts.

> 
> In the HTML it's easy to prune that garbage away. Harder in the DOC file itself.
> 
> As for why we didn't use Acrobat Distiller's "option of 'Print to PDF'", we
> found we could not reliably get fonts or graphics to embed going that route.
> 
> The OpenOffice approach may work, thanks for the idea, *provided* it does
> NOT completely reflow our document (reflowing, repaginating, etc., could
> be an utter disaster -- MS Word itself is bad enough on that score :-) ).
> (For this idea, voiced not only by Robert but seconded by Matti & Peter,
> additional thanks.)

The *real* solution to this is to not use a word processor at all for
the final document.  As was pointed out in another message: MS-Word is
really, really, *bad* for this sort of task.  *I* would suggest using
LaTeX, which, using pdflatex, can go from the original text to PDF in one
step and which will embed just the right fonts and does a *wonderful*
job of typesetting.  Every time.  And is *free*, *cross platform*, and
*future proof*.

> 
> For the current project, I guess we'll just carry on with stiff upper lip. For
> future projects, perhaps, the OpenOffice approach may be a better way to go.
> 
> Thanks again to all! And cheers, -- tlvp

heller
2/13/2011 1:35:06 PM
On 13/02/11 13:35, Robert Heller wrote:
> At Sat, 12 Feb 2011 22:11:29 -0500 tlvp<tPlOvUpBErLeLsEs@hotmail.com>  wrote:
>[...]
> The *real* solution to this is to not use a word processor at all for
> the final document.  As was pointed out in another message: MS-Word is
> really, really, *bad* for this sort of task.  *I* would suggest using
> LaTeX, which using pdflatex can go from the original text to pdf in one
> step and which will embed just the right fonts and does a *wonderful*
> job of typesetting.  Every time.  And is *free*, *cross platform*, and
> *future proof*.
>>
>> For the current project, I guess we'll just carry on with stiff upper lip. For
>> future projects, perhaps, the OpenOffice approach may be a better way to go.

I would second Robert: LaTeX is ideal for this kind of application.

///Peter


peter2615
2/13/2011 5:14:07 PM
On Sat, 12 Feb 2011 22:11:29 -0500, tlvp <tPlOvUpBErLeLsEs@hotmail.com> wrote:

> [...]

Thank you, Michael, ken, Robert, Peter; some helpful ideas. Let me respond
to them in turn.

Michael's suggestion to "try another printer driver, e.g. Apple or Tektronix"
may well be one we should take up ... though, if his impression that

> perhaps a printer is considered a "page oriented device" while font embedding into PDF should be done "document oriented"

is correct, we may never eliminate the phenomenon of a fresh font download
for every fresh page.

On another matter, even Acrobat 4.05's Distiller has user-adjustable "settings
controlling font embedding; there is definitely an option 'do embed all fonts
_but_ the standard fonts' (or similar phrasing, i.e., do _not_ embed Helvetica,
Times and Courier)." But it's not clear which Distiller Job Options settings
get called when using the "PDFMaker" printer from within a Word Document,
whence I'm not at all sure whether we, too, won't "see your font problems with
a similar 'vintage environment': Windows 2000 Professional, Word 2002
(from Office XP) and Acrobat 6. I'm directly creating PDFs from Word using
the 'PDFMaker' option which itself calls Distiller."

Worth trying, though, the next time around.

Michael's "page oriented" versus "document oriented" distinction is given
more body in ken's remarks concerning our choice of PS Output type -- "for
Portability" -- with his indication that

> The portability option probably means that unique fonts are downloaded on every page, in order to prevent memory exhaustion on desktop printers, and flushed at the end of every page. This will quickly create a large number of fonts.

We should, then, learn just what exactly the alternative "Optimize Output for"
options really accomplish (Speed, Portability, Archive, ... ???).

ken's other recommendations, to use OOo or even a newer MS Office (with PDF
output capabilities built in), or, best of all, FrameMaker, are surely all sound,
as is Robert's idea (seconded by Peter) to use TeX (in particular, LaTeX);
but, realistically, while we'd love to be using FrameMaker, we can't afford it;
and we've got enough time invested in understanding our older Word 2000 that
we'd hate to scrap that all for a fresh, long learning tussle with Word 2007
or 2010. That leaves us with these alternatives: to switch to OOo, perhaps not
such a bad idea, after all; to stay with Word 2000 as "the Devil we know", or
to go the (non-WYSIWYG) LaTeX route, which will require us carefully and
systematically to frame or formulate *all* our assorted Style Defs (which,
currently, we've only accumulated informally, using Copy/Paste for Styles
as needed, when transferring a standard style onto a new fragment requiring
that style).

In fact, for the current project, we *did* just carry on with stiff upper lip, as
foreseen earlier: the PDFs are out with our PoD service provider as I write this,
with proofs due later this week.

For future projects, perhaps, the OpenOffice approach, or LaTeX, may be
a better way for us to go.

At all events, I'm most grateful for the thoughtful analyses and suggestions
you've all offered: thank you all! And cheers,

-- tlvp
tlvp
2/16/2011 5:38:45 AM
On 16/02/11 05:38, tlvp wrote:
[...]
> ken's other recommendations, to use OOo or even a newer MS Office
> (with PDF output capabilities built in), or, best of all, FrameMaker,
> are surely all sound, as is Robert's idea (seconded by Peter) to use
> TeX (in particular, LaTeX); but, realistically, while we'd love to
> be using FrameMaker, we can't afford it;

That is one very common reason for using LaTeX: it's free (and there are 
commercial versions if you want commercial support).

> and we've got enough time invested in understanding our older Word
> 2000 that we'd hate to scrap that all for a fresh, long learning
> tussle with Word 2007 or 2010.

Another common reason for using LaTeX: stability. It doesn't change 
interface with every release. Apart from some changes from v2.09 over a 
decade ago, all my LaTeX documents work and typeset exactly the same 
with the 2010 version of the code as they did when they were written.

> That leaves us with these alternatives: to switch to OOo, perhaps not
> such a bad idea, after all;

Recent tweets from leaders in the Open Source field indicate that they 
may leave OpenOffice for LibreOffice, because Oracle are dragging their 
feet over OO.

> to stay with Word 2000 as "the Devil we know", or to go the
> (non-WYSIWYG) LaTeX route,

This is a common misconception. LaTeX document display is WYSIWYG (e.g.
Acrobat Reader, or whatever other PDF or DVI reader you use): it's your
choice of editing software that determines your editing interface.

There are at least four synchronous typographic interfaces: LyX 
(multiplatform), Scientific Word and BaKoMa (both Windows), and Textures 
(Mac).

> which will require us carefully and systematically to frame or
> formulate *all* our assorted Style Defs (which, currently, we've only
> accumulated informally, using Copy/Paste for Styles as needed, when
> transferring a standard style onto a new fragment  requiring that style).

You can do this quite easily with LaTeX document classes and packages, 
incorporating your own tailored variants as local packages, so they 
become just as much "plug-in" as copying/pasting a Word style.

> In fact, for the current project, we *did* just carry on with stiff
> upper lip, as foreseen earlier: the PDFs are out with our PoD
> service provider as I write this, with proofs due later this week.
> For future projects, perhaps, the OpenOffice approach, or LaTeX, may
> be a better way for us to go.

I'd be happy to talk this through offline if you want.

///Peter

Peter
2/17/2011 10:43:20 PM
On 16/02/11 05:38, tlvp wrote:
[...]
I wrote:
> There are at least four synchronous typographic interfaces: LyX
> (multiplatform), Scientific Word and BaKoMa (both Windows), and
> Textures (Mac).

I should have added GNU TeXmacs (Unix, GNU/Linux, Windows, maybe Mac).
Textures appears to be dead: the BlueSky web site is still there, 
though, and the people involved are apparently still around.

///Peter
Peter
2/18/2011 12:39:08 AM
At Fri, 18 Feb 2011 00:39:08 +0000 Peter Flynn <peter@silmaril.ie> wrote:

> 
> On 16/02/11 05:38, tlvp wrote:
> [...]
> I wrote:
> > There are at least four synchronous typographic interfaces: LyX
> > (multiplatform), Scientific Word and BaKoMa (both Windows), and
> > Textures (Mac).
> 
> I should have added GNU TeXmacs (Unix, GNU/Linux, Windows, maybe Mac).
> Textures appears to be dead: the BlueSky web site is still there, 
> though, and the people involved are apparently still around.

One should note that not all of the 'WYSIWYG' systems for LaTeX are
'just like' Word Processors, in that not all of them are editors that
typeset *while you edit*.  For someone expecting a MS-Word sort of
experience, they are not really going to get it.  OTOH, LaTeX does
present a repeatable (across all printers, platforms, etc.) concrete
typesetting of the document -- this is actually something that MS-Word
*does not do*.  When you move a MS-Word .doc file from one machine to
another you can get *different* results -- all that is needed to change
the output is a different default printer!

> 
> ///Peter
>                                                                    

-- 
Robert Heller             -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


                                                                    
Robert
2/18/2011 4:41:57 AM
On Thu, 17 Feb 2011 23:41:57 -0500, Robert Heller <heller@deepsoft.com> wrote:

> At Fri, 18 Feb 2011 00:39:08 +0000 Peter Flynn <peter@silmaril.ie> wrote:
>
>>
>> On 16/02/11 05:38, tlvp wrote:
>> [...]
>> I wrote:
>> > There are at least four synchronous typographic interfaces: LyX
>> > (multiplatform), Scientific Word and BaKoMa (both Windows), and
>> > Textures (Mac).
>>
>> I should have added GNU TeXmacs (Unix, GNU/Linux, Windows, maybe Mac).
>> Textures appears to be dead: the BlueSky web site is still there,
>> though, and the people involved are apparently still around.
>
> One should note that not all of the 'WYSIWYG' systems for LaTeX are
> 'just like' Word Processors, in that not all of them are editors that
> typeset *while you edit*.  For someone expecting a MS-Word sort of
> experience, they are not really going to get it.

Quite so. Using LaTeX is like writing HTML -- you do your editing in
one window, at the level of HTML, and you see your HTML programming, if
I may call it that, or your LaTeX programming; in another window -- a
browser window, for HTML, an AcroRead window, perhaps, for LaTeX -- you
get to see the results of your programming. Word or OOo or LO show you
directly (and without any programming language intermediaries) your output.

> ... OTOH, LaTeX does
> present a repeatable (across all printers, platforms, etc.) concrete
> typesetting of the document -- this is actually something that MS-Word
> *does not do*.  When you move a MS-Word .doc file from one machine to
> another you can get *different* results -- all that is needed to change
> the output is a different default printer!

Heh-heh -- for visibly different output it's enough to change to a different
*current* printer -- or not even: I had one short business letter once that
changed its line break behavior based on whether the *paper* was set to
Letter or Legal (yup, 8.5" x whatever, 0.75" margins left & right, hence
7.0" line-widths; two paragraphs of about eight lines each (plus opening
and closing salutations) -- fit easily on a letter-sized sheet). With Letter
paper chosen, the second paragraph ended in a one-word orphan line, while with
Legal, that one word remained tacked on at the right-hand end of the line before.

You're right -- I couldn't believe my own eyes.

>> ///Peter

Cheers, -- tlvp
-- 
Avant de repondre, jeter la poubelle, SVP
tlvp
2/19/2011 12:09:38 AM
On Thu, 17 Feb 2011 17:43:20 -0500, Peter Flynn <peter@silmaril.ie> wrote:

> On 16/02/11 05:38, tlvp wrote:
> [...]
>> ken's other recommendations, to use OOo or even a newer MS Office
>> (with PDF output capabilities built in), or, best of all, FrameMaker,
>> are surely all sound, as is Robert's idea (seconded by Peter) to use
>> TeX (in particular, LaTeX); but, realistically, while we'd love to
>> be using FrameMaker, we can't afford it;
>
> That is one very common reason for using LaTeX: it's free (and there are
> commercial versions if you want commercial support).
>
>> and we've got enough time invested in understanding our older Word
>> 2000 that we'd hate to scrap that all for a fresh, long learning
>> tussle with Word 2007 or 2010.
>
> Another common reason for using LaTeX: stability. It doesn't change
> interface with every release. Apart from some changes from v2.09 over a
> decade ago, all my LaTeX documents work and typeset exactly the same
> with the 2010 version of the code as they did when they were written.
>
>> That leaves us with these alternatives: to switch to OOo, perhaps not
>> such a bad idea, after all;
>
> Recent tweets from leaders in the Open Source field indicate that they
> may leave OpenOffice for LibreOffice, because Oracle are dragging their
> feet over OO.
>
>> to stay with Word 2000 as "the Devil we know", or to go the
>> (non-WYSIWYG) LaTeX route,
>
> This is a common misconception. LaTeX document display is WYSIWYG (eg
> Acrobat Reader, or whatever other PDF or DVI reader you use): it's your
> choice of editing software that determines your editing interface.
>
> There are at least four synchronous typographic interfaces: LyX
> (multiplatform), Scientific Word and BaKoMa (both Windows), and Textures
> (Mac).
>
>> which will require us carefully and systematically to frame or
>> formulate *all* our assorted Style Defs (which, currently, we've only
>> accumulated informally, using Copy/Paste for Styles as needed, when
>> transferring a standard style onto a new fragment requiring that style).
>
> You can do this quite easily with LaTeX document classes and packages,
> incorporating your own tailored variants as local packages, so they
> become just as much "plug-in" as copying/pasting a Word style.

Well, we've not *formally* defined *any* styles for Word. We've just
nudged and prodded and set point size and set inter-line leading (via
Alt-O [Format] options) and set inter-character Condensedness, and the
like, in certain well-defined contexts -- these un-named styles Word
evidently keeps track of, somehow, and we exploit that fact.

We'd be hard-pressed to reconstruct all the details of those styles,
though, or to compile just what all the different styles we've amassed are.

>> In fact, for the current project, we *did* just carry on with stiff
>> upper lip, as foreseen earlier: the PDFs are out with our PoD
>> service provider as I write this, with proofs due later this week.
>> For future projects, perhaps, the OpenOffice approach, or LaTeX, may
>> be a better way for us to go.
>
> I'd be happy to talk this through offline if you want.

If riding herd on all the styles we've inadvertently wound up creating
is something you'd really care to try to help us with, we'd welcome the
help ... but not for another month or so, what with the current project
still going down the home stretch :-) .

> ///Peter

Thanks for that offer, though :-) ; and cheers, -- tlvp
-- 
Avant de repondre, jeter la poubelle, SVP
tlvp
2/19/2011 12:21:08 AM
On 19/02/11 00:09, tlvp wrote:
> On Thu, 17 Feb 2011 23:41:57 -0500, Robert Heller <heller@deepsoft.com>
> wrote:
>
>> At Fri, 18 Feb 2011 00:39:08 +0000 Peter Flynn <peter@silmaril.ie> wrote:
>>
>>>
>>> On 16/02/11 05:38, tlvp wrote:
>>> [...]
>>> I wrote:
>>> > There are at least four synchronous typographic interfaces: LyX
>>> > (multiplatform), Scientific Word and BaKoMa (both Windows), and
>>> > Textures (Mac).
>>>
>>> I should have added GNU TeXmacs (Unix, GNU/Linux, Windows, maybe Mac).
>>> Textures appears to be dead: the BlueSky web site is still there,
>>> though, and the people involved are apparently still around.
>>
>> One should note that not all of the 'WYSIWYG' systems for LaTeX are
>> 'just like' Word Processors, in that not all of them are editors that
>> typeset *while you edit*. For someone expecting a MS-Word sort of
>> experience, they are not really going to get it.
>
> Quite so. Using LaTeX is like writing HTML -- you do your editing in
> one window, at the level of HTML, and you see your HTML programming, if
> I may call it that, or your LaTeX programming; in another window -- a
> browser window, for HTML, an AcroRead window, perhaps, for LaTeX -- you
> get to see the results of your programming.

That's pretty much it. The reason being, of course, that there are 
things LaTeX (and to some extent, HTML) can do that are impossible to 
represent graphically, such as your cursor location between two states, 
both of which have identical typographical representations. See my blog 
for a real-life example (http://blogs.silmaril.ie/peter/#fileformats).

> Word or OOo or LO show you
> directly (and without any programming language intermediaries) your output.

Not true at all, I'm afraid. There is a *lot* of programming *and* 
markup between your finger and the font on the screen; it's just hidden. 
And if you think LaTeX or HTML are "complicated" or "hard" (more myths 
demolished at 
http://latex.silmaril.ie/formattinginformation/preface.html#myths2), go 
and create a "Hello, World" .docx file (Word or OO will do), then exit 
the application, and open the .docx file with a Zip program (WinZip or 
equivalent), which will work because that's all that .docx and ODF files 
are. Look in there for a file called document.xml and examine it closely 
to find your "Hello, World" text. *That*'s the "programming language 
intermediaries" you are dealing with when you use Word.
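To make that experiment concrete, here is a minimal Python sketch (standard library only) that opens a .docx as the Zip archive it is and pulls the text out of word/document.xml. The part name and the WordprocessingML namespace are those used by Office Open XML; the in-memory "package" is a stripped-down, hypothetical stand-in for a real file, which would also carry [Content_Types].xml, styles, relationships and so on. ODF files keep their text in content.xml instead, so the function would need adjusting for them.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in a .docx package.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(source) -> str:
    """Return the body text of a .docx, one line per paragraph (sketch).

    Only the main document part is read; headers, footers and notes
    are stored in other parts of the package.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across <w:t> elements inside runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)

# Hypothetical, stripped-down package built in memory for the demo.
document_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello, World</w:t></w:r></w:p></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", document_xml)

print(docx_text(buf))  # prints: Hello, World
# With a real file (name is hypothetical): print(docx_text("hello.docx"))
```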

///Peter
Peter
2/19/2011 7:13:08 PM
On Sat, 19 Feb 2011 14:13:08 -0500, Peter Flynn <peter@silmaril.ie> wrote:

> On 19/02/11 00:09, tlvp wrote:
>> On Thu, 17 Feb 2011 23:41:57 -0500, Robert Heller <heller@deepsoft.com>
>> wrote:
>>
>>> At Fri, 18 Feb 2011 00:39:08 +0000 Peter Flynn <peter@silmaril.ie> wrote:
>>>
>>>>
>>>> On 16/02/11 05:38, tlvp wrote:
>>>> [...]
>>>> I wrote:
>>>> > There are at least four synchronous typographic interfaces: LyX
>>>> > (multiplatform), Scientific Word and BaKoMa (both Windows), and
>>>> > Textures (Mac).
>>>>
>>>> I should have added GNU TeXmacs (Unix, GNU/Linux, Windows, maybe Mac).
>>>> Textures appears to be dead: the BlueSky web site is still there,
>>>> though, and the people involved are apparently still around.
>>>
>>> One should note that not all of the 'WYSIWYG' systems for LaTeX are
>>> 'just like' Word Processors, in that not all of them are editors that
>>> typeset *while you edit*. For someone expecting a MS-Word sort of
>>> experience, they are not really going to get it.
>>
>> Quite so. Using LaTeX is like writing HTML -- you do your editing in
>> one window, at the level of HTML, and you see your HTML programming, if
>> I may call it that, or your LaTeX programming; in another window -- a
>> browser window, for HTML, an AcroRead window, perhaps, for LaTeX -- you
>> get to see the results of your programming.
>
> That's pretty much it. The reason being, of course, that there are
> things LaTeX (and to some extent, HTML) can do that are impossible to
> represent graphically, such as your cursor location between two states,
> both of which have identical typographical representations. See my blog
> for a real-life example (http://blogs.silmaril.ie/peter/#fileformats).
>
>> Word or OOo or LO show you
>> directly (and without any programming language intermediaries) your output.
>
> Not true at all, I'm afraid.

Of course not. Sorry, I was too condensed. Meant: "without any programming
language intermediaries I am conscious (as I would be of HTML or LaTeX files)
of having created." You are certainly correct, that ...

> ... There is a *lot* of programming *and*
> markup between your finger and the font on the screen; it's just hidden.
> And if you think LaTeX or HTML are "complicated" or "hard" ...

Actually, no I don't. I write HTML directly, have done so for some years.
I don't write TeX, but only for never having bothered to learn the rudiments.

> ... (more myths demolished at
> http://latex.silmaril.ie/formattinginformation/preface.html#myths2), go
> and create a "Hello, World" .docx file (Word or OO will do), then exit
> the application, and open the .docx file with a Zip program (WinZip or
> equivalent), which will work because that's all that .docx and ODF files
> are. Look in there for a file called document.xml and examine it closely
> to find your "Hello, World" text. *That*'s the "programming language
> intermediaries" you are dealing with when you use Word.

Now why had I never before become aware of that fact? I used to open
pre-Word-2007 files in WordPad to see 'em quick and dirty, a strategy
that proves useless for .docx and .odf files ... because they're ZIPped?
(slaps forehead) Oh! Why'n't anyone else ever say so?
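Why WordPad stopped being useful shows up in the first few bytes of each file. A short sketch, relying only on well-known file signatures: every Zip archive (so every .docx or ODF package) starts with "PK\x03\x04", the pre-2007 binary .doc is an OLE2 compound file starting D0 CF 11 E0 A1 B1 1A E1, and PDF and PostScript files announce themselves with "%PDF-" and "%!PS". The file name in the usage comment is hypothetical.

```python
# Leading "magic" bytes of the file types discussed in this thread.
SIGNATURES = [
    (b"PK\x03\x04", "ZIP archive (.docx, or an ODF file such as .odt)"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (legacy binary .doc)"),
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
]

def sniff(head: bytes) -> str:
    """Identify a file from its first few bytes (8 are enough here)."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"

print(sniff(b"PK\x03\x04\x14\x00\x06\x00"))
print(sniff(bytes.fromhex("D0CF11E0A1B11AE1")))

# With a real file (name is hypothetical):
# with open("book.doc", "rb") as f:
#     print(sniff(f.read(8)))
```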

Thanks for that pointer, Peter, that'll feed me for a good month or more!

Cheers, -- tlvp
-- 
Avant de repondre, jeter la poubelle, SVP

> ///Peter
tlvp
2/20/2011 6:43:38 AM
On 20/02/11 06:43, tlvp wrote:
[...]
> Now why had I never before become aware of that fact? I used to open
> pre-Word-2007 files in WordPad to see 'em quick and dirty, a
> strategy that proves useless for .docx and .odf files ... because
> they're ZIPped? (slaps forehead) Oh! Why'n't anyone else ever say
> so?

ODF was first, Microsoft followed. I think it just seemed convenient 
because the contents were all plaintext files: the XML document, 
stylesheets, lookups, schemas; except image binaries, of course, which 
was another benefit -- in .doc files they are relatively inaccessible, 
and in WordML (2003) they were encoded to make them go *in* the XML; now 
they are just included in the .docx zip in a subdirectory.
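A sketch of that layout, using a hypothetical package built in memory: the XML parts and relationship files sit alongside the images, which a .docx keeps under word/media/ (ODF uses Pictures/). The grouping rule below, by file extension and folder name, is only a heuristic for illustration, not anything defined by either format.

```python
import io
import zipfile

def describe_package(source) -> dict:
    """Sort the members of a .docx or ODF package into text parts and media (heuristic)."""
    groups = {"text": [], "media": [], "other": []}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.endswith((".xml", ".rels")):
                groups["text"].append(name)    # XML parts and relationship files
            elif "/media/" in name or name.startswith("Pictures/"):
                groups["media"].append(name)   # .docx: word/media/, ODF: Pictures/
            else:
                groups["other"].append(name)
    return groups

# Hypothetical package with one embedded image, built in memory.
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
    zf.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    zf.writestr("word/media/image1.png", b"\x89PNG\r\n\x1a\n")  # placeholder bytes

groups = describe_package(package)
print(groups)
```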

Now...if only Adobe would move to a similar format (called, let us 
suppose, .pdfx :-) so that all files in there would be plaintext.

Oh, whoops...it's called Postscript...

///Peter
Peter
2/20/2011 1:04:00 PM
At Sun, 20 Feb 2011 01:43:38 -0500 tlvp <tPlOvUpBErLeLsEs@hotmail.com> wrote:

> 
> On Sat, 19 Feb 2011 14:13:08 -0500, Peter Flynn <peter@silmaril.ie> wrote:
> 
> > On 19/02/11 00:09, tlvp wrote:
> >> On Thu, 17 Feb 2011 23:41:57 -0500, Robert Heller <heller@deepsoft.com>
> >> wrote:
> >>
> >>> At Fri, 18 Feb 2011 00:39:08 +0000 Peter Flynn <peter@silmaril.ie> wrote:
> >>>
> >>>>
> >>>> On 16/02/11 05:38, tlvp wrote:
> >>>> [...]
> >>>> I wrote:
> >>>> > There are at least four synchronous typographic interfaces: LyX
> >>>> > (multiplatform), Scientific Word and BaKoMa (both Windows), and
> >>>> > Textures (Mac).
> >>>>
> >>>> I should have added GNU TeXmacs (Unix, GNU/Linux, Windows, maybe Mac).
> >>>> Textures appears to be dead: the BlueSky web site is still there,
> >>>> though, and the people involved are apparently still around.
> >>>
> >>> One should note that not all of the 'WYSIWYG' systems for LaTeX are
> >>> 'just like' Word Processors, in that not all of them are editors that
> >>> typeset *while you edit*. For someone expecting a MS-Word sort of
> >>> experience, they are not really going to get it.
> >>
> >> Quite so. Using LaTeX is like writing HTML -- you do your editing in
> >> one window, at the level of HTML, and you see your HTML programming, if
> >> I may call it that, or your LaTeX programming; in another window -- a
> >> browser window, for HTML, an AcroRead window, perhaps, for LaTeX -- you
> >> get to see the results of your programming.
> >
> > That's pretty much it. The reason being, of course, that there are
> > things LaTeX (and to some extent, HTML) can do that are impossible to
> > represent graphically, such as your cursor location between two states,
> > both of which have identical typographical representations. See my blog
> > for a real-life example (http://blogs.silmaril.ie/peter/#fileformats).
> >
> >> Word or OOo or LO show you
> >> directly (and without any programming language intermediaries) your output.
> >
> > Not true at all, I'm afraid.
> 
> Of course not. Sorry, I was too condensed. Meant: "without any programming
> language intermediaries I am conscious (as I would be of HTML or LaTeX files)
> of having created." You are certainly correct, that ...
> 
> > ... There is a *lot* of programming *and*
> > markup between your finger and the font on the screen; it's just hidden.
> > And if you think LaTeX or HTML are "complicated" or "hard" ...
> 
> Actually, no I don't. I write HTML directly, have done so for some years.
> I don't write TeX, but only for never having bothered to learn the rudiments.
> 
> > ... (more myths demolished at
> > http://latex.silmaril.ie/formattinginformation/preface.html#myths2), go
> > and create a "Hello, World" .docx file (Word or OO will do), then exit
> > the application, and open the .docx file with a Zip program (WinZip or
> > equivalent), which will work because that's all that .docx and ODF files
> > are. Look in there for a file called document.xml and examine it closely
> > to find your "Hello, World" text. *That*'s the "programming language
> > intermediaries" you are dealing with when you use Word.
> 
> Now why had I never before become aware of that fact? I used to open
> pre-Word-2007 files in WordPad to see 'em quick and dirty, a strategy
> that proves useless for .docx and .odf files ... because they're ZIPped?
> (slaps forehead) Oh! Why'n't anyone else ever say so?

You won't see much that makes much sense when you open the XML file(s) in the
.docx file (I am not sure what the XML for a .odf looks like). 

> 
> Thanks for that pointer, Peter, that'll feed me for a good month or more!
> 
> Cheers, -- tlvp

-- 
Robert Heller             -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


                                                                  
Robert
2/20/2011 1:14:02 PM
On 2011-02-20 14:04, "Peter Flynn" wrote:

> [...]
> 
> Now...if only Adobe would move to a similar format (called, let us 
> suppose, .pdfx :-) so that all files in there would be plaintext.

Isn't PDF "just" a container format with multiple compressed streams
within? AFAIK at least in rather recent versions _any_ entity can be
compressed. (Older versions of the Adobe Reader tend to declare those
documents simply "corrupted".)
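That description can be tried out directly. This Python sketch finds FlateDecode streams by scanning for "<< ... >> stream" and inflates them with zlib. It is deliberately naive: it assumes a direct /Length value, a stream dictionary without nested dictionaries, and Flate as the only filter, and it does not look inside the compressed object streams introduced in PDF 1.5. The sample fragment is hand-made and is not a complete, valid PDF.

```python
import re
import zlib

# "<< ... >>" followed by the keyword "stream" and its end-of-line marker.
# Simplification: the stream dictionary must not contain nested << >>.
STREAM_RE = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n")
# A direct /Length value; an indirect one such as "/Length 12 0 R" is skipped.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)(?!\d)(?!\s+\d+\s+R)")

def flate_streams(pdf: bytes):
    """Yield the decompressed data of each FlateDecode stream in a PDF (sketch)."""
    for m in STREAM_RE.finditer(pdf):
        dictionary = m.group(1)
        if b"/FlateDecode" not in dictionary:
            continue  # uncompressed, or some other filter
        length = LENGTH_RE.search(dictionary)
        if length is None:
            continue  # indirect length: would need an object lookup
        raw = pdf[m.end():m.end() + int(length.group(1))]
        yield zlib.decompress(raw)

# Tiny hand-made fragment with one compressed page-content stream.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
packed = zlib.compress(content)
pdf = (b"%PDF-1.4\n4 0 obj\n<< /Length " + str(len(packed)).encode()
       + b" /Filter /FlateDecode >>\nstream\n" + packed + b"\nendstream\nendobj\n")

for data in flate_streams(pdf):
    print(data.decode("latin-1"))
```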

> Oh, whoops...it's called Postscript...

Well, embedding fonts isn't _strictly_ "plain text" ...

| /ABCDEF+OptimaLT
| << /FontType 1
|    /FontMatrix [0.001 0 0 0.001 0 0]
|    /Encoding ISOLatin1Encoding
|    /FontBBox [0 0 0 0]
|    /PaintType 0
|    /Private
|    << /|- {def}
|       /| {put}
|       /BlueValues [0 0]
|       /Password 5839
|       /MinFeature {16 16}
|       /Subrs
|          [<1C60D8A8CC31FE2BF6E07AA3E541E2> <1C60D8A8C9C3D06D9E>
|           <1C60D8A8C9C202D79A> <1C60D8A849>
|           <1C60D8A8CC3674F41144B13B77>]
|       /OtherSubrs
|          [{} {} {} {systemdict /internaldict known not {pop 3}
|           {1183615869 systemdict /internaldict get exec dup
|           /startlock known {/startlock get exec} {dup
|           /strtlck known {/strtlck get exec} {pop 3} ifelse} ifelse}
|           ifelse} executeonly]
|    >>
|    /CharStrings
|    << /.notdef
|          <1C60D8A8C9B854D00D>
|    >>
| >> definefont
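The hex strings in that listing are where the "not plain text" part lives: they are Type 1 charstrings, encrypted as described in Adobe's Type 1 Font Format specification (charstring key 4330, with the first lenIV = 4 decrypted bytes discarded) and then written out in hex. The Python sketch below decrypts them and decodes the numbers and operators; only the operators it needs are named, and anything else is shown as "op<n>" or "esc<n>". Applied to the /.notdef entry it gives 0 278 hsbw endchar, i.e. a left side bearing of 0, an advance width of 278 and no outline.

```python
# Decrypt and decode Type 1 charstrings, following Adobe's Type 1 Font Format
# spec: charstring encryption key 4330, lenIV 4 (the default). Sketch only.
C1, C2 = 52845, 22719
CHARSTRING_KEY = 4330
LEN_IV = 4

OPS = {1: "hstem", 3: "vstem", 4: "vmoveto", 5: "rlineto", 6: "hlineto",
       7: "vlineto", 8: "rrcurveto", 9: "closepath", 10: "callsubr",
       11: "return", 13: "hsbw", 14: "endchar", 21: "rmoveto",
       22: "hmoveto", 30: "vhcurveto", 31: "hvcurveto"}
ESC_OPS = {0: "dotsection", 1: "vstem3", 2: "hstem3", 6: "seac", 7: "sbw",
           12: "div", 16: "callothersubr", 17: "pop", 33: "setcurrentpoint"}

def decrypt(cipher: bytes, key: int = CHARSTRING_KEY, len_iv: int = LEN_IV) -> bytes:
    r = key
    plain = bytearray()
    for c in cipher:
        plain.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(plain[len_iv:])  # drop the leading random bytes

def decode(cipher: bytes) -> list:
    data, out, i = decrypt(cipher), [], 0
    while i < len(data):
        v = data[i]
        if v == 12:                        # escape: two-byte operator
            out.append(ESC_OPS.get(data[i + 1], f"esc{data[i + 1]}"))
            i += 2
        elif v < 32:                       # one-byte operator
            out.append(OPS.get(v, f"op{v}"))
            i += 1
        elif v <= 246:                     # integers -107..107
            out.append(v - 139)
            i += 1
        elif v <= 250:                     # integers 108..1131
            out.append((v - 247) * 256 + data[i + 1] + 108)
            i += 2
        elif v <= 254:                     # integers -1131..-108
            out.append(-(v - 251) * 256 - data[i + 1] - 108)
            i += 2
        else:                              # 255: 32-bit signed integer follows
            out.append(int.from_bytes(data[i + 1:i + 5], "big", signed=True))
            i += 5
    return out

print(decode(bytes.fromhex("1C60D8A8C9B854D00D")))  # [0, 278, 'hsbw', 'endchar']
for n, h in enumerate(["1C60D8A8CC31FE2BF6E07AA3E541E2", "1C60D8A8C9C3D06D9E",
                       "1C60D8A8C9C202D79A", "1C60D8A849",
                       "1C60D8A8CC3674F41144B13B77"]):
    print("Subrs", n, decode(bytes.fromhex(h)))
```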

Michael

-- 
Real names enhance the probability of getting real answers.
My e-mail account at DECUS Munich is no longer valid.
Michael
2/20/2011 1:46:04 PM
On 20/02/11 13:46, Michael Unger wrote:
> On 2011-02-20 14:04, "Peter Flynn" wrote:
>
>> [...]
>>
>> Now...if only Adobe would move to a similar format (called, let us
>> suppose, .pdfx :-) so that all files in there would be plaintext.
>
> Isn't PDF "just" a container format with multiple compressed streams
> within? AFAIK at least in rather recent versions _any_ entity can be
> compressed. (Older versions of the Adobe Reader tend to declare those
> documents simply "corrupted".)

Yes, it's a horrible mess. Entities can be compressed and included, or 
just included, but there is also the original PDF material as-was before 
they added streams...

>> Oh, whoops...it's called Postscript...
>
> Well, embedding fonts isn't _strictly_ "plain text" ...

Yes it is, provided it's (well, was) ASCII. "Plain Text" has become a 
misnomer nowadays -- any UTF-8 non-control character is plain text for 
someone, in some language. By "plaintext" I suspect we should mean one 
8-bit byte per character provided it's displayable and printable as a 
recognisable glyph. Or something :-)
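One way to pin down that "or something", under the stricter assumption of 7-bit ASCII: a byte string counts as plain text if every byte is a printable ASCII character or one of tab, line feed or carriage return. By that test the hex-encoded charstrings in Michael's listing are plain text, while the bytes they encode are not, which is roughly where the two positions meet.

```python
# Assumption: "plaintext" = printable 7-bit ASCII plus tab, LF and CR.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def is_plaintext(data: bytes) -> bool:
    return all(b in PRINTABLE for b in data)

print(is_plaintext(b"/FontMatrix [0.001 0 0 0.001 0 0]\n"))  # True
print(is_plaintext(b"<1C60D8A8C9B854D00D>"))                 # True: hex-encoded
print(is_plaintext(bytes.fromhex("1C60D8A8C9B854D00D")))     # False: raw bytes
```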

///Peter

Peter
2/20/2011 2:49:29 PM
On 20/02/11 13:14, Robert Heller wrote:
[...]
> You won't see much that makes much sense when you open the XML file(s) in the
> .docx file (I am not sure what the XML for a .odf looks like).

On the contrary, it's all there and visible, even if some of it is 
rather cryptic. It's unreadable because there is no white-space in 
element content (deliberately), but if you open it in something like a 
browser, or a utility like xpathtester, which rearranges it into a tree, 
it's not hard to see where they have put stuff.
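The same "rearrange it into a tree" trick is available from Python's standard library; as a minimal sketch, xml.dom.minidom re-indents a one-line fragment. The fragment below mimics word/document.xml and is made up for illustration. Pretty-printing adds whitespace, so the result is for reading, not a replacement for the original part.

```python
from xml.dom import minidom

# Made-up one-line fragment in the style of word/document.xml.
raw = ('<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
       '<w:body><w:p><w:r><w:t>Hello, World</w:t></w:r></w:p></w:body></w:document>')

# Re-indent the element tree two spaces per level.
pretty = minidom.parseString(raw).toprettyxml(indent="  ")
print(pretty)
```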

///Peter

Peter
2/20/2011 2:51:29 PM
At Sun, 20 Feb 2011 14:51:29 +0000 Peter Flynn <peter@silmaril.ie> wrote:

> 
> On 20/02/11 13:14, Robert Heller wrote:
> [...]
> > You won't see much that makes much sense when you open the XML file(s) in the
> > .docx file (I am not sure what the XML for a .odf looks like).
> 
> On the contrary, it's all there and visible, even if some of it is 
> rather cryptic. It's unreadable because there is no white-space in 
> element content (deliberately), but if you open it in something like a 
> browser, or a utility like xpathtester, which rearranges it into a tree, 
> it's not hard to see where they have put stuff.

Yes, but it is not anything close to something like a readable document.

When you open a LaTeX source file, 90% of it is pretty much what the
document would look like if typed on a typewriter.  The 10% or so of
LaTeX 'markup' is itself composed of English words, most with obvious
meaning (eg \section, \footnote, etc.). A typical non-geek, with a
small bit of effort will get the idea of what is going on. Maybe not a
hairy bit of math or some sort of fancy games with a tabular
environment or something, but most of the time it really is pretty
obvious what is going on, even without knowing anything about LaTeX.

This is one of the great things about LaTeX.

> 
> ///Peter
> 
>                                        

-- 
Robert Heller             -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


                                                                                                                           
Robert
2/20/2011 8:37:56 PM
On Sun, 20 Feb 2011 08:04:00 -0500, Peter Flynn <peter@silmaril.ie> wrote:

> On 20/02/11 06:43, tlvp wrote:
> [...]
>> Now why had I never before become aware of that fact? I used to open
>> pre-Word-2007 files in WordPad to see 'em quick and dirty, a
>> strategy that proves useless for .docx and .odf files ... because
>> they're ZIPped? (slaps forehead) Oh! Why'n't anyone else ever say
>> so?
>
> ODF was first, Microsoft followed. I think it just seemed convenient
> because the contents were all plaintext files: the XML document,
> stylesheets, lookups, schemas; except image binaries, of course, which
> was another benefit -- in .doc files they are relatively inaccessible,
> and in WordML (2003) they were encoded to make them go *in* the XML; now
> they are just included in the .docx zip in a subdirectory.
>
> Now...if only Adobe would move to a similar format (called, let us
> suppose, .pdfx :-) so that all files in there would be plaintext.
>
> Oh, whoops...it's called Postscript...

Now wait. I *do* write (just a little) Postscript, all in the simplest of
plain text editors, and there's nothing ZIPped about anything I've done:
so what exactly are you driving at here?

>
> ///Peter
>

Cheers, -- tlvp
-- 
Avant de repondre, jeter la poubelle, SVP
tlvp
2/20/2011 9:20:29 PM
On 20/02/11 21:20, tlvp wrote:
> On Sun, 20 Feb 2011 08:04:00 -0500, Peter Flynn <peter@silmaril.ie> wrote:
[...]
>> Oh, whoops...it's called Postscript...
>
> Now wait. I *do* write (just a little) Postscript, all in the simplest of
> plain text editors, and there's nothing ZIPped about anything I've done:
> so what exactly are you driving at here?

My point exactly...if they had wanted a plaintext format so badly, there 
is Postscript. Being plaintext, PS zips very nicely (90%+ if you're lucky).
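"90%+ if you're lucky" is easy to check on your own files. This sketch builds a small, hypothetical PostScript page in memory, compresses it with zlib (Deflate, the method Zip normally uses) and reports the saving. The figure depends heavily on how repetitive the real file is, so no particular number is promised here.

```python
import zlib

# Hypothetical, machine-generated PostScript page: repetitive plain text.
lines = ["%!PS-Adobe-3.0", "/Times-Roman findfont 11 scalefont setfont"]
for i in range(60):
    lines.append(f"72 {740 - 11 * i} moveto (This is line {i} of the sample page.) show")
lines.append("showpage")
ps = ("\n".join(lines) + "\n").encode("ascii")

packed = zlib.compress(ps, 9)          # level 9 = best compression
saving = 1 - len(packed) / len(ps)
print(f"{len(ps)} bytes -> {len(packed)} bytes ({saving:.0%} smaller)")
```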

///Peter
Peter
2/22/2011 8:38:14 PM
On 20/02/11 20:37, Robert Heller wrote:
[...]
> Yes, but it is not anything close to something like a readable
> document.

Readable only with extreme difficulty. It is at least plain text: the
obsolete binary .doc file was quite literally unreadable.

> When you open a LaTeX source file, 90% of it is pretty much what the
> document would look like if typed on a typewriter.  The 10% or so of
> LaTeX 'markup' is itself composed of English words, most with
> obvious meaning (eg \section, \footnote, etc.). A typical non-geek,
> with a small bit of effort will get the idea of what is going on. Maybe
> not a hairy bit of math or some sort of fancy games with a tabular
> environment or something, but most of the time it really is pretty
> obvious what is going on, even without knowing anything about LaTeX.
>
> This is one of the great things about LaTeX.

Yes, although its converse (that the formatting is implicit, being
"hidden" in the macros, rather than explicit) is what puts people off.

They see a blank line separating paragraphs in the input, but no blank
line separating them in the output. They see lines being arbitrarily
folded or wrapped -- or not, as the case may be -- and bearing no
obvious relationship to how they are formatted in the output. They see 
figures and tables appearing in different places to their location in 
the input...

Most newcomers, having had their critical faculties removed by remote
surgery performed by Microsoft, Adobe, Oracle, Sun, etc, are shocked by
this apparent lack of any relationship between input and output, no
matter how much they may grasp intellectually the concept of logic-based
formatting.

///Peter
Peter
2/22/2011 8:47:58 PM
On Tue, 22 Feb 2011 15:38:14 -0500, Peter Flynn <peter@silmaril.ie> wrote:

> On 20/02/11 21:20, tlvp wrote:
>> On Sun, 20 Feb 2011 08:04:00 -0500, Peter Flynn <peter@silmaril.ie> wrote:
> [...]
>>> Oh, whoops...it's called Postscript...
>>
>> Now wait. I *do* write (just a little) Postscript, all in the simplest of
>> plain text editors, and there's nothing ZIPped about anything I've done:
>> so what exactly are you driving at here?
>
> My point exactly...if they had wanted a plaintext format so badly, there
> is Postscript. Being plaintext, PS zips very nicely (90%+ if you're lucky).
>
> ///Peter

Ach, sorry, I had misread you, understood you to be conveying that Postscript
requires a ZIPped collection of plaintext files. Obviously *mis*-understood;
sorry, forget I asked :-) .

Cheers, -- tlvp
-- 
Avant de repondre, jeter la poubelle, SVP
tlvp
2/22/2011 8:55:56 PM
On Tue, 22 Feb 2011 15:47:58 -0500, Peter Flynn <peter@silmaril.ie> wrote:

> On 20/02/11 20:37, Robert Heller wrote:
> [...]
>> Yes, but it is not anything close to something like a readable
>> document.
>
> Readable only with extreme difficulty. It is at least plain text: the
> obsolete binary .doc file was quite literally unreadable.
>
>> When you open a LaTeX source file, 90% of it is pretty much what the
>> document would look like if typed on a typewriter.  The 10% or so of
>> LaTeX 'markup' is itself composed of English words, most with
>> obvious meaning (eg \section, \footnote, etc.). A typical non-geek,
>> with a small bit of effort will get the idea of what is going on. Maybe
>> not a hairy bit of math or some sort of fancy games with a tabular
>> environment or something, but most of the time it really is pretty
>> obvious what is going on, even without knowing anything about LaTeX.
>>
>> This is one of the great things about LaTeX.
>
> Yes, although its converse (that the formatting is implicit, being
> "hidden" in the macros, rather than explicit) is what puts people off.
>
> They see a blank line separating paragraphs in the input, but no blank
> line separating them in the output. They see lines being arbitrarily
> folded or wrapped -- or not, as the case may be -- and bearing no
> obvious relationship to how they are formatted in the output. They see
> figures and tables appearing in different places to their location in
> the input...
>
> Most newcomers, having had their critical faculties removed by remote
> surgery performed by Microsoft, Adobe, Oracle, Sun, etc, are shocked by
> this apparent lack of any relationship between input and output, no
> matter how much they may grasp intellectually the concept of logic-based
> formatting.
>
> ///Peter

Heh-heh ... waxing autobiographical for a moment, you'd be surprised how
experience doing page layout by scissors-n-paste work with actual pieces
of paper (back in the early '50s hot lead slug Linotype days of the last
century) also molds one's expectations of how input yields output :-) .

Cheers, -- tlvp



-- 
Avant de repondre, jeter la poubelle, SVP
tlvp
2/22/2011 9:03:37 PM