how to display utf-8 characters

Hi PostScript gurus,

Sorry, I have not followed this newsgroup for a long time, so
someone has surely already answered this question.

Well, I am trying to display UTF-8 characters and I cannot find a good
solution.

In a first try, I used a CMap with an explicit (utf_code to glyphname)
array which was put inside beginbfchar/endbfchar, and I redefined the
fonts with 'composefont'. Ghostscript displays correctly, but
translating to PDF (by ps2pdf) or printing fails.

In a second try, I defined the glyph names in an array indexed by
unicode values and I defined a new show function which scans the utf-8
characters, builds the unicode values, gets the glyph names and calls
'glyphshow'. This works fine (display, ps2pdf) and it should also work
for printing.

The advantage of this second solution is that PostScript level 1 (RoPS)
may be used, but the glyph array is big, and the character treatment is
heavy.

So, what is the right way to treat the UTF-8 encoding?

Cheers.

-- 
Ken ar c'hentañ	|	      ** Breizh ha Linux atav! **
Jef		|		http://moinejf.free.fr/

moinejf (2)
1/24/2010 8:08:02 PM
comp.lang.postscript

In article <20100124210802.47d45686@tele>, moinejf@free.fr says...

> Well, I am trying to display UTF-8 characters and I cannot find a good
> solution.
> 
> In a first try, I used a CMAP with an explicit (utf_code to glyphname)
> array which was put inside beginbfchar/endbfchar and I redefine the
> fonts by 'composefont'.

Hmm, you can't use a CMap with a regular font, and you can't use glyph 
names with a CIDFont. I presume what you mean is that you created a UTF-
8 CMap which maps the UTF-8 code points to the CIDs of the relevant 
glyphs in the CIDFont.


> ghostscript displays correctly, but
> translating to pdf (by ps2pdf) or printing fails.

Well, sounds like problem solved. GS displays the output correctly, 
isn't that what you wanted ?

What do you mean by saying that conversion to PDF and printing the 
PostScript file 'fails' ? Do you get an error, incorrect output, 
something else ? Did you raise a bug report ?


> In a second try, I defined the glyph names in an array indexed by
> unicode values and I defined a new show function which scans the utf-8
> characters, builds the unicode values, gets the glyph names and calls
> 'glyphshow'. This works fine (display, ps2pdf) and it should also work
> for printing.

This is basically what the regular font machinery does for you when 
you encode the font by supplying an Encoding array.

 
> So, what is the right way to treat the UTF-8 encoding?

Depends entirely on the font you plan to use. 

If your font is a regular type 1, 2, 3 or type 42 font then you can 
simply use an Encoding (not a CMap) which will map the UTF-8 code to a 
glyph name. Given a character code the font machinery consults the 
Encoding array to find what glyph name is at that position, then looks 
up the glyph name in the CharStrings dictionary and executes the glyph 
program associated with it.
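
A minimal sketch of the Encoding mechanism described above: it re-encodes a copy of a base font so that the single byte 16#C0 selects /Agrave. The font name /Helvetica-Demo and the choice of code 16#C0 are illustrative assumptions, and the base font is assumed to contain an /Agrave glyph.

```postscript
%!PS
% Copy the font dictionary, skipping /FID as required when deriving a font.
/Helvetica findfont
dup length dict begin
  { 1 index /FID ne { def } { pop pop } ifelse } forall
  % Give the copy its own Encoding array so the original stays untouched.
  /Encoding Encoding 256 array copy def
  Encoding 16#C0 /Agrave put          % code 16#C0 now names /Agrave
  currentdict
end
/Helvetica-Demo exch definefont pop   % hypothetical name for the new font

/Helvetica-Demo findfont 12 scalefont setfont
72 720 moveto <41C0> show             % bytes 16#41 and 16#C0: "A" then Agrave
showpage
```

Each byte of the string is looked up separately, so this only covers one-byte codes; a multi-byte UTF-8 sequence still has to be translated to such a code before show is called.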

If you are using a CIDFont, then you need a CMap which maps the 
character codes (UTF-8 code points) to CIDs. The CIDs need to be correct 
for the font you are using, which is governed by the Registry and 
Ordering of the CIDFont. It's a similar process to the one above, but 
more complicated; CMaps can encode multiple bytes, which Encodings 
cannot, and there is no standard set of CIDs so you need to know the CIDs 
of the glyphs before you create a CMap, and a CMap for a specific 
Registry and Ordering can't be used for a font with a different Registry 
and Ordering.


Without much more information (font type as a bare minimum) it's not 
possible to give you any further advice.


		Ken
ken
1/25/2010 8:11:25 AM
On Mon, 25 Jan 2010 08:11:25 -0000
ken <ken@spamcop.net> wrote:

> In article <20100124210802.47d45686@tele>, moinejf@free.fr says...
	[snip]
> What do you mean by saying that conversion to PDF and printing the
> PostScript file 'fails' ? Do you get an error, incorrect output,
> something else ? Did you raise a bug report ?

Sorry, it was a bug of mine! I wanted to add the music double sharp and
flat, and it seems there is a problem with the CharStrings: string
display stopped after these characters in pdf file and on printer.
Maybe I will ask you for that later.

	[snip]
> > So, what is the right way to treat the UTF-8 encoding?
>
> Depends entirely on the font you plan to use.
>
> If your font is a regular type 1, 2, 3 or type 42 font then you can
> simply use an Encoding (not a CMap) which will map the UTF-8 code to
> a glyph name. Given a character code the font machinery consults the
> Encoding array to find what glyph name is at that position, then
> looks up the glyph name in the CharStrings dictionary and executes
> the glyph program associated with it.

Yes, but the Encoding is limited to 256 characters, and UTF-8
characters are coded in one to 4 bytes.

> If you are using a CIDFont, then you need a CMap which maps the
> character codes (UTF-8 code points) to CIDs. The CIDs need to be
> correct for the font you are using, which is governed by the Registry
> and Ordering of the CIDFont. It's a similar process to the one above,
> but more complicated; CMaps can encode multiple bytes, which
> Encodings cannot, and there is no standard set of CIDs so you need to
> know the CIDs of the glyphs before you create a CMap, and a CMap for
> a specific Registry and Ordering can't be used for a font with a
> different Registry and Ordering.

I think that for these CIDFonts a CMap must exist. I use such CMaps for
chinese fonts (/UniGB-UTF8-H...).

> Without much more information (font type as a bare minimum) it's not
> possible to give you any further advice.

Well, I am developing a music typesetting program. There may be some text
in the music sheets (words, comments..) and in any language. The people
who create the music sheet may use any font they want. In the previous
version of the program, only the Latin encodings 1 to 6 were handled,
but I have many requests for multi-language support.

Here are shortened versions of the solutions I found for UTF-8. Which do
you think is the more flexible and/or quickest?

%--------- first version - CMAP -----------
%!PS

/utf-array [
<20>	/space
<21>	/exclam
<22>	/quotedbl
<23>	/numbersign
<24>	/dollar
% ...
<40>	/at
<41>	/A
<42>	/B
<43>	/C
<44>	/D
<45>	/E
<46>	/F
% ...
<c380>	/Agrave
<c381>	/Aacute
<c382>	/Acircumflex
<c383>	/Atilde
<c384>	/Adieresis
<c385>	/Aring
<c386>	/AE
<c387>	/Ccedilla
<c388>	/Egrave
% ...
] def

/CIDInit /ProcSet findresource begin
    12 dict begin
	begincmap
		/CMapName /CMAP-UTF8 def
		/CMapType 1 def
		/CIDSystemInfo [ null ] def
		/WMode 0 def
		2 begincodespacerange
			  <00>       <7F>
			  <C280>     <C4BF>
		endcodespacerange

		utf-array length 2 idiv
		beginbfchar
			utf-array aload pop
		endbfchar
	endcmap
	CMapName currentdict /CMap defineresource pop
    end
end

/Courier-UTF8 /CMAP-UTF8 [ /Helvetica ] composefont pop

/Courier-UTF8 12 selectfont

18 800 moveto (ABC ÃÇÈ) show

showpage

%--------- second version - utfshow function -----------
%!PS

% table indexed by (unicode - 128)
/unitb [
/nbspace
/c129
/SC040000
/SC020000
/SC010000
/SC050000
/SM650000
/SM240000
/diaeresis
/SM520000
/SM210000
/SP170000
/SM660000
/sfthyphen
/c142
/overstore
/SM190000
/SA020000
/ND021000
/ND031000
/SD110000
/micro
/SM250000
/middot
/SD410000
/ND011000
/SM200000
/SP180000
/NF040000
/NF010000
/NF050000
/SP160000
/LA140000
/exclamdown
/cent
/sterling
/currency
/yen
/brokenbar
/section
/dieresis
/copyright
/ordfeminine
/guillemotleft
/logicalnot
/LI120000
/registered
/LI180000
/degree
/plusminus
/twosuperior
/threesuperior
/acute
/LO200000
/paragraph
/SA070000
/cedilla
/onesuperior
/ordmasculine
/guillemotright
/onequarter
/onehalf
/threequarters
/questiondown
/Agrave
/Aacute
/Acircumflex
/Atilde
/Adieresis
/Aring
/AE
/Ccedilla
/Egrave
% ...
] def

/utfshow {
	/c 0 def
	{
		dup 16#0080 ge{
			dup 16#00c0 ge{
				c 0 ne{
					c
					128 sub
					unitb exch get glyphshow
					/c 0 def
				}if
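				% 2-byte lead bytes (C0..DF) carry 5 payload bits, longer ones 4 or fewer
				dup 16#00e0 lt { 16#001f and } { 16#000f and } ifelse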
				/c exch def
			}{
				16#003f and
				c 64 mul add
				/c exch def
			}ifelse
		}{
			c 0 ne{
				c
				128 sub
				unitb exch get glyphshow
				/c 0 def
			}if
			StandardEncoding exch get glyphshow
		}ifelse
	}forall
	c 0 ne{
		c
		128 sub
		unitb exch get glyphshow
	}if
} bind def

/Helvetica 12 selectfont

18 800 moveto (ABC ÃÇÈ) utfshow

showpage
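
The byte scanning in utfshow can also be kept separate from the glyph lookup. The following is only a sketch under stated assumptions: the procedure name utf8decode and its local names cp, need and b are hypothetical, and malformed input is not detected. It turns a UTF-8 string into an array of Unicode code points, handling lead bytes for 2-, 3- and 4-byte sequences.

```postscript
%!PS
% string utf8decode array
% Sketch only: invalid or truncated sequences are silently mis-decoded.
/utf8decode {
  4 dict begin
  /cp 0 def                     % code point being assembled
  /need 0 def                   % continuation bytes still expected
  [ exch                        % open the result array under the string
  {                             % forall pushes each byte as an integer
    /b exch def
    b 16#80 lt {
      b                         % ASCII: the byte is the code point
    } {
      b 16#C0 lt {              % continuation byte 10xxxxxx
        /cp cp 64 mul b 16#3F and add def
        /need need 1 sub def
        need 0 eq { cp } if     % sequence complete: emit the code point
      } {
        b 16#E0 lt {            % 110xxxxx starts a 2-byte sequence
          /cp b 16#1F and def  /need 1 def
        } {
          b 16#F0 lt {          % 1110xxxx starts a 3-byte sequence
            /cp b 16#0F and def  /need 2 def
          } {                   % 11110xxx starts a 4-byte sequence
            /cp b 16#07 and def  /need 3 def
          } ifelse
        } ifelse
      } ifelse
    } ifelse
  } forall
  ]
  end
} bind def

% Bytes 41 42 43 C3 83 decode to the code points 65 66 67 195.
<414243C383> utf8decode ==
```

Each code point can then be mapped to a glyph name and passed to glyphshow, as in the second version above.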

-- 
Ken ar c'hentañ	|	      ** Breizh ha Linux atav! **
Jef		|		http://moinejf.free.fr/

Jean
1/25/2010 11:45:14 AM
In article <20100125124632.6ca82e52@tele>, moinejf@free.fr says...

> > If your font is a regular type 1, 2, 3 or type 42 font then you can 
> > simply use an Encoding (not a CMap) which will map the UTF-8 code to
> > a glyph name. Given a character code the font machinery consults the 
> > Encoding array to find what glyph name is at that position, then
> > looks up the glyph name in the CharStrings dictionary and executes
> > the glyph program associated with it.
> 
> Yes, but the Encoding is limited to 256 characters, and UTF-8
> characters are coded in one to 4 bytes.

OK, but does the font contain all those glyphs ? If not you can develop 
a different approach which maps the UTF-8 character codes into one byte 
character codes that reference the Encoding.

If the font does contain all the UTF-8 potential glyphs then it most 
likely is a CIDFont. So on we go:

> > If you are using a CIDFont, then you need a CMap which maps the 
> > character codes (UTF-8 code points) to CIDs. The CIDs need to be
> > correct for the font you are using, which is governed by the Registry
> > and Ordering of the CIDFont. It's a similar process to the one above,
> > but more complicated; CMaps can encode multiple bytes, which
> > Encodings cannot, and there is no standard set of CIDs so you need to
> > know the CIDs of the glyphs before you create a CMap, and a CMap for
> > a specific Registry and Ordering can't be used for a font with a
> > different Registry and Ordering.
> 
> I think that for these CIDFonts a CMap must exist. I use such CMaps for
> chinese fonts (/UniGB-UTF8-H...).

A CMap may be possible to construct, it doesn't mean it exists. You need 
to know which glyphs are represented by which CIDs (this depends on the 
Registry and Ordering in the CIDFont). Given that, you can construct a 
CMap which maps the UTF-8 codes to the correct CIDs.

 
> > Without much more information (font type as a bare minimum) it's not 
> > possible to give you any further advice.
> 
> Well, I am developing a music typesetting program. There may be some text
> in the music sheets (words, comments..) and in any language. The people
> who create the music sheet may use any font they want. In the previous
> version of the program, only the Latin encodings 1 to 6 were handled,
> but I have many requests for multi-language support.
> 
> Here are shortened versions of the solutions I found for UTF-8. Which do
> you think is the more flexible and/or quickest?

Using the glyph names only works for type 1,2 or 3 fonts. Although these 
fonts are limited to an Encoding array of 256 elements, they may contain 
more glyphs than that, but only 256 at a time can be used with the show 
operator and friends. The glyphshow operator (as you've found) uses the 
glyph name (or CID for CIDFonts), and isn't limited to the Encoding 
array.

However, it will *not* work with CIDFonts, CIDFonts don't have glyph 
names, they have CIDs.

So I feel you must be using regular fonts at present, not CIDFonts. I'm 
amazed that GS doesn't generate an error for this, it should give you an 
error on composefont if you try to compose a non-CIDFont with a CMap, or 
on glyphshow if you try to use a glyph name as a CID. Going by your 
examples below, you must have Helvetica loaded as both a regular font 
*and* a CIDFont.


I'm afraid you can't just use 'any font' without a lot of work. Do you 
expect that these fonts will always contain the full UTF-8 character set 
? What will you do (what do you expect to happen) if the font only 
contains the latin set and you ask for something in the upper ranges ?

Before you can create a working PostScript program utilising a font you 
*must* know what the font type is, at a bare minimum. If the font is a 
CIDFont then you require a CMap which is correct for that particular 
font. If you expect to be using any old font then you can't tell in 
advance what its Registry and Ordering will be, so you can't manufacture 
a CMap for it, you need to be supplied with one.
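
For a font that is already installed on the interpreter, the type can be read from the font dictionary itself. This is a minimal sketch; the procedure name showfonttype is hypothetical, and many interpreters substitute a default font when findfont cannot find the requested name, so the reported type may belong to the substitute.

```postscript
%!PS
% fontname showfonttype -
% Prints the FontType entry: 1, 3 and 42 are base fonts, 0 is a composite font.
/showfonttype {
  findfont /FontType get
  (FontType: ) print ==
} bind def

/Helvetica showfonttype
```

A CIDFont is not found this way: it lives in the CIDFont resource category and only becomes usable with show after it has been composed with a CMap into a Type 0 font.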


Note that you can create several different single byte encoded instances  
of a regular font, containing different character sets. So you could 
define your show procedure in such a way as to select a font encoded 
with a particular set of characters and then 'show' the string in that 
font. Obviously you would have to change font if you ran into the next 
set of encodings.
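
A rough sketch of that idea, assuming the base font actually contains the named glyphs: two single-byte instances of one font are built, one per block of 256 code points, and a show procedure picks the instance from the high part of the code point. The names reencode, page1, pagefonts, onechar, cpshow, Demo-Page0 and Demo-Page1 are all hypothetical, and only two entries of the second table are filled in.

```postscript
%!PS
% newname basename encoding reencode -
/reencode {
  exch findfont                        % newname encoding fontdict
  dup length dict begin
    { 1 index /FID ne { def } { pop pop } ifelse } forall
    /Encoding exch def                 % install the supplied Encoding
    currentdict
  end
  definefont pop
} bind def

% Encoding for U+0100..U+01FF, indexed by (code point mod 256).
/page1 256 array def
0 1 255 { page1 exch /.notdef put } for
page1 16#06 /Cacute put                % U+0106
page1 16#0C /Ccaron put                % U+010C

/Demo-Page0 /Helvetica ISOLatin1Encoding reencode   % U+0000..U+00FF
/Demo-Page1 /Helvetica page1 reencode               % U+0100..U+01FF

/pagefonts [ /Demo-Page0 /Demo-Page1 ] def
/onechar 1 string def

% codepoint cpshow -   (code points above U+01FF raise rangecheck here)
/cpshow {
  dup 256 idiv pagefonts exch get 12 selectfont
  256 mod onechar exch 0 exch put
  onechar show
} bind def

72 720 moveto
[ 16#41 16#C7 16#10C ] { cpshow } forall   % A, Ccedilla, Ccaron
showpage
```

Switching fonts for every character is wasteful; a real implementation would presumably batch consecutive characters that fall in the same block.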

But I repeat, unless you know which glyphs are in the font you are 
using, and the type of the font, you are going to have a lot of trouble 
getting this to work at all.


If you really want to use 'any' font then your best bet is probably to 
proceed as you are, using glyphshow. If the named glyph is not present 
in the font then the /.notdef will be used instead. This will lead to 
missing characters in the output of course.

What do you plan to do for glyphs in (eg) Chinese or Japanese which do 
not have standardised PostScript glyph names ?



			Ken
ken
1/25/2010 6:49:51 PM
On Mon, 25 Jan 2010 18:49:51 -0000
ken <ken@spamcop.net> wrote:

> If you really want to use 'any' font then your best bet is probably
> to proceed as you are, using glyphshow. If the named glyph is not
> present in the font then the /.notdef will be used instead. This will
> lead to missing characters in the output of course.
>
> What do you plan to do for glyphs in (eg) Chinese or Japanese which
> do not have standardised PostScript glyph names ?

Thanks, Ken, for your explanations and advice. I better understand the
fonts now.

My music typesetting program lets users use any font, and, thanks
to customization, they may adapt the PostScript code to their needs.
A font declaration includes its encoding. The default is now utf-8; the
other encoding is simply 'native', in which case the font is not
redefined. The user is then responsible for defining the
PostScript code which permits a correct display. As an example, the
file chinese.abc included in the abcm2ps package contains:

- the font declaration

	%%font UKaiCN-UTF8-H native

- the PostScript creation of the font:

	%%beginps
	/UKaiCN-UTF8-H /UniGB-UTF8-H [ /UKaiCN ] composefont pop
	%%endps

About the number of characters in a font, some fonts have more than 1000
chars (DejaVu). On the other hand, many people already use specific
fonts for their own language (Cyrillic, Greek, Hebrew..). I will
supply only the UTF-8 to glyph table for European and a few other
languages in the distro and I will explain to other people how to
add their own glyphs...

-- 
Ken ar c'hentañ	|	      ** Breizh ha Linux atav! **
Jef		|		http://moinejf.free.fr/

Jean
1/25/2010 7:49:26 PM
On 2010-01-25, ken <ken@spamcop.net> wrote:
> Using the glyph names only works for type 1,2 or 3 fonts. Although these 
> fonts are limited to an Encoding array of 256 elements, they may contain 
> more glyphs than that, but only 256 at a time can be used with the show 
> operator and friends. The glyphshow operator (as you've found) uses the 
> glyph name (or CID for CIDFonts), and isn't limited to the Encoding 
> array.
>
> However, it will *not* work with CIDFonts, CIDFonts don't have glyph 
> names, they have CIDs.
>
> So I feel you must be using regular fonts at present, not CIDFonts. I'm 
> amazed that GS doesn't generate an error for this, it should give you an 
> error on composefont if you try to compose a non-CIDFont with a CMap, or 
> on glyphshow if you try to use a glyph name as a CID.

a) Could one catch these errors, and choose one of two appropriate
   strategies based on this?  This way things would "work" with type
   1,2,3 fonts, AND with CIDs with known-in-advance CIDs.

b) Similarly, is there a way to "inspect CID" to distinguish a few
   most often used ones?

c) Likewise, could not one check directly whether the member in a
   dictionary is .notdef, and treat the corresponding Unicode char
   specially (via "fallback fonts", or emit Unicode-HEX)?

If so, one could get a flexible way to support everything but type 42
fonts, and obscure CID fonts...

Thanks,
Ilya
Ilya
1/26/2010 12:11:39 AM
In article <slrnhlscpr.uiv.nospam-abuse@powdermilk.math.berkeley.edu>, 
nospam-abuse@ilyaz.org says...

> > So I feel you must be using regular fonts at present, not CIDFonts. I'm 
> > amazed that GS doesn't generate an error for this, it should give you an 
> > error on composefont if you try to compose a non-CIDFont with a CMap, or 
> > on glyphshow if you try to use a glyph name as a CID.
> 
> a) Could one catch these errors, and choose one of two appropriate
>    strategies based on this?  This way things would "work" with type
>    1,2,3 fonts, AND with CIDs with known-in-advance CIDs.

You could, but in order to use a CIDFont you have to do some special 
stuff (use findresource on the /CIDFont resource followed by composefont 
with a CMap) or, if your RIP supports it, use findfont on a name which 
is a combination of a named CIDFont and a named CMap. For example 
'/Ryumin-Light-83pv-RKSJ-H findfont' would look for a CIDFont called 
Ryumin-Light and compose it with the 83pv-RKSJ-H CMap automatically.

Either way you pretty much have to know that you are dealing with a 
CIDFont.
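
One way to find out without triggering an error is to ask the resource machinery first. This is only a sketch: the procedure name haveresource is hypothetical, the resource names are the ones used earlier in this thread, and it assumes an interpreter that defines the CIDFont and CMap resource categories.

```postscript
%!PS
% key category haveresource bool
% resourcestatus returns "status size true" or just "false".
/haveresource {
  resourcestatus { pop pop true } { false } ifelse
} bind def

/UKaiCN /CIDFont haveresource
/UniGB-UTF8-H /CMap haveresource and {
  % Both pieces exist: build the composite font and select it.
  /UKaiCN-UTF8-H /UniGB-UTF8-H [ /UKaiCN ] composefont pop
  /UKaiCN-UTF8-H 12 selectfont
} {
  % Otherwise fall back to a base font.
  /Helvetica 12 selectfont
} ifelse
```

This only tells you that the resources exist; whether the CMap matches the Registry and Ordering of the CIDFont still has to be known separately.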

 
> b) Similarly, is there a way to "inspect CID" to distinguish a few
>    most often used ones?

Not entirely sure what you mean. There are no glyph names in a CIDFont, 
so you can't inspect those. The CIDs are arbitrary and may be different 
for different font vendors, so the existence of CID 1 does not tell you 
that the /Eth glyph is present, for example.

The Registry and Ordering information in the CIDSystemInfo will tell you 
that indirectly, but there are no lists of which glyphs are contained in 
which combinations that I'm aware of (to be fair pretty much everyone 
follows the Adobe Registry and language Ordering, but it's not totally 
universal).

 
> c) Likewise, could not one check directly whether the member in a
>    dictionary is .notdef, and treat the corresponding Unicode char
>    specially (via "fallback fonts", or emit Unicode-HEX)?

You can't tell what CID is being used by the internal font machinery, 
not even if it's CID 0 (the CID equivalent of /.notdef). You could load 
the CMap and interpret that, but that would seem tedious.

Besides it wouldn't tell you anything about the font, just the CMap. 
glyphshow doesn't tell you if you try to use a CID that isn't in the 
font, it just silently substitutes the CID 0 glyph. The same way that 
show doesn't tell you if you try to use an unmapped Encoding position.
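
For base fonts that carry a CharStrings dictionary (Type 1 and Type 42, not CIDFonts), the presence of a named glyph can at least be tested before glyphshow is called. A minimal sketch, with the hypothetical procedure name hasglyph, assuming the interpreter allows CharStrings to be read:

```postscript
%!PS
% glyphname hasglyph bool
% True if the current font has a CharStrings entry for the name.
/hasglyph {
  currentfont /CharStrings 2 copy known {
    get exch known                % look the name up in CharStrings
  } {
    pop pop pop false             % no CharStrings (e.g. Type 3): report false
  } ifelse
} bind def

/Helvetica 12 selectfont
72 720 moveto
/Agrave hasglyph
  { /Agrave glyphshow }
  { (?) show }                    % placeholder fallback; a real one might switch fonts
ifelse
showpage
```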

 
> If so, one could get a flexible way to support everything but type 42
> fonts, and obscure CID fonts...

Type 42 is easier, since the TT tables in the sfnts array will tell you 
pretty much what you need. Also, being as these are always converted 
fonts, they don't generally contain any glyphs which are not encoded in 
the Encoding array.

CIDFonts with type42 outlines are harder.

Note that all of this only applies when dealing with fonts already 
loaded on the interpreter, and called from the PostScript program. It's 
much more normal to *include* the fonts you want to use in the 
PostScript program.

This is because you can't generally rely on a different RIP having the 
same fonts (especially true when dealing with CIDFonts). In fact the 
only fonts you can rely on are the old level 1 'base 35' fonts. These 
are all old-fashioned type 1 fonts with no support for more interesting 
languages.

Naturally, if you include the font, you know what format it's in, so the 
whole discussion is rather moot.


			Ken
ken
1/26/2010 8:26:01 AM
On 2010-01-26, ken <ken@spamcop.net> wrote:
> with a CMap) or, if your RIP supports it, use findfont on a name which 
> is a combination of a named CIDFont and a named CMap. For example 
> '/Ryumin-Light-83pv-RKSJ-H findfont' would look for a CIDFont called 
> Ryumin-Light and compose it with the 83pv-RKSJ-H CMap automatically.
>
> Either way you pretty much have to know that you are dealing with a 
> CIDFont.

> Naturally, if you include the font, you know what format it's in, so the 
> whole discussion is rather moot.

Sorry, but I do not see anything "natural" about these assumptions...

Your arguments assume that there is some "mythical you" whom you
address.  More often than not, postscript is produced by some program
which is supplied some parameters; for example, it may be supplied
font names.  There may be also an intermediate level of encapsulation,
when some of these parameters are calculated by a script.

Now there are 3 "you"s in the pipeline: whoever wrote the program,
whoever wrote the script, and whoever provided parameters to the script.
I may have a broken model of the world in my head, but I can't see how
ANYBODY of these 3 would know something about a type of the used font...

Yours,
Ilya
Ilya
1/26/2010 9:23:48 AM
In article <slrnhltd54.c18.nospam-abuse@powdermilk.math.berkeley.edu>,
Ilya Zakharevich  <nospam-abuse@ilyaz.org> wrote:
>On 2010-01-26, ken <ken@spamcop.net> wrote:
>> with a CMap) or, if your RIP supports it, use findfont on a name which 
>> is a combination of a named CIDFont and a named CMap. For example 
>> '/Ryumin-Light-83pv-RKSJ-H findfont' would look for a CIDFont called 
>> Ryumin-Light and compose it with the 83pv-RKSJ-H CMap automatically.
>>
>> Either way you pretty much have to know that you are dealing with a 
>> CIDFont.
>
>> Naturally, if you include the font, you know what format it's in, so the 
>> whole discussion is rather moot.
>
>Sorry, but I do not see anything "natural" about these assumptions...
>
>Your arguments assume that there is some "mythical you" whom you
>address.  More often than not, postscript is produced by some program
>which is supplied some parameters; for example, it may be supplied
>font names.  There may be also an intermediate level of encapsulation,
>when some of these parameters are calculated by a script.
>
>Now there are 3 "you"s in the pipeline: whoever wrote the program,
>whoever wrote the script, and whoever provided parameters to the script.
>I may have a broken model of the world in my head, but I can't see how
>ANYBODY of these 3 would know something about a type of the used font...

You do have a broken model of the world.   <wry grin>

For 'simple' cases, the details of font construction don't matter.  And
none of the parties you name have to know anything about the construction
of the font.  

For 'complex' cases, the situation is radically different.   The programmer
who wrote the application must make it emit _different_ Postscript code
depending on the type of the font that is being used.  To do that, the app.
must be able to -recognize- the different types of fonts in the first place,
either by looking at the font file, and deciphering enough to figure out what
the type is, or because it is explicitly told the answer via some parameter.

For a CIDfont, one must, as a practical matter, access it -through- a CMap.
To get the 'desired' glyph out of a CIDfont, one must 'know' the index
of that glyph in the CMap being used.  It is -not- practical to 'invert' the 
CMap into a set of 'portable' glyph names; thus one *must* 'know' the ordinal 
in _that_ CMap for the desired glyph -- one _can_ have multiple CMaps for 
the same CIDfont that map the -same- glyph to different ordinals in the CMap;
knowing that the desired glyph is ordinal 8 in CMap #1 does you _no_ good if
one is presently using CMap #2.  Thus, to 'know' which ordinal to specify to
get a particular glyph, one *must* 'know something' (the CMap encoding _and_
the glyph location in the CIDfont) about the specific font (and CMap) being 
used.

Whether or not you think that that should, or should -not-, be 'necessary', is
"immaterial and irrelevant" -- that _is_ the way the world "is", and you have 
no choice but to live with it.

Dealing with CIDfonts is *messy*.  That is precisely _why_ the other font 'types'
exist -- to make it *easier* to deal with 'typical' subsets of the set of
'all possible glyphs'.  If you cannot use one of the forms with the 'simplifying
assumptions' "built in", you have no choice but to deal with the 'full messiness'
of the 'unconstrained methodology'.  In order to do _that_, you do have to know
a whole lot about the font(s) you're working with.  The 'automatic housekeeping'
of the simplified forms is no longer available to you.



bonomi
1/26/2010 10:46:45 AM
In article <slrnhltd54.c18.nospam-abuse@powdermilk.math.berkeley.edu>, 
nospam-abuse@ilyaz.org says...
> On 2010-01-26, ken <ken@spamcop.net> wrote:
> > with a CMap) or, if your RIP supports it, use findfont on a name which 
> > is a combination of a named CIDFont and a named CMap. For example 
> > '/Ryumin-Light-83pv-RKSJ-H findfont' would look for a CIDFont called 
> > Ryumin-Light and compose it with the 83pv-RKSJ-H CMap automatically.
> >
> > Either way you pretty much have to know that you are dealing with a 
> > CIDFont.
> 
> > Naturally, if you include the font, you know what format it's in, so the 
> > whole discussion is rather moot.
> 
> Sorry, but I do not see anything "natural" about these assumptions...

Well, I guess I'm sorry too, but I do see this as natural.

 
> Your arguments assume that there is some "mythical you" whom you
> address.  More often than not, postscript is produced by some program
> which is supplied some parameters; for example, it may be supplied
> font names. 

In that case the 'you' is the application programmer (or the application 
itself if you prefer). In order to include a font in a PostScript 
program you (the creator of the PostScript program) need to know what 
kind of font it is. The fonts as delivered on a computer, whether by 
download or by some other method, are not suitable for direct inclusion 
in a PostScript program.


> Now there are 3 "you"s in the pipeline: whoever wrote the program,
> whoever wrote the script, and whoever provided parameters to the script.

And in all 3 cases, if you include the font then the 'you' that included 
it must have known the font format.


> I may have a broken model of the world in my head, but I can't see how
> ANYBODY of these 3 would know something about a type of the used font...

If you include the font, then you *must* know what type it is, because 
you need to process the font in order to include it.

Of course you need not include a font, merely reference it. In this case 
you are opening yourself up for problems, because other PostScript 
consumers may not have the same font available. This is why most 
PostScript producers (actually all, in practice) include the fonts they 
want to use. The only normal exceptions are the base 35 fonts and 
sometimes CJKV fonts, simply because of their size.


		Ken
ken
1/26/2010 2:47:49 PM
Reply:

Similar Artilces:

UTF-8 pinyin characters to PostScript
Hello, Does anyone know how I can convert a text file containing UTF-8 pinyin characters to PostScript so that I may print them. I ask because even when the terminal is configured to display the characters correctly by choosing some font, and I use lpr or some other command like a2ps to print to a postscript file, some of the pinyin characters (in particular those not falling within the Latin 1 ISO 8859-1 standard), are not rendered correctly (they appear as either garbage or as missing (lpr) or as a box with nothing in it (mozilla) depending on the application). Does anyone know how I...

pcnetsecurity@gmail.com =?UTF-8?B?QXNzaXN0w6puY2lhIFTDqWM=?= =?UTF-8?B?bmljYSAgbWFudXRlbsOnw6M=?= =?UTF-8?B?byBkZSBjb21wdXRhZG9y?= =?UTF-8?B?ZXMgaW5mb3JtYXRpY2Eg?= =?UTF-8?B?Vml0w7NyaWEtZXMgMDA2NTY=?=
Contato: pcnetsecurity@gmail.com Contato: pcnetsecurity @ gmail.com Planos a partir de R$ 250,00 . Assist�ncia T�cnica Prestamos assist�ncia t�cnica nos computadores de sua empresa ou resid�ncia, e tamb�m possu�mos uma equipe qualificada para fazer a manuten��o no pr�prio local. - Contratos de Suporte e Manuten��o Reduza os custos de sua empresa com solicita��es de visitas t�cnicas para seus computadores, elaboramos um contrato de manuten��o integrado para sua empresa onde disponibilizamos: t�cnicos, equipamentos de suporte e substitui��o, e atendimento no hor�rio comercial ou ...

UTF-8 characters randomly displayed in xterm
Hi all, I have a weird problem trying to display UTF-8 characters when using 'xterm 2.4.3' through Exceed (Hummingbird Connectivity 10) emulator working under Windows XP. The important thing with the Exceed emulator is the fonts are loaded from the Exceed Windows client directory and not on the Unix host. My running OS is Solaris 10 over x86. I got 'xterm 2.4.3' from the www.sunfreeware.com web site because the standard xterm delivered with Solaris was not displaying the UTF-8 fonts correctly. So after successfully installing the 'en_GB.UTF-8' locale, I was able t...

How do I display character 151 (long hyphen) in XHTML (utf-8) ?
How do I display character 151 (long hyphen) in XHTML (utf-8) ? Is there another character that will substitute? The W3C validation parser, http://validator.w3.org, tells me that this character and the ones around it are illegal - then, after resubmission it flags no errors. So, are there any illegal characters between 0 and 255 in the UTF-8 character set or is it just my imagination that the W3C validation parser thinks there are - say between 129-151, or thereabouts; then later it changes its mind? Zenobia <5.20.zenobia@spamgourmet.com> wrote: >How do I display cha...

xterm 2.4.3 fails to displays some of the UTF-8 characters
The important thing with the Exceed emulator is the fonts are loaded from the Exceed Windows client directory and not on the Unix host. My running OS is Solaris 10 over x86. I got 'xterm 2.4.3' from the www.sunfreeware.com web site because the standard xterm delivered with Solaris was not displaying the UTF-8 fonts correctly. So after successfully installing the 'en_GB.UTF-8' locale, I was able to display the extended characters (like Chinese and Arabic characters). Here is my terminal env: XTERM_LOCALE=en_GB.UTF-8 LANG=en_GB.UTF-8 HZ=100 XTERM_VERSION=XTerm(243) OPENWINH...

Why are international characters are passed to java in case of export LANG=en_US.UTF-8
Hi, from a shell script (RH-8.0) if export LANG=en_US.UTF-8 is set the parameters containing international characters are wrongly passed to java main method. Any ideas why? I use jdk1.4.2_06-b03 Zsolt ...

FA: monitor A1085S, Surfer-package, SuperJAM, =?UTF-8?B?QmFycydu?= =?UTF-8?B?w4RQaXBlcywgbW91c2UsIEJsdVJheSwgQ3lnbnVzRWQsIERlbW9tYWtlciwgT1M=?= =?UTF-8?B?IDMuMCBhbmQgbWFueSghKSBtb3Jl?=
X-No-Archive: Yes Hi! Got some time to sort a few things and put some of them on eBay. All starting from EUR 1,-, shipping is worldwide on all items. original Commodore monitor A1085S original Amiga Technologies Modem with accessoires (Surfer-package) ASUS USB BluRay-burner mouse "The Boing Mouse" SuperJAM sequencer Bars'n'Pipes sequencer Cygnus Ed Professional complete OS 3.0 (manuals with disks) AmiAtlas 2.0 with maps Turbo Print Professional Data Becker Demomaker Data Becker VectorObject Editor Data Becker FontEd MousePad B...

Programme and abstracts of CFP: PT-AI 2013
The programme and all topics with abstracts are now available here: http://www.pt-ai.org/node/280 Especially "Rationality and Intelligence", about 30 PowerPoint pages: http://www.pt-ai.org/sites/default/files/ptai2013/presentations/Stuart-Russel.ppt Burkart ...

Re: BASIC - PRINT USING a quad value #2
Nope ... there are 25 spaces in front of the 0 when displayed on the terminal. ...

UTF-8 to PostScript
Hello, I have some files containing some relatively unusual UTF-8 characters. I would like to be able to print these. I have tried a2ps but it does not support UTF-8. The closest I got was viewing the file with Mozilla and printing it. That almost worked, except that it output several of the pinyin characters I am trying to print as squares! Basically some of the characters are accented with special diacritics (such as a diaeresis with an acute on top of it). I don't know of any application which would allow me to print these. Basically I am assuming that once I can see them with...

UTF-8 Character Encodings and "NO-BREAK SPACE" (dec: 202, hex: CA) Character
Hey all, I have a bizarre problem with a piece of mail (most likely sent by Outlook) that is in UTF-8 format. There is a character, coming after spaces, which from looking at a hexdump of the file seems to be a CA (decimal: 202). From most UTF-8 documentation I can find, this is an accent circumflex. In browsers (IE, FF, Safari), this character shows up as an unknown character, or as the accent circumflex. In a mail client, however (Outlook, Apple Mail), the character appears as a "NO-BREAK WHITESPACE" (just a space visually), or the equivalent of an "&nbsp;". So...

UTF-8 garbage characters
I'd love to ask why this page is not rendering correctly in Safari on a Macintosh but I suspect someone will tell me to validate the page first. Nevertheless, if anyone sees an obvious reason that I'm missing, I'd like to know. It looks like a missing div tag but I can't see one. http://www.krubner.com/ Let's move on to a question that might be answerable. If I copy and paste non-UTF-8 characters into the page and then send a UTF-8 charset header, will I get the garbage characters that I'm seeing? *lawrence* skrev 2004-10-01 09:34: &g...

London Stock Exchange in merger talks with Deutsche Börse
http://www.bbc.co.uk/news/business-35639157 Given that the Deutsche Börse is a big VMS user, I wonder what the combined setup will use. Phillip ? Any rumours ? In article <dj37cnFgscaU1@mid.individual.net>, Roy Omond <roy@omond.net> writes: > http://www.bbc.co.uk/news/business-35639157 > > Given that the Deutsche Börse is a big VMS user, I wonder what the > combined setup will use. > > Phillip ? Any rumours ? Rumours are all over the press (so much so that I can guess what the URL above refers to without checking it). We noticed it because, at lunch, we saw the price of DB stock jump. (That of the LSE jumped even more. Usually, the price of the buyer drops when there are such rumours.) There is an official statement by now. It's being billed as a merger of equals, but according to market capitalization DB is somewhat larger, and if it goes through DB stock owners will own the majority of the stock after the swap. The idea is to have the board 50/50 immediately after the merger. But it is still early. Since I've been there, DB has bought (and sold) some smaller companies, and there have been various cooperations, but none of the merger ideas (NYSE, Euronext, LSE (10 years ago)) happened. NYSE probably would have, but it was forbidden. DB is still a big VMS user, although the newer systems are mostly Linux. Most of the old ones (not just VMS) are still there. There is ...

UTF-8 character downcase!!
Hello, who can help me with a problem? I have a word = "ПРИВЕТ"; it's in Russian, and I want to downcase this word (= "привет"). But the standard downcase method does not work with non-English letters. Thank you for your reply -- Posted via http://www.ruby-forum.com/. From: list-bounce@example.com [mailto:list-bounce@example.com] On Behalf Of Igor K. Sent: Saturday, September 01, 2007 1:24 PM > >Hello, >Who can help me with problem? > >I have a word = "ПРИВЕТ", it's in russian, and i want to downcase this >word(=привет). But stan...

UTF-8 characters in doctest
Hello, I have problems running doctests if I use Czech national characters in UTF-8 encoding. I have a Python script which begins with an encoding definition: # -*- coding: utf-8 -*- I have this function with a doctest: def get_inventary_number(block): """ >>> t = u'''28. České královské insignie ... mědirytina, grafika je zcela vyřezána z papíru - max. rozměr ... 420×582 neznačeno ... text: opis v levém medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.''' >>> get_invent...

utf-8 BOM characters
I used to be able to use Emacs to edit an MS Windows utf-8 file in order to remove the byte-order mark that Windows editors often annoyingly add (three bytes at the start of the file: EF, BB, BF I think). When I tried it just now, Emacs (23.3.1 on Linux, 23.2.1 on Windows) no longer showed me those bytes and hence I can't delete the blasted things. How can I tell Emacs to show me what's in the file and stop acting like bloody Notepad? (Tried hexl-mode but no way to delete characters(?) and way overkill -- I just want to see three blots at the start of the text that I can delete.) ...
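The three bytes mentioned above are indeed EF BB BF, the UTF-8 encoding of U+FEFF. As a hedged illustration outside Emacs, a small Python sketch can strip them; the function name `strip_utf8_bom` is made up for this example and is not from the original post.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte-order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):      # codecs.BOM_UTF8 == b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):]
    return data


sample = b"\xef\xbb\xbfhello"
print(strip_utf8_bom(sample))

# When decoding to text, the standard "utf-8-sig" codec drops the BOM as well.
print(sample.decode("utf-8-sig"))
```

Working on raw bytes keeps the rest of the file untouched, whatever it contains; the `utf-8-sig` route is simpler when the file is known to be valid UTF-8 text.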

UTF-8 and Latin-1 characters
Since I am Swedish, I write website content mostly in Swedish and using charset iso-8859-1. I have (just for testing) tried to use utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html ) but the special Swedish characters don't come out right if I don't use entities for them. The Swedish characters in question are: Latin letter a with ring above = &aring; (å) Latin letter a with diaeresis = &auml; (ä) Latin letter o with diaeresis = &ouml; (ö) I realize I can use the entities, but I found a page with Swedish content ( http://w1.318.comhem.se/~...

UTF-8 to named character entities
Hi all, I need a conversion from UTF-8 to named character entities (ë -> &euml;) and after using the file for publishing purposes I need to convert it back to UTF-8. I tried this with HTML::Entities but I get very strange results. I know very little about Perl and its available modules. I just got Perl to work on my Mac. Below is a piece of my XML file and the converted result. Can anybody please advise me on how to proceed? Thanks, Chris -- XML -- <a>(�). De andere soort is die welke de contacten en transacties tussen patiënten reguleert.�<vn> <al> <a>E...

UTF-8 garbage characters #2
Pierre Goiffon Oct 6 2004, 4:29 am Newsgroups: comp.infosystems.www.authoring.html >> The problem with charset UTF-8 on pages with forms for e.g. >> guestbooks, formmail and blogs is that writing in a non-English >> language can give garbage characters from the letters that are not >> represented in the English language. That's because what is written in >> the text box doesn't get encoded, as text done with HTML editors does. > >I really can't understand your post. A server that sends a form to a client >with th...

utf-8 non-latin characters
Hello, I configured a MySQL database in Fedora Linux. In the 'test' database I created a 'books' table, setting Character Set = UTF-8 Unicode and Collation = utf8_general_ci. When I entered the Polish text "Hodowla żółwi" into the varchar column 'title' I received: Incorrect string value: "\xC5\xBC\xC3\xB3\xC5\x82..." for column 'title' at row 1. When I use the query 'SELECT title FROM books' I see "Hodowla ?ó?wi" instead of "Hodowla żółwi". So I have a question: How can I use utf-8 non-latin characters? /RAM/ On Tue...

UTF-8 character order bug?
So in keeping with the "future-thinking" way of doing things, I thought I'd switch my locale to UTF-8. Sadly, it breaks a number of things. I can live (unhappily) with having my upper and lower case filenames interspersed, but that ordering causes some interesting problems. Specifically: Letters are ordered lower case, then upper case, for each letter in turn. That is, the ordering sequence is aAbBcC...zZ. Now consider what this means in doing an ls: #ls [A-Z]* One would intuitively expect this to give all files starting with a capital letter, which it does in the C (= POSIX...

display issues with ja_JP.UTF-8
I am having a display problem with mutt. When I have special characters, for example Japanese characters or the arrows which indicate a thread, mutt creates an empty line in the index and shifts the rest of the line over one position. The shifted-over portion wraps onto the next line, thereby obstructing the display. I read of a similar problem in the Debian bug lists, but they concluded that it was not a bug. I have upgraded mutt to the newest version available for Gentoo. I have LC_TYPE=ja_JP.UTF-8 and ...

utf-8 issue on NetBSD (does NetBSD 2.1.0 support UTF-8?)
Hello. I am not sure if this is a NetBSD issue or a software package problem. All my emails are encoded in UTF-8. I have a shell account on a NetBSD 2.1.0_STABLE server. All email software behaves strangely (junk text) and UTF-8 seems not to be accepted. Two screenshots taken from a Linux box remotely accessing the NetBSD server: Using pine to read email (LANG=en_US.UTF-8, all emails are in charset UTF-8) gopher://sdf.lonestar.org/I/users/weiwu/netbsd_utf8_issue_pine.png Using mutt to read email (LANG=en_US.UTF-8, all emails are in charset UTF-8) gopher://sdf.lonestar.org/I/users/weiwu/netbsd_utf8_i...
