Text file encodings in OS-X (ISO Latin1 8859 vs UTF-8)

Out of curiosity: when I save a text file in TextEdit, the "Save As"
dialog lets me specify the text encoding (ISO 8859-1 Latin-1, UTF-8 and
many others).

How and where is this stored? From the command line, is there a way to
see and possibly change the text encoding associated with a file?

I recently used PHP to download data from the CRTC web site. The HTTP
headers specified UTF-8, but PHP has great problems dealing with
accented characters, both when the data is read directly via HTTP and
when the HTML files are first stored locally as text files.

And now TextEdit tells me it can no longer save a text document because
I pasted text that probably contains characters not representable in
Latin-1, so I have to save it as UTF-8.

I'd like to have a better understanding of how text files are processed
under OS X.
12/18/2012 8:43:25 PM
comp.sys.mac.system

On 2012-12-18 20:43:25 +0000, JF Mezei <jfmezei.spamnot@vaxination.ca> said:

> Out of curiosity, when I save a text file in TextEdit, I am given the
> chance to specify the text encoding (ISO 8859-1 Latin1, UTF-8 and many
> others) in the "save as" menu option.
> 
> How/where is this stored ? From the command line, is there a way to see
> and possibly change the text encoding associated with a file ?
> 
> Recently used PHP to download data from the CRTC web site which the HTTP
> headers specified as UTF-8 but PHP has great problems dealing with
> accented characters both when data read directly via HTTP or if the HTML
> files were first stored locally as text files.
> 
> And now, I have TextEdit telling me it can no longer save a text
> document because I pasted text that probably contains characters not
> possible in latin-1 so I have to save-as UTF-8.
> 
> I'd like to have a better understanding on how text files are processed
> under OS-X.


As far as I have been able to tell, encoding determination in Mac OS X is
just like everywhere else.  In other words, the protocol for the file type
specifies (indirectly in some cases like XML) the encoding.

The determination of the encoding for a file can be a little tough.  For
the most part, the encoding is simply implied.



Good luck,

-- 
John Holt

me10 (19)
12/18/2012 9:58:00 PM
On 12-18-2012 16:58, John Holt wrote:
> As far as I have been able to tell, encoding determination in Mac OS X is
> just like everywhere else.  In other words, the protocol for the file type
> specifies (indirectly in some cases like XML) the encoding.

I could be wrong, but I believe some programs use a resource fork and 
many (including TextEdit) use a system-call version of the file command.

man file

to see how that works.

-- 
Wes Groleau

    I've noticed lately that the paranoid fear of computers becoming
    intelligent and taking over the world has almost entirely disappeared
    from the common culture.  Near as I can tell, this coincides with
    the release of MS-DOS.
                                  — Larry DeLuca

news31 (6772)
12/19/2012 3:57:47 AM
In message <50d0d56e$0$34979$c3e8da3$460562f1@news.astraweb.com> 
  JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
> Recently used PHP to download data from the CRTC web site which the HTTP
> headers specified as UTF-8 but PHP has great problems dealing with
> accented characters both when data read directly via HTTP or if the HTML
> files were first stored locally as text files.

It does?

How odd, I use php all the time and haven't noticed a problem with UTF8
files. At least not in 2 or 3 years.

> And now, I have TextEdit telling me it can no longer save a text
> document because I pasted text that probably contains characters not
> possible in latin-1 so I have to save-as UTF-8.

UTF-8 is the most common text encoding on the Internet. You'd best make
sure your web browser and PHP are set up properly.

> I'd like to have a better understanding on how text files are processed
> under OS-X.

OS X supports many encodings. Hundreds, at least. The best choice in all
cases for all files is UTF8 unless you are dealing with people on
Windows 98 or something.

-- 
*** AgentSmith sets mode: +m
g.kreme (3671)
12/19/2012 4:35:00 AM
On 12-12-18 23:35, Lewis wrote:

> UTF8 is the most common file type on the Internet.

But for text files, especially OS config files and bash scripts, I am
not sure if UTF-8 is the most common.



> You'd best make sure
> your web browser and php are setup properly.

When browsing those pages from CRTC, the accents display correctly. But
when using PHP to extract those pages and save the data to a text file
or create file name with those accented characters, it was screwed up.

For instance I would take "Alain Gagné" as a name and convert it to
"0057_Alain_Gagné.PDF", where this guy's submission was to be stored, but
the file name wouldn't have the é but some multi-character combo.

Just wondering if running PHP in a xterm window causes it to go into
some mode which won't support UTF8. The CRTC web site does provide UTF-8
as the Character set in the HTTP response header.
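
For comparison, here is a minimal sketch in Python (not the PHP used in this thread) of honouring the charset a server declares before doing anything else with the data. The helper names `charset_from_content_type` and `decode_body` are made up for illustration, and the ISO 8859-1 fallback is an assumption, not something the CRTC site specifies.

```python
from email.message import Message

def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value, or returns the default.
    return msg.get_content_charset(default)

def decode_body(raw, header_value):
    """Decode raw response bytes using the charset the server declared."""
    return raw.decode(charset_from_content_type(header_value))

raw = "Alain Gagné".encode("utf-8")        # bytes as they travel over HTTP
text = decode_body(raw, "text/html; charset=UTF-8")
print(len(raw), len(text))                 # 12 11
```

The point is only that the bytes and the decoded text are different things: the accented letter is two bytes on the wire but one character once decoded with the declared charset.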

12/19/2012 7:59:32 AM
JF Mezei <jfmezei.spamnot@vaxination.ca> writes:

Your article, by the way, was posted thusly -

> Content-Type: text/plain; charset=ISO-8859-1

But I don't know from whence it was created (the
computer with the problem, or somewhere else).

> Just wondering if running PHP in a xterm window causes it to go into
> some mode which won't support UTF8. The CRTC web site does provide UTF-8
> as the Character set in the HTTP response header.

You might want to start by setting LANG=en_CA.UTF-8 or maybe LANG=fr_CA.UTF-8
on your local machine.  locale -a will give you a list of available choices.
xterm may have something to set as well; I don't use it, so I don't know.

Billy Y..
-- 
        sub     #'9+1   ,r0             ; convert ascii byte
	add     #9.+1   ,r0             ; to an integer
	bcc     20$                     ; not a number
billy22 (785)
12/19/2012 8:48:27 AM
In article <karv0r$pf8$1@reader1.panix.com>, billy@MIX.COM wrote:

> JF Mezei <jfmezei.spamnot@vaxination.ca> writes:
> 
> > Just wondering if running PHP in a xterm window causes it to go into
> > some mode which won't support UTF8. The CRTC web site does provide UTF-8
> > as the Character set in the HTTP response header.

FWIW I had a heck of a problem getting accented characters correct in 
the Tiger version of Terminal so xterm might be the problem.

> You might want to start by defining LANG = en_CA.UTF-8 or maybe fr_CA.UTF-8
> on your local machine.  locale -a will give you a list of available choices.
> xterm may have something to set as well, I don't use it so I don't know.

You can do that on the invoking line with something like:

LANG=fr_CA.UTF-8 php <mumble>

For example the following gives me French month abbreviations in the 
dates (in Terminal), including the e-acute in "déc":

LANG=fr_CA.UTF-8 ls -l

-- 
Paul Sture

Q: pleasecanyoufixmyspacebar?
A: myspaceisdeadyouneedtotryfacebook
nospam9740 (2260)
12/19/2012 9:33:36 AM
In article <50d173e6$0$64387$c3e8da3$33881b6a@news.astraweb.com>,
 JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 12-12-18 23:35, Lewis wrote:
> 
> > UTF8 is the most common file type on the Internet.
> 
> But for text files, especially OS config files and bash scripts, I am
> not sure if UTF-8 is the most common.

It's the only one it makes any sense to use.

Why don't you enquire on comp.lang.php ?

-- 
Tim

"That excessive bail ought not to be required, nor excessive fines imposed,
nor cruel and unusual punishments inflicted"  --  Bill of Rights 1689
timstreater2 (1190)
12/19/2012 9:59:36 AM
In article <nospam-B794AC.10333619122012@news.chingola.ch>,
 Paul Sture <nospam@sture.ch> wrote:

> In article <karv0r$pf8$1@reader1.panix.com>, billy@MIX.COM wrote:
> 
> > JF Mezei <jfmezei.spamnot@vaxination.ca> writes:
> > 
> > > Just wondering if running PHP in a xterm window causes it to go into
> > > some mode which won't support UTF8. The CRTC web site does provide UTF-8
> > > as the Character set in the HTTP response header.
> 
> FWIW I had a heck of a problem getting accented characters correct in 
> the Tiger version of Terminal so xterm might be the problem.
> 
> > You might want to start by defining LANG = en_CA.UTF-8 or maybe fr_CA.UTF-8
> > on your local machine.  locale -a will give you a list of available choices.
> > xterm may have something to set as well, I don't use it so I don't know.
> 
> You can do that on the invoking line with something like:
> 
> LANG=fr_CA.UTF-8 php <mumble>
> 
> For example the following gives me French month abbreviations in the 
> dates (in Terminal), including the e-acute in "déc":
> 
> LANG=fr_CA.UTF-8 ls -l

And yes, in xterm too. 

But please note I am using Xquartz here because I am on Mountain Lion.

-- 
Paul Sture

Q: pleasecanyoufixmyspacebar?
A: myspaceisdeadyouneedtotryfacebook
nospam9740 (2260)
12/19/2012 4:43:43 PM
Here is another example:

Saved:
> https://services.crtc.gc.ca/pub/ListeInterventionList/Default-Defaut.aspx?en=2012-557&dt=r&lang=e

as an .html file on disk. Firefox displays Rachel Laperrière properly.


Both Textedit and the xwindows nedit display the html line as:

<span id="ctl00_ContentMain_gvData_ctl04_lblIntervenor">Laperrière,
Rachel</span><br />


And when PHP fetches the data with a

$level1html =
file_get_contents('https://services.crtc.gc.ca/pub/ListeInterventionList/Default-Defaut.aspx?en=2012-557&dt=c&Lang=e');


The data in the "level1html" variable is also as shown in the span. And
when PHP uses that data to create a filename, the resulting file in the
OS X Finder is the "corrupt" one, not one with Laperrière.

I tried setlocale() in PHP to no avail.

And I tried to use iconv to convert it to plain ASCII with
transliteration so it would end up as Rachel Laperriere, and that also failed.
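
The transliteration idea can be sketched in Python rather than PHP. This is not iconv's //TRANSLIT: it swaps in Unicode decomposition (NFKD) followed by dropping the combining accents, and the helper name `to_ascii_filename` is hypothetical. It only works if the bytes are decoded as UTF-8 first; folding text that was already misread as Latin-1 cannot recover the plain letter.

```python
import unicodedata

def to_ascii_filename(name):
    """Strip accents, then replace anything outside [A-Za-z0-9._-] with '_'."""
    decomposed = unicodedata.normalize("NFKD", name)   # è -> e + combining grave
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in ascii_only)

raw = b"Laperri\xc3\xa8re, Rachel"      # UTF-8 bytes as served by the site
name = raw.decode("utf-8")              # decode first, then fold
print(to_ascii_filename(name))          # Laperriere__Rachel
```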

If the text encoding were auto detected based on content of file, how
come Textedit doesn't detect UTF-8 data ?

Now it gets stranger:
I do a "view source" from Firefox.  Rachel's name displayed properly.
Select all, copy, and paste it into an empty textedit window. Rachel's
name still fine.  Save the file as a text file:

Now, TextEdit can reopen the file and see Rachel's name fine, but nedit
will display the raw characters (which probably means nedit is just not
capable of supporting UTF-8 characters).

However, how come if I save the HTML to a file from firefox, textedit
fails, but if I view source , copy paste into text edit, save it , and
then reopen it, the file is fine ?

Seems to me there must be some hidden file attribute somewhere which
doesn't get set when Firefox saves the source file, but does when
TextEdit saves it.



12/19/2012 6:24:05 PM
BTW, here is how PHP displays my entry in an xterm:

name=Mezei, Jean-François
name3=Mezei_Jean-François
company=Vaxination Informatique
docs=Documents.aspx?ID=176417&Lang=e
number=0020

Here is how the file comes out in Finder after PHP has used my name to
construct a file containing the document:

0020_Mezei_Jean-François_01.PDF

I guess I'll have to find a way to run Terminal.app on the Xserve
(headless server) to see if PHP would then run in UTF-8 mode.


12/19/2012 6:35:53 PM
Just tried running PHP in Terminal.App on the Xserve and it yielded the
same results.

In the "Advanced" preferences for Terminal.App the character encoding is
set to UTF-8 with the "set locale on startup" checked.


12/19/2012 6:40:47 PM
With Finder, I manually corrected the accented characters (at this stage
of the process, the number of documents is smaller)

In "Terminal.app" the folder for Vidéotron comes out as:

Québecor_Média_inc


But in Xterm:

Que??becor_Me??dia_inc

So it is probably some resource I need to set in xterm to tell it to use
UTF-8 by default, since in its current state it is unable to handle it.
Interesting that xterm displays the ç as Ã§ when dealing with data
in a file, but when I manually set a file name with Finder, xterm then
displays the ç as ??




12/19/2012 6:50:39 PM
JF Mezei <jfmezei.spamnot@vaxination.ca> writes:
> Out of curiosity, when I save a text file in TextEdit, I am given the
> chance to specify the text encoding (ISO 8859-1 Latin1, UTF-8 and many
> others) in the "save as" menu option.
>
> How/where is this stored ?  From the command line, is there a way to
> see and possibly change the text encoding associated with a file ?

TextEdit stores the encoding in an extended attribute, which you can
retrieve with xattr.

  $ hexdump -C utf8.txt
  00000000  66 69 6c 65 20 77 69 74  68 20 61 20 c2 a3 20 73  |file with a .. s|
  00000010  69 67 6e 0a                                       |ign.|
  00000014
  $ hexdump -C wlatin1.txt
  00000000  66 69 6c 65 20 77 69 74  68 20 61 20 a3 20 73 69  |file with a . si|
  00000010  67 6e 0a                                          |gn.|
  00000013
  $ xattr -l utf8.txt
  com.apple.TextEncoding: utf-8;134217984
  $ xattr -l wlatin1.txt
  com.apple.TextEncoding: windows-1252;1280

(The decimal value is an OSX-specific identifier for the encoding.)

But this is not a general answer; many other programs will neither set
the attribute nor pay any attention to it.  For example, most Unix
programs will either ignore the question entirely or assume that the
encoding implied by the LC_CTYPE locale setting holds for all files.
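
As a rough illustration in Python, the attribute values and hex dumps above can be taken apart as follows. The helper `parse_text_encoding` is hypothetical, and reading the number as a CoreFoundation CFStringEncoding value is my assumption (0x08000100 and 0x0500 are the values I know for UTF-8 and Windows Latin 1); this does not read the attribute from disk, it only parses the string that xattr printed.

```python
def parse_text_encoding(value):
    """Split a com.apple.TextEncoding value into (encoding name, numeric id)."""
    name, _, number = value.partition(";")
    return name, int(number)

# The two files from the hex dumps above, byte for byte.
utf8_bytes = bytes.fromhex(
    "66 69 6c 65 20 77 69 74 68 20 61 20 c2 a3 20 73 69 67 6e 0a")
cp1252_bytes = bytes.fromhex(
    "66 69 6c 65 20 77 69 74 68 20 61 20 a3 20 73 69 67 6e 0a")

for attr in ("utf-8;134217984", "windows-1252;1280"):
    name, number = parse_text_encoding(attr)
    print(name, hex(number))            # utf-8 0x8000100 / windows-1252 0x500

# Same text, different bytes: each file decodes correctly only with its own encoding.
print(utf8_bytes.decode("utf-8") == cp1252_bytes.decode("windows-1252"))  # True
print(len(utf8_bytes), len(cp1252_bytes))                                 # 20 19
```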

-- 
http://www.greenend.org.uk/rjk/
rjk (534)
12/19/2012 7:50:59 PM
In message <50d173e6$0$64387$c3e8da3$33881b6a@news.astraweb.com> 
  JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
> On 12-12-18 23:35, Lewis wrote:

>> UTF8 is the most common file type on the Internet.

> But for text files, especially OS config files and bash scripts, I am
> not sure if UTF-8 is the most common.

>> You'd best make sure
>> your web browser and php are setup properly.

> When browsing those pages from CRTC, the accents display correctly. But
> when using PHP to extract those pages and save the data to a text file
> or create file name with those accented characters, it was screwed up.

Then you did it wrong.

> For instance I would take  "Alain Gagné"  as a name and convert it to
> 0057_Alain_Gagné.PDF" where this guy's submission was to be stored, but
> the file name wouldn't have the é but some multi character combo.

Then you did it wrong. Or you did it on a MSFT Windows Server (which I
would classify as 'wrong') since MSFT uses UTF-16, just to be total
dickheads.

An easy hack is to urlencode() and urldecode() the file name
before/after reading it.
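
A Python analogue of that hack, as a sketch only: `urllib.parse.quote` and `unquote` stand in for PHP's urlencode() and urldecode() (they are not identical; for example, the PHP function writes spaces as `+`). The wrapper names are hypothetical.

```python
from urllib.parse import quote, unquote

def encode_filename(name):
    """Percent-encode a name so that only ASCII reaches the file system."""
    return quote(name, safe="")

def decode_filename(stored):
    """Recover the original name from its percent-encoded form."""
    return unquote(stored)

stored = encode_filename("0057_Alain_Gagné.PDF")
print(stored)                                              # 0057_Alain_Gagn%C3%A9.PDF
print(decode_filename(stored) == "0057_Alain_Gagné.PDF")   # True
```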

> Just wondering if running PHP in a xterm window causes it to go into
> some mode which won't support UTF8. The CRTC web site does provide UTF-8
> as the Character set in the HTTP response header.

I don't do X11/xterm. It's always seemed a very limited and kludgy
solution.

-- 
Nihil est--in vita priore ego imperator Romanus fui.
g.kreme (3671)
12/19/2012 10:20:24 PM
Here's a handy web test page I stumbled upon today -

UTF-8 Sampler
http://www.columbia.edu/~fdc/utf8/

And to fill some of what Apple doesn't provide (the
author's web site is not responding for me now) -

Code2000 Font
http://www.fonts2u.com/code2000.font

Not that any of this solves the OP's problem, but, hey...

Billy Y..
-- 
        sub     #'9+1   ,r0             ; convert ascii byte
	add     #9.+1   ,r0             ; to an integer
	bcc     20$                     ; not a number
billy22 (785)
12/20/2012 1:35:52 AM
On 12-19-2012 03:48, billy@MIX.COM wrote:
> You might want to start by defining LANG = en_CA.UTF-8 or maybe fr_CA.UTF-8
> on your local machine.  locale -a will give you a list of available choices.
> xterm may have something to set as well, I don't use it so I don't know.


I have all of my locales set to en_US.UTF-8 and Terminal's default also 
set to UTF-8.  Most of the time it works pretty well.  There are programs 
that feel obligated to use octal for ALL non-ASCII characters.  And 
there is an occasional odd inconsistency, such as:

iMac:~ wgroleau$ touch X=㌳䑄啕∢晦睷袈香ꪪ==X
iMac:~ wgroleau$ ls X*
X=㌳䑄啕∢晦睷袈香?==X
iMac:~ wgroleau$ ls -lat | head -6
total 4133344
drwxr-x---   732 wgroleau  staff 24888 Dec 19 22:39 .
-rw-r--r--     1 wgroleau  staff    72 Dec 19 22:39 .signature
drwx------    15 wgroleau  staff   510 Dec 19 22:37 .dropbox
-rw-r--r--     1 wgroleau  staff     0 Dec 19 22:37 X=㌳䑄啕∢晦睷袈香ꪪ==X
drwx------     2 wgroleau  staff    68 Dec 19 22:07 .Trash

If you don't see at least one Chinese character there, YOUR newsreader 
doesn't honor UTF-8 encoding headers.

There was only ONE equal sign at each end.  What was before it was 
U+AAAA, which is apparently not available on my Mac (you can see what it 
looks like at <http://www.unicode.org/charts/PDF/UAA80.pdf>).

Before that was U+9999 or 香

In Terminal, the 'ls -lat' did not have the extra equal sign, and for 
the low vo, it had the glyph meaning HUH?!?

I find it interesting that 'ls' _ass_umed_ that Terminal couldn't handle 
it and replaced it with ?= but when piped, left it alone and 'head' 
displayed it the same as the shell (and pasting it into Thunderbird 
changed the single glyph into a different one and an equal sign!)

-- 
Wes Groleau

    “Statistics are like bikinis.
     What they reveal is suggestive,
     but what they conceal is vital.”
                       — Aaron Levenstein
news31 (6772)
12/20/2012 4:00:05 AM
On 12-19-2012 13:24, JF Mezei wrote:
> Both Textedit and the xwindows nedit display the html line as:
>
> <span id="ctl00_ContentMain_gvData_ctl04_lblIntervenor">Laperrière,
> Rachel</span><br />

When you don't tell TextEdit what encoding to use, it uses substantially 
the same methods as described in 'man file'
(Or the O.S. does and tells it the result)

In this case, the è was farther into the file than the characters used 
to guess, i.e., the guessing code only saw ASCII characters.  In that 
case there is no guess.  You must have TextEdit's default set to ISO Latin 1.

è is what you get from decoding the UTF-8 bytes of è by Latin 1 rules.
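
That misreading is easy to reproduce. A small Python sketch (the function names are made up for illustration): encode the character as UTF-8, decode the two resulting bytes as Latin 1, and the two-character garble appears; reversing the steps repairs it.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode those bytes as ISO 8859-1."""
    return text.encode("utf-8").decode("latin-1")

def repair_mojibake(garbled):
    """Undo the misreading: back to the original bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

garbled = misread_as_latin1("\u00e8")            # U+00E8 is e-grave
print(ascii(garbled))                            # '\xc3\xa8', i.e. A-tilde + diaeresis
print(repair_mojibake(garbled) == "\u00e8")      # True
```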

I set my default to UTF-8.  So for me TextEdit usually gets it 
right.   If your HTML file had been encoded in Latin 1, my TextEdit 
would be trying to use the default UTF-8 and when it got to that 
character it would have said "Cannot open file.  Not UTF-8"

And then I would use the File->Open command to specify Latin 1
(always my second choice, because ASCII is a subset of UTF-8 and
Latin 1 is the most common non-ASCII format).  And since Latin 1 includes 
ALL bytes, if it isn't Latin 1, but it is any eight-bit superset of 
ASCII, I can see a good part of it correctly.
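
The "UTF-8 first, Latin 1 second" habit can be written down as a short Python sketch. It is not how TextEdit itself works, only the same order of attempts; `decode_with_fallback` is a hypothetical name.

```python
def decode_with_fallback(raw):
    """Try strict UTF-8 first; fall back to ISO 8859-1, which accepts any byte."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin 1 maps every byte value to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"

print(decode_with_fallback(b"Gagn\xc3\xa9")[1])   # utf-8
print(decode_with_fallback(b"Gagn\xe9")[1])       # latin-1
print(decode_with_fallback(b"plain ASCII")[1])    # utf-8
```

The order matters: trying Latin 1 first would "succeed" on every file, including UTF-8 ones, and produce the garbled two-character sequences discussed above.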

-- 
Wes Groleau

"What progress we are making!  In the Middle Ages, they would have
burnt me; nowadays they are content with burning my books.”
                                     — Sigmund Freud, 1933
"He was never to know that even that was only an illusory progress,
that ten years later they would have burned his body as well.”
                                     — Ernest Jones, 1953
news31 (6772)
12/20/2012 4:17:45 AM
On 12-19-2012 13:50, JF Mezei wrote:
> With Finder, I manually corrected the accented characters (at this stage
> of the process, the number of documents is smaller)
>
> In "Terminal.app" the folder for Vidéotron comes out as:
>
> Québecor_Média_inc
>
> But in Xterm:
>
> Que??becor_Me??dia_inc
>
> So it is probably some resource I need to set in Xterm to tell it to use
> UTF-8 by default since in its current state it is unable to handle it.
> Interesting that xterm displays the ç as Ã§ when dealing with data
> in a file, but when I manually set a file name with Finder, Xterm then
> displays the ç as ??

The shell (or 'ls') recognizes that the file name is UTF-8 and that the 
locale can't handle that.  So it substitutes question marks.  It is a 
bug however (or an artifact of an odd locale) that it puts two of them 
for a two-byte encoding of ONE character.
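
Another possible reading of "Que??becor", offered as an assumption rather than something established in this thread: HFS+ stores file names in decomposed form, so a Finder-typed é is kept as a plain "e" followed by a combining acute accent (U+0301, two bytes in UTF-8). A tool that prints "?" for every byte it cannot display would then show "e??". The Python sketch below imitates that; `as_seen_without_utf8` is a made-up helper, not what ls actually runs.

```python
import unicodedata

composed = "Qu\u00e9becor"                            # é as a single code point
decomposed = unicodedata.normalize("NFD", composed)   # e followed by U+0301

def as_seen_without_utf8(name):
    """Show each non-ASCII byte of the UTF-8 form as '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in name.encode("utf-8"))

print(as_seen_without_utf8(decomposed))   # Que??becor
print(as_seen_without_utf8(composed))     # Qu??becor
```

Under this reading the plain "e" survives because it really is a separate character in the stored name, and the two question marks are the two bytes of the combining accent.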

I have all three encodings on "open and save" prefs for TextEdit set to 
UTF-8 and "plain text" selected instead of RTF on the New Doc tab.
There may be some hidden prefs inherited from earlier versions of 
TextEdit that had many more Pref tabs.  But this works 99+% of the time 
for me.

And Terminal handles almost everything when my locale is

iMac:~ wgroleau$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Maybe that last one explains the remaining less than one percent.  :-)
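
For what it is worth, an empty LC_ALL normally just means "not set". As I understand the POSIX rules, LC_ALL wins if it is non-empty, then the specific category variable (LC_CTYPE for character handling), then LANG. A small Python sketch of that lookup order follows; the function names are hypothetical and the environment is passed in as a plain dict so nothing on the machine is touched.

```python
def effective_ctype(env):
    """Return the locale name that governs character handling."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:                  # unset and empty are treated the same
            return value
    return "C"                     # the default when nothing is set

def codeset(locale_name):
    """Extract the codeset part of a name like 'fr_CA.UTF-8' (None if absent)."""
    _, dot, rest = locale_name.partition(".")
    return rest.split("@")[0] if dot else None

env = {"LANG": "en_US.UTF-8", "LC_ALL": ""}
print(effective_ctype(env))                                        # en_US.UTF-8
print(codeset(effective_ctype(env)))                               # UTF-8
print(effective_ctype({"LANG": "en_US.UTF-8", "LC_ALL": "C"}))     # C
```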


-- 
Wes Groleau

   Pat's Polemics
   http://Ideas.Lang-Learn.org/barrett
news31 (6772)
12/20/2012 4:28:01 AM
On 12-19-2012 17:20, Lewis wrote:
> I don't do X11/xterm. It's always seemed a very limited and kludgy
> solution.

Although I'm sure that folks have made improvements over time, one might 
benefit by tempering one's expectations with the knowledge that xterm 
and X11 are older than Unicode and UTF-8.

-- 
Wes Groleau

"What progress we are making!  In the Middle Ages, they would have
burnt me; nowadays they are content with burning my books.”
                                     — Sigmund Freud, 1933
"He was never to know that even that was only an illusory progress,
that ten years later they would have burned his body as well.”
                                     — Ernest Jones, 1953
news31 (6772)
12/20/2012 4:34:43 AM
On 12-19-2012 14:50, Richard Kettlewell wrote:
> TextEdit stores the encoding in an extended attribute, which you can
> retrieve with xattr.
>
>    $ hexdump -C utf8.txt
>    00000000  66 69 6c 65 20 77 69 74  68 20 61 20 c2 a3 20 73  |file with a .. s|
>    00000010  69 67 6e 0a                                       |ign.|
>    00000014
>    $ hexdump -C wlatin1.txt
>    00000000  66 69 6c 65 20 77 69 74  68 20 61 20 a3 20 73 69  |file with a . si|
>    00000010  67 6e 0a                                          |gn.|
>    00000013
>    $ xattr -l utf8.txt
>    com.apple.TextEncoding: utf-8;134217984
>    $ xattr -l wlatin1.txt
>    com.apple.TextEncoding: windows-1252;1280

This is nice for me to know.  I guess I can modify my earlier post to 
say "If the xattr is not present, TextEdit will use the 'file' method of 
guessing...."

> (The decimal value is an OSX-specific identifier for the encoding.)
>
> But this is not a general answer; many other programs will neither set
> the attribute nor pay any attention to it.  For example, most Unix
> programs will either ignore the question entirely or assume that
> encoding implied by the LC_CTYPE locale setting holds for all files.

And it is reasonable to assume that a program that ignores it won't set 
it either.

-- 
Wes Groleau

    ¡Qué quiero realmente hacer es comer un perrito caliente!
       私が実際にしたいと思う何をホットドッグを食べることである!
    http://Ideas.Lang-Learn.org/WWW?itemid=463
news31 (6772)
12/20/2012 4:38:31 AM
On 12-12-19 20:35, billy@MIX.COM wrote:
> Here's a handy web test page I stumbled upon today -
> 
> UTF-8 Sampler
> http://www.columbia.edu/~fdc/utf8/

Hey! I had dealings with Frank da Cruz back in the heyday of Kermit on
VMS! Very smart and nice fella!

For displaying the fonts on screen, my guess is that if I have Latin-1
encoded fonts, they may have problems displaying accented characters that
are UTF-8.

However, that doesn't explain PHP getting UTF-8 data from the net and
not handling it as UTF-8 when processing.


12/20/2012 5:04:25 AM
In message <kau4h5$ak6$1@dont-email.me> 
  Wes Groleau <Groleau+news@FreeShell.org> wrote:
> On 12-19-2012 17:20, Lewis wrote:
>> I don't do X11/xterm. It's always seemed a very limited and kludgy
>> solution.

> Although I'm sure that folks have made improvements over time, one might 
> benefit by tempering one's expectations with the knowledge that xterm 
> and X11 are older than UniCode and UTF-8.

True, but most of the software I use is older than Unicode and UTF-8 and
has managed to make the switch. Apache, slrn, php, perl, etc.

-- 
'Why don't I feel angry?' GLANDS, said Death shortly. ADRENALIN AND SO
FORTH. AND EMOTIONS. YOU DON'T HAVE THEM. ALL YOU HAVE NOW IS THOUGHT.
g.kreme (3671)
12/20/2012 8:56:29 AM
In message <katq1o$o64$1@reader1.panix.com> 
  billy@MIX.COM <billy@MIX.COM> wrote:
> Here's a handy web test page I stumbled upon today -

> UTF-8 Sampler
> http://www.columbia.edu/~fdc/utf8/

That's cool. I seem to be missing any runic fonts. Other than that, it
looks like everything displayed (10.8.2, Safari 6.0.2). Oddly, I *do*
have a Runic.TTF font installed, but the runes on that page don't show.

> And to fill some of what Apple doesn't provide (the
> author's web site is not responding for me now) -

> Code2000 Font
> http://www.fonts2u.com/code2000.font

I think that might be where I got my Runic.TTF font, now that I think about it.

<http://www.fonts2u.com/runic-regular.font>

Looks exactly like the one I have installed.

> Not that any of this solves the OP's problem, but, hey...

Details!

-- 
Hudd: 'I've just done this radio show where I never met any of the other
actors and I didn't understand what any of it was about' Moore: 'Ah, yes
I expect that's the thing I'm in.'
g.kreme (3671)
12/20/2012 9:04:49 AM
In article <katq1o$o64$1@reader1.panix.com>, billy@MIX.COM wrote:

> Here's a handy web test page I stumbled upon today -
> 
> UTF-8 Sampler
> http://www.columbia.edu/~fdc/utf8/
> 
> And to fill some of what Apple doesn't provide (the
> author's web site is not responding for me now) -
> 
> Code2000 Font
> http://www.fonts2u.com/code2000.font

Or this one:

<http://www.utf8-chartable.de>

-- 
Tim

"That excessive bail ought not to be required, nor excessive fines imposed,
nor cruel and unusual punishments inflicted"  --  Bill of Rights 1689
timstreater2 (1190)
12/20/2012 10:34:04 AM
In article <50d29c5b$0$1214$c3e8da3$f017e9df@news.astraweb.com>,
 JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 12-12-19 20:35, billy@MIX.COM wrote:
> > Here's a handy web test page I stumbled upon today -
> > 
> > UTF-8 Sampler
> > http://www.columbia.edu/~fdc/utf8/
> 
> hey ! I had dealing with Frank da Cruz back in the heydays of Kermit on
> VMS ! Very smart and nice fella !
> 
> For displaying the fonts on screen, my guess is that if I have latin-1
> encoded fonts, they may have problems displaying accented character that
> are UTF-8.
> 
> However, that doesn't explain PHP getting UTF-8 data from the net and
> not handling it as UTF-8 when processing.

PHP doesn't do anything with it. It's a byte stream.

-- 
Tim

"That excessive bail ought not to be required, nor excessive fines imposed,
nor cruel and unusual punishments inflicted"  --  Bill of Rights 1689
timstreater2 (1190)
12/20/2012 10:34:42 AM
On 12-20-2012 05:34, Tim Streater wrote:
> JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
>> However, that doesn't explain PHP getting UTF-8 data from the net and
>> not handling it as UTF-8 when processing.
>
> PHP doesn't do anything with it. It's a byte stream.

That's an over-simplification.  In PHP, perl, and almost any other 
language today, operations on strings or characters know what a 
character is according to the encoding rules they are using.

They know that the length of "ᄑ∢㌳䑄啕晦睷袈香" is nine, but the size 
is 27 bytes in UTF-8.
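
The character-versus-byte distinction can be checked directly. A Python sketch (Python chosen for illustration, not one of the languages named above): the sample string is written with escapes U+1111 through U+9999, which are the nine characters quoted above, each taking three bytes in UTF-8.

```python
sample = "\u1111\u2222\u3333\u4444\u5555\u6666\u7777\u8888\u9999"

print(len(sample))                       # 9  characters (code points)
print(len(sample.encode("utf-8")))       # 27 bytes, three per character
print(len(sample.encode("utf-16-le")))   # 18 bytes, two per character
```

So "length" depends on whether the operation counts characters or bytes, and the byte count in turn depends on the encoding.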


-- 
Wes Groleau

    After the christening of his baby brother in church, Jason sobbed
    all the way home in the back seat of the car.  His father asked him
    three times what was wrong.  Finally, the boy replied, “That preacher
    said he wanted us brought up in a Christian home, and I wanted to
    stay with you guys."
news31 (6772)
12/21/2012 12:26:25 AM
Reply:

Similar Artilces:

Are Mac OS X 10.5.8's iLife programs safe to use in Mac OS X 10.7.x and 10.8.x?
Hi. Someone told me that Mac OS X 10.7.x and 10.8.x do not come with iLife like the older Mac OS X versions (e.g., 10.5.x). I did not know this! Since my client uses iPhoto that came preinstalled on his old 2008 MacBook Pro's Mac OS X 10.5.x (10.5.8 right now), can he use the old one from 10.5.8? Or will he need a third party replacement (needs to import/copy the old image files) or buy a new iPhoto version for his photo(graph)s? I recalled he did not like iPhoto and wonder if the new one is any better. Thank you in advance. :) -- Quote of the Week: "Every ruler...

OS Smackdown: Linux vs. Mac OS X vs. Vista vs. XP
OS Smackdown: Linux vs. Mac OS X vs. Vista vs. XP Michael DeAgonia, Preston Gralla, David Ramel and James Turner, Computerworld http://www.pcworld.com/businesscenter/article/147262/os_smackdown_linux_vs_mac_os_x_vs_vista_vs_xp.html In comp.os.linux.advocacy, Ablang <ron916@gmail.com> wrote on Fri, 20 Jun 2008 19:48:30 -0700 (PDT) <1fc9ff8c-d6e9-486b-956b-a34f943d8699@m3g2000hsc.googlegroups.com>: > OS Smackdown: Linux vs. Mac OS X vs. Vista vs. XP > > Michael DeAgonia, Preston Gralla, David Ramel and James Turner, > Computerworld > > http://www.pcworld.com/bu...

Mac os 9 Vs. Mac os X
I am porting some windows software to mac os 9. My client has only mac os 9. I would like to use the URLAccessLib for my development. I find no documentation of it on Apple websites as if mac os 9 has fallen off the earth for them. The apple site says One can develop on mac os x and it is backward compatible. What does this mean ? When I install my code on mac os 9 will I need all the mac os x libraries ? can someone throw somelight ? Thanks On 21 Nov 2003, dharmesh wrote: > I am porting some windows software to mac os 9. My client has only mac > os 9. I would like to use the URLAccessLib for my development. I find > no documentation of it on Apple websites as if mac os 9 has fallen off > the earth for them. Mac OS 8/9 documentation is at http://developer.apple.com/documentation/macos8/mac8.html (it's in the legacy documentation section) > > The apple site says One can develop on mac os x and it is backward > compatible. What does this mean ? When I install my code on mac os 9 > will I need all the mac os x libraries ? If your application is carbonised (ie you link against CarbonLib instead of InterfaceLib & co) then the same binary will run on OS X and OS 9 Fred I was trying to reference the CarbonLib from visual basic...I was succesful with Interacelib but couldn't access CarbonLib. I dont understand why. The system seems to have CarbonLib 1.4. Thanks for the reply Frederick Cheung <f...

Mac os 9 Vs. Mac os X
I am porting some windows software to mac os 9. My client has only mac os 9. I would like to use the URLAccessLib for my development. I find no documentation of it on Apple websites as if mac os 9 has fallen off the earth for them. The apple site says One can develop on mac os x and it is backward compatible. What does this mean ? When I install my code on mac os 9 will I need all the mac os x libraries ? can someone throw somelight ? Thanks >The apple site says One can develop on mac os x and it is backward >compatible. What does this mean ? When I install my code on mac os 9 >wi...

Mac file dialog extensions [2.8.1, Mac OS-X, xCode 2.4.1]
------_=_NextPart_001_01C755D2.E5708505 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Looking at the code in Mac/Carbon/FileDlg.cpp, at NavEventProc, in response to the kNavCBPopupMenuSelect event, I noticed that the code takes the LAST extension from the filter and attaches it to the file. I find this a bit strange, and I would expect the FIRST extension of each file type to be the most dominant one, so that it's used as the default extension if none is supplied. Is there any reason for using the last extension (o...

RE: Mac file dialog extensions [2.8.1, Mac OS-X, xCode 2.4.1]
OK, I will fix this, and send a patch. > -----Original Message----- > From: Vadim Zeitlin [mailto:vadim@wxwindows.org] > Sent: Sunday, March 04, 2007 11:38 PM > To: wx-users@lists.wxwidgets.org > Subject: Re: Mac file dialog extensions [2.8.1, Mac OS-X, xCode 2.4.1] >=20 > On Wed, 21 Feb 2007 18:11:23 +0200 Yaron Tadmor <YaronT@HumanEyes.com> > wrote: >=20 > YT> Looking at the code in Mac/Carbon/FileDlg.cpp, at NavEventProc, in > YT> response to the kNavCBPopupMenuSelect event, I noticed that the code > YT> takes the LAST extens...

[ANN] Graphviz for Mac OS X 1.12 (v12)
Dear All: Them pesky bugs. A few more squashed courtesy of the sleepy pixel. http://www.pixelglow.com/graphviz/ What's new in v11 ------------ Fixed some comprehensive help [NRi]. Fixed scale option placeholder [NRi]. Improved application and document icons. Example files now double-click to open in application. What's new in v12 ------------ Added layout option tooltips [NRi]. Fixed layout popup button changing wrong graph [MKe]. Clicking on warning icon now opens Activity window [NRi, AM]. Revert menu item now disabled. Cheers, Glen Low --- pixelglow software | simply brilliant stuff www.pixelglow.com ...

[ANN] Graphviz for Mac OS X 1.12 (v8)
Hi all, It's been a busy week or two at Pixelglow Software. Here's a brand new version of Graphviz, all spit and polish now. You'll enjoy the integrated color and font panel support, hand cursor panning and remembered settings. And everyone's most asked for -- a single click on the Edit tool will now bring up the DOT code for you to edit, and of course when you save it the graph automatically re-renders. http://www.pixelglow.com/graphviz/download/ Here's the lowdown: Added edit, render and stop toolbar items [PCh]. Added integrated font and color fields and panels. Added hand cursor panning [AM]. Added autocomplete for most fields [NRi]. Added support for user defaults ("Remember Settings" menu command, command line arguments to GUI) [JSc, RPa]. Fixed click on popup menu unexpectedly selecting "..." [AM]. Fixed small zooms sometimes preventing full scrolling [DJu]. Fixed allowing fonts starting with "." to be selected [NRi]. Fixed transparent backgrounds rendering opaque in bitmap output. Fixed width or height > 32767 pixels unexpectedly cropping bitmap output [AM]. Improved toolbar interaction. Improved settings descriptions and tooltips. Cheers, Glen Low --- pixelglow software | simply brilliant stuff www.pixelglow.com ...

[ANN] Graphviz for Mac OS X 1.12 (v10)
Hi all, Yet another Graphviz version. The old application icon had been voted off the island, and brand new application and document icons flown in for the task. Comprehensive help features in this version too. http://www.pixelglow.com/graphviz/ What's new: ------------ Added new application and document icons. Added comprehensive help. Fixed changes not affecting graph size displaying incorrectly [BTr]. Fixed export then close crashing the export of an open window. Improved shadowed frame for graph [NRi]. Improved status display [NRi]. Changed sources to pure BSD license. Cheers, Glen Low --- pixelglow software | simply brilliant stuff www.pixelglow.com ...

[ANN] Graphviz for Mac OS X 1.13 (v13)
Dear All, I've just released the newest version of Mac Graphviz, featuring shapefile support and enhanced zoom. http://www.pixelglow.com/graphviz/ Shapefiles supported include PDF, EPS, PS, JPEG, PNG and all Quicktime formats. Either specify an absolute or relative (to working directory) file path, or a URL using the shapefile attribute. Developers can now also use the graphviz.framework directly with #include headers in C e.g. using Xcode "Add Frameworks..."; documentation is available from the main Graphviz site -- http://www.research.att.com/sw/tools/graphviz/libguide.pdf Changes ------- Added drawer and zoom menu commands [DWa, NRi]. Added intelligent window zooming [NRi]. Added shapefile support. Added UTF-8 support [RSc]. Added cvtgxl, gvpack and gvpr tools [BSw]. Fixed page setup then close unexpectedly invoking save dialog [PRo]. HTML-like labels now work on 10.2 (use embedded expat instead of libxml2). Improved internal frameworks (added headers, consolidated dylibs). Tracked main build of 23 June. Graphviz is still free, but I'm now accepting donations for it. Enjoy! Cheers, Glen Low --- pixelglow software | simply brilliant stuff www.pixelglow.com ...

UTF-8 vs ISO-8859-1
Been trying to research this, but not getting any solid answers. So I ask the opinions of the esteemed panel. Assuming one is authoring a page in English for a primarily US audience, what are the benefits and drawbacks of using UTF-8, ISO-8859-1, or other systems? What is recommended? Neal wrote: > Been trying to research this, but not getting any solid answers. > > Assuming one is authoring a page in English for a primarily US > audience, what are the benefits and drawbacks of using UTF-8, > ISO-8859-1, or other systems? What is recommended? There are numer...

Mac OS/X file system support?
I have a Mac OS/X formatted hard drive that I'd like to read and delete some stuff off of under Linux. Is there a Mac OS/X file system driver for Linux? On Wed, 07 Jun 2006 03:49:20 -0700, stork wrote: > I have a Mac OS/X formatted hard drive that I'd like to read and delete > some stuff off of under Linux. Is there a Mac OS/X file system driver > for Linux? > Caveat: I don't have a Mac. IIRC, most kernels can read/write HFS. # modprobe hfs # lsmod | grep hfs -- Douglas Mayne On Wed, 07 Jun 2006 08:10:17 -0600, Douglas Mayne staggered into the Black Sun and s...

Mac OS X file attached as TEXT ?
Using Eudora 6 under Mac OS 10.2, I attached a file (Appledouble MIME). It was a text file, with extension .ltx but according to my recipient, it was sent as a binary file, so the result had only Mac line-endings CR in it. In Mac OS X, where files do not always have filetypes any more, and the Mac has to rely on the extension, how do I tell Eudora which files should be sent as TEXT? -- G. A. Edgar http://www.math.ohio-state.edu/~edgar/ In article <221020030750325375%ydp4fdr6202@sneakemail.com>, G. A. Edgar <ydp4fdr6202@sneakemail.com> ...

Problem with Helvetica font on Mac OS X/Mac OS 9 System
I'm working with a Mac which has Mac OS X and Mac OS 9.2 running on top of the X. I use MS Word and Pagemaker and there's a problem with helvetica italic not displaying consistently. I mean, without any intervention, some days it's visible, other days it's not. I thought maybe it was due to not having the font file, but it seems we do have it. I don't know if this is related, but Adobe Type Manager has also popped up lately complaining about not having enough cache memory. I looked at the setting and it was 2MB already, but increased it to 3 MB yesterday. It seems to have...

PDF to EPS workarounds for Mac OS X
Attention TeXy, TeXy peoples: (excuse the mispronunciation; punny licence) It seems that Mac OS X has partially documented issues (http://altair.ific.uv.es/~JaxoDraw/Bugparade/bugparade.htm, http://developer.apple.com/java/faq/issues.html#anchor6) with the generation of EPS, and I'm trying to find a workaround. The Preview application cannot export to eps, but it can read it (converting it to PDF). If I use Adobe Acrobat to read the PDF generated by Preview, the EPS it generates sometimes (inconsistently) causes dvips to make an unreadable file. What I'm trying to do is the followi...

How convert text file between locale encoding and UTF-8?
Dear Friends: I am wondering whether there is a neat way to do what the subject line says in Python. I am talking about Python 2.4 with the Win32 extension installed. The locale can be any of the ANSI-defined ones, for example zh_CN (CP936) or Korean (CP949). I am not an expert in Python, so any advice would be appreciated a lot. Rgds, David Xiao davihigh@gmail.com wrote: > Dear Friends: > > I am wondering whether there is a neat way to do what the subject line says in Python. I am > talking about Python 2.4 with the Win32 extension installed. The locale can > be any of the ANSI-defined ones, for example zh_CN (CP...
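
For the question in the preview above, here is a minimal sketch of one way to re-encode a text file. It assumes Python 3 (not the Python 2.4 the poster mentions), and the function name convert_encoding and its parameters are made up for illustration. It reads the file with the source codec (for example "cp936") and writes the same text back out as UTF-8.

```python
def convert_encoding(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file: decode with src_encoding, encode with dst_encoding.

    Hypothetical helper for illustration; the names are not from the thread.
    """
    # newline="" keeps the original line endings (CR, LF or CRLF) untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

A call such as convert_encoding("in.txt", "out.txt", "cp936") would raise UnicodeDecodeError if the input is not valid in the stated source encoding, which is usually preferable to silently writing scrambled text.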

How do i decode an ISO-8859-1 encoded text file?
Hi, I want to decode the content of the text file which looks like this: Subject: =?iso-8859-1?Q?Sek=2Dm=F6te?= Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Vill bara s=E4ga att du hade r=E4tt. I tried to do it like this: import java.io.*; import java.util.*; import java.net.*; import java.text.SimpleDateFormat; public class MessageHandler { BufferedReader _reader = null; /** * Constructor */ public MessageHandler(String msgFile){ try { _reader = new BufferedReader(new InputStreamReader(new FileInputStream(msgFile), "ISO-8...
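
The preview above shows two separate layers: the body is quoted-printable text in ISO-8859-1, and the Subject header uses the "=?charset?Q?...?=" encoded-word form. As a rough sketch in Python rather than the poster's Java, the standard library can undo both; the function names decode_subject and decode_qp_body are hypothetical.

```python
import quopri
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Decode an encoded-word header such as '=?iso-8859-1?Q?...?='."""
    return str(make_header(decode_header(raw_subject)))


def decode_qp_body(raw_body: bytes, charset: str = "iso-8859-1") -> str:
    """Undo quoted-printable first, then decode the bytes with the stated charset."""
    return quopri.decodestring(raw_body).decode(charset)
```

With the sample from the post, decode_subject("=?iso-8859-1?Q?Sek=2Dm=F6te?=") gives "Sek-möte" and decode_qp_body(b"Vill bara s=E4ga att du hade r=E4tt.") gives "Vill bara säga att du hade rätt.". Opening the file with an ISO-8859-1 reader alone is not enough, because the "=E4" sequences are plain ASCII until the quoted-printable layer is removed.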

How do I create a new text file with utf-8 encoding
I use ActivePerl version 5.8.8.817 on Windows XP. I try to create a new text file and add some content, but when I open it in Notepad, it says it's an ANSI-encoded file. Why? Here is my code snippet: open my $fh, '>:encoding(UTF-8)', "testfile.txt"; print $fh "Welcome to Muppet Show\n"; close $fh; What am I doing wrong? bk@docstream.no wrote: > I use ActivePerl version 5.8.8.817 on Windows XP. > > I try to create a new text file and add some content, but when I open it > in Notepad, it says it's an ANSI-encoded file. Why? > > open my $fh, '>:encod...

Transformer encoding not working for ISO-8859-1 only for UTF-8
I have a problem when transforming text containing the swedish letters "å", "ä" and "ö". If I do Transformer t = TransformerFactory.newInstance().newTransformer(); t.setOutputProperty( OutputKeys.METHOD, "xml"); t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2"); t.setOutputProperty( OutputKeys.INDENT, "yes"); t.setOutputProperty( OutputKeys.ENCODING, "ISO-8859-1"); <------- * t.transform( new DOMSource( document), new StreamResult( output ) ); return output.toString( ); I...

Mac OS 9.x included in Mac OS X?
Hi, I read that Mac OS X has a so-called classic mode which emulates Mac OS 9.x so that older applications can still be run. Provided one is not an upgrader from Mac OS 9.x, does Mac OS X include Mac OS 9.x required for the classic mode or does it have to be bought separately? Peter >Provided one is not an upgrader from Mac OS 9.x, does Mac OS X include >Mac OS 9.x required for the classic mode or does it have to be bought >separately? If you buy the installer for OS X it *does not* include OS 9. You generally use the copy of OS 9 that came with your computer in order to install C...

Mac OS X & Mac OS X Server
Hi, I'm currently using a 9i developer release on Mac OS X (10.3) which has proved very stable so far. My understanding of the various Oracle press releases is that 10G will be released for Mac OS X, are there any beta testers out there willing to comment on availability/quality/performance issues for 10G on Mac OS X? Another question is whether tools such as the OEM will be available in 10G for Mac OS X, does anybody know? Yours in anticipation! Steve Steve <steve@nospam.com> wrote in message news:<2004013008345616807%steve@nospamcom>... > releases is that 10G wil...

Windows 8 vs. Mac os x Lion boot time
Talk about s-l-o-o-o-o-o-w. I thought the guy forgot to push the power button. In fact, he booted the Acer a couple of seconds later!!!! http://www.youtube.com/watch?v=xyqkpXzHZmo On Saturday, May 19, 2012 9:52:21 AM UTC-4, james wrote: > Talk about s-l-o-o-o-o-o-w. I thought the guy forgot to push the > power button. > > In fact, he booted the Acer a couple of seconds later!!!! > > http://www.youtube.com/watch?v=xyqkpXzHZmo Macs have always had sub-standard slow hardware. It's just one of the reasons why Macs are banned from the workplace/enterprise. In article <hANtr.9461$br3.423@newsfe10.iad>, "james" <james@aol.com> wrote: > Talk about s-l-o-o-o-o-o-w. I thought the guy forgot to push the > power button. > > In fact, he booted the Acer a couple of seconds later!!!! > > http://www.youtube.com/watch?v=xyqkpXzHZmo I suppose that boot times are important to you windows guys, since it seems you have to do it so often. My iMac boots OSX Lion fast enough for those very rare times I ever need to do it. Like an OS update or a power failure. Other than that, never reboot or boot cold. Must suck to live with a POS that needs to be quick to boot to make you not hate having to boot all the time! In article <db439867-a4a1-4814-88f4-4993abc90b25@googlegroups.com>, MuahMan <muahman@gmail.com> wrote: > On Saturday, May 19, 2012 9:52:21 AM UTC-4...

Your Mac won't start up in Mac OS X (Mac OS X 10.3.9 or earlier)
Nothing can be more frustrating than turning on your Mac only to find that it won't start up. Instead of seeing the Finder, you see a blue or gray screen, an icon of a broken folder, a kernel panic, a flashing question mark, or a computer that just sits there. What can you do? Don't worry. It could be a simple issue that you can fix yourself. Note: This article applies to Mac OS X 10.3.9 or earlier. Tip: If your computer won't start at all, skip to "You see a blank, gray screen" below. The first step to help your Mac start up again is to identify which symptom you see. Once you know what the symptom is, you can try to fix it. Here's a list of the most common things you might see if your Mac turns on but doesn't start up. Click the link for the symptom you see, then follow the steps to fix it. You see an empty, blue screen. You might also see a progress indicator, which looks like a colored pinwheel or spinning disc. A "broken folder" icon, a prohibitory sign, or "kernel panic" message appears. You see a blank, gray screen. A flashing question mark appears. None of the above happens, but your Mac doesn't start up. You see an empty, blue screen. You might also see a progress indicator, which looks like a colored pinwheel or spinning disc There are several different things you can try to fix this symptom. Go through eac...

UTF-8 BOM w/ ISO-8859-1 encoding pseudo attribute
I have an XML document that includes a UTF-8 BOM (0xEF 0xBB 0xBF). The document is properly encoded as UTF-8. However, the XMLDecl encoding pseudo-attribute indicates 'ISO-8859-1'. So how SHOULD the XML processor handle this? Is it a fatal error? Clearly it cannot be processed as ISO-8859-1 because the content is scrambled. It appears that Xerces is parsing it according to ISO-8859-1 even though the BOM is there. Hmm. Any suggestions? How should one handle this case? Erik In article <eb9cae13.0408181038.369e7977@posting.google.com>, Erik Wahlstrom <erik.wahlstrom@gmail.c...
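
The mismatch described in the preview above can be spotted before the document reaches a parser. The following is only a sketch in Python, not what Xerces or any XML processor does: the function name check_bom_vs_declaration is hypothetical, and the regular expression is a simplification that only looks at an XML declaration at the very start of the data.

```python
import codecs
import re

# Simplified pattern for the encoding pseudo-attribute of a leading XML declaration.
_DECL_RE = re.compile(
    rb'<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def check_bom_vs_declaration(data: bytes):
    """Return (has_utf8_bom, declared_encoding, conflict) for raw XML bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)  # the bytes EF BB BF
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    match = _DECL_RE.match(body)
    declared = match.group(1).decode("ascii") if match else None
    # Conflict: a UTF-8 BOM is present but the declaration names another encoding.
    conflict = (
        has_bom
        and declared is not None
        and declared.lower() not in ("utf-8", "utf8")
    )
    return has_bom, declared, conflict
```

For a file like the one in the post, the result would be (True, "ISO-8859-1", True), which at least tells you that either the BOM or the declaration has to be corrected before parsing.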

Web resources about - Text file encodings in OS-X (ISO Latin1 8859 vs UTF-8) - comp.sys.mac.system

Encoding (memory) - Wikipedia, the free encyclopedia
Visual, acoustic, and semantic encodings are the most intensively used. Other encodings are also used. Acoustic encoding is the encoding of auditory ...

Twitter image encoding challenge
If a picture's worth 1000 words, how much of a picture can you fit in 140 characters? Note : That's it folks! Bounty deadline is here, and after ...

【medical-news】Genetic Variation in NR1H4 Encoding the Bile Acid Receptor FXR - 医药生命科学动态跟踪 -丁香园论坛
Context: Bile acid signaling via farnesoid X receptor (FXR) regulates glucose and lipid levels, fat mass, and hepatic steatosis in animal models.Objective: ...

HandBrake Open Source video transcoder v0.10 released with hundreds of new features including H.265 and ...
... can be used for transcribing many different types of files/codecs to almost any other. Today’s headliner updates include H.265 and VP8 encoding. ...

CJK Type - CJK Fonts, Character Sets & Encodings. All CJK. All of the time.
As I wrote nearly a year ago , the Adobe-Identity-0 ROS is useful for building special-purpose fonts, especially CJK ones whose glyph coverage ...

Link Encoding Goes Mobile With Deep Links From Bitly
... Facebook and Google have been competing to bring the best solution for tracking deeplinks . This week, Bitly announced its own linking encoding ...

Encoding Articles - AppAdvice iPhone/iPad News
Latest Encoding Articles - AppAdvice iPhone/iPad News

AirMovie - Enjoy the videos in your PC anytime, anywhere with NO ENCODING!!
Get "AirMovie - Enjoy the videos in your PC anytime, anywhere with NO ENCODING!!" on the App Store. See screenshots, reviews ...

More tips on encoding video for Apple TV and iPod, from us to you
Apple tells video podcasters how to encode their content, which is also useful …

Handbrake 0.9.6 gives some, takes some encoding features
The Handbrake Project has announced an update to its open-source, cross-platform video transcoding utility. Handbrake 0.9.6 includes new and ...

Resources last updated: 3/10/2016 6:45:10 PM