I use ActivePerl version 5.8.8.817 on Windows XP.

I am trying to create a new text file and add some content, but when I open it in Notepad, it says it's an ANSI-encoded file. Why? Here is my code snippet:

    open my $fh, '>:encoding(UTF-8)', "testfile.txt";
    print $fh "Welcome to Muppet Show\n";
    close $fh;

What am I doing wrong?
bk@docstream.no wrote:
> I use ActivePerl version 5.8.8.817 on Windows XP.
>
> I am trying to create a new text file and add some content, but when I open it
> in Notepad, it says it's an ANSI-encoded file. Why?
>
> open my $fh, '>:encoding(UTF-8)', "testfile.txt";
> print $fh "Welcome to Muppet Show\n";
> close $fh;
>
> What am I doing wrong?

Your sample text has the identical byte sequence in ASCII, Windows-1252 (aka ANSI), UTF-8, ISO-8859-1 (Latin-1), ISO-8859-15 (Latin-9), and probably a dozen other encodings. Therefore your sample is useless for testing for the correct encoding.

Notepad relies on the byte order mark (BOM) to identify Unicode files, including UTF-8, where the BOM of course is meaningless and not used except by Notepad itself.

In not so many words: Notepad has no clue what it is talking about. But for your sample text, neither would any other tool.

Step 1: use some sample text that contains characters that are encoded as different byte sequences in each encoding.

Step 2: don't use Notepad. Write to a (trivial) HTML file and then use a web browser to view that file. There you can change the encoding and determine whether those characters are displayed correctly for the desired encoding.

In over 8 years as a software localization engineer and international program manager, this has proven to be the only practical and reliable way to identify the actual encoding of a file.

jue
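To make Step 1 concrete, here is a minimal sketch that dumps the bytes each encoding produces for an ASCII-only string and for a string with non-ASCII characters. It assumes the core Encode module (shipped with Perl 5.8); the helper name hex_bytes and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode $text and return its bytes as hex, e.g. "41 42".
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $ascii_only = "Muppet";
my $non_ascii  = "\x{E9}\x{20AC}";   # e-acute followed by the euro sign

for my $encoding (qw(UTF-8 cp1252 iso-8859-15)) {
    printf "%-12s | %-18s | %s\n",
        $encoding,
        hex_bytes($encoding, $ascii_only),
        hex_bytes($encoding, $non_ascii);
}
```

The ASCII-only column comes out as 4D 75 70 70 65 74 for all three encodings, which is why the original sample cannot tell them apart. The second column differs: C3 A9 E2 82 AC for UTF-8, E9 80 for cp1252, and E9 A4 for iso-8859-15.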
On May 10, 3:05 pm, "Jürgen Exner" <jurge...@hotmail.com> wrote:
> b...@docstream.no wrote:
> > I use ActivePerl version 5.8.8.817 on Windows XP.
> >
> > I am trying to create a new text file and add some content, but when I open it
> > in Notepad, it says it's an ANSI-encoded file. Why?
> >
> > open my $fh, '>:encoding(UTF-8)', "testfile.txt";
> > print $fh "Welcome to Muppet Show\n";
> > close $fh;
> >
> > What am I doing wrong?
>
> Your sample text has the identical byte sequence in ASCII, Windows-1252 (aka
> ANSI), UTF-8, ISO-8859-1 (Latin-1), ISO-8859-15 (Latin-9), and probably a dozen
> other encodings. Therefore your sample is useless for testing for the correct
> encoding.
>
> Notepad relies on the byte order mark (BOM) to identify Unicode files,
> including UTF-8, where the BOM of course is meaningless and not used except
> by Notepad itself.

You mean Windows, not Notepad. Most Windows programs will recognise a file with a utf8 BOM at the start as utf8.

In a situation where you've got a mixture of Windows-1252 and utf8 files knocking about, it's not a bad way to distinguish them. I'm not saying I particularly liked Microsoft's unilateral adoption of the BOM in utf8, but I have to admit it makes the best of a bad job.

In Perl I'd like to be able to say something like

    open my $fh, '>:encoding(UTF-8 BOM)', "testfile.txt";

But AFAIK I can't, and I just have to write

    print $fh "\x{FEFF}"; # BOM
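Put together, the manual-BOM approach might look like the sketch below. It writes U+FEFF as the very first character through the :encoding(UTF-8) layer, then reads the first three bytes back raw to check what landed on disk. The file name is taken from the original post; the read-back check is only an illustration, not something the posters proposed.

```perl
use strict;
use warnings;

my $file = 'testfile.txt';

# Write: the :encoding(UTF-8) layer turns U+FEFF into the bytes EF BB BF.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} "\x{FEFF}";                  # BOM, must be the first thing written
print {$out} "Welcome to Muppet Show\n";
close $out or die "Cannot close $file: $!";

# Read the first three bytes back without any decoding layer.
open my $in, '<:raw', $file or die "Cannot read $file: $!";
read $in, my $head, 3;
close $in;

my $bom_hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $head;
print "First bytes: $bom_hex\n";          # First bytes: EF BB BF
```

The BOM has to be printed before any other output on the handle; written anywhere else, U+FEFF is just a zero-width no-break space in the middle of the text.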
Brian McCauley wrote:
> In a situation where you've got a mixture of Windows-1252 and utf8
> files knocking about, it's not a bad way to distinguish them. I'm
> not saying I particularly liked Microsoft's unilateral adoption of the BOM
> in utf8, but I have to admit it makes the best of a bad job.

Fair enough, you've got a point. However, calling it a _Byte_Order_ Mark in the context of UTF-8 is a misnomer if there ever has been one ;-)

jue