Encoding question

Hi,
I have a text-based application and want to draw some kind of frame on
the screen. OS is Debian/Linux using Perl 5.6.

I'm using this code:

-- snip --
  my $top = chr(201);
  my $bottom = chr(200);
  for (my $i = 0; $i < ($termCols-2); $i++)
  {
    $top .= chr(205);
    $bottom .= chr(205);
  }
  $top .= chr(187);
  $bottom .= chr(188);

  $term->Tgoto('cm', 0, 0, *STDOUT);
  print $top;
  for (my $i = 1; $i < ($termRows-1); $i++)
  {
    $term->Tgoto('cm', 0, $i, *STDOUT);
    print chr(186);
    $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
    print chr(186);
  }
  $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
  print $bottom;
-- snip --

Where $termCols and $termRows are the current numbers of terminal columns and lines.

Problem:
Due to the Latin-1 encoding, I don't get the expected frame symbols but
some accented(?) characters instead.

How can I change the encoding so that I can use the "extended ASCII" set
that is often cited as the most common one (e.g. on www.asciitable.com),
which contains these frame symbols?
I'm aware of 'use encoding "..";' but I just can't find the correct table. :(

michael

Reply Michael 6/18/2004 12:39:05 PM

Quoth Michael Krueger <kruger@math.fu-berlin.de>:
> Hi,
> I have a text based application and want to draw some kind of frame, on
> the screen. OS is Debian/Linux using Perl 5.6
> 
> I'm using this code:
> 
> -- snip ---
>   my $top = chr(201);
>   my $bottom = chr(200);
>   for (my $i = 0; $i < ($termCols-2); $i++)

for my $i (1 .. $termCols-2) {

is much more Perlish...

>   {
>     $top .= chr(205);
>     $bottom .= chr(205);
>   }
>   $top .= chr(187);
>   $bottom .= chr(188);

...but even more so would be

my $top = chr(201) . (chr(205) x ($termCols - 2)) . chr(187);
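A minimal sketch of that suggestion applied to both edges, assuming the OP's chr() values are kept for now (whether they display as box corners is the separate encoding problem discussed below). The helper name frame_edges is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the top and bottom edges of a frame with
# the repetition operator "x" instead of an explicit loop. The numbers
# are the OP's CP437 ("extended ASCII") positions.
sub frame_edges {
    my ($cols) = @_;
    my $fill   = chr(205) x ($cols - 2);         # double horizontal line
    my $top    = chr(201) . $fill . chr(187);    # corners: top left, top right
    my $bottom = chr(200) . $fill . chr(188);    # corners: bottom left, bottom right
    return ($top, $bottom);
}

my ($top, $bottom) = frame_edges(80);
```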

>   $term->Tgoto('cm', 0, 0, *STDOUT);

I'm not sure which class these methods are from, but you might consider
using Term::ANSIScreen instead...
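The $term->Tgoto('cm', $col, $row, $fh) calls look like Term::Cap (core Perl), which takes a 0-based column before the row. If you would rather not depend on a termcap entry and your terminal understands ANSI/VT100 escapes (an assumption), cursor addressing is just ESC [ row ; col H with 1-based numbers. A sketch with hypothetical goto_xy and side_bars helpers that keep Tgoto's argument order:

```perl
use strict;
use warnings;

# Hypothetical helper: ANSI/VT100 "cursor position" sequence.
# Takes a 0-based column and row (same order as Term::Cap's Tgoto)
# and converts them to the 1-based row;column form the terminal expects.
sub goto_xy {
    my ($col, $row) = @_;
    return sprintf "\e[%d;%dH", $row + 1, $col + 1;
}

# Hypothetical helper: both side bars of one frame row as one string.
sub side_bars {
    my ($row, $cols, $glyph) = @_;
    return goto_xy(0, $row) . $glyph . goto_xy($cols - 1, $row) . $glyph;
}

# Usage, e.g.: print side_bars($_, 80, '|') for 1 .. 22;
```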

>   print $top;
>   for (my $i = 1; $i < ($termRows-1); $i++)
>   {
>     $term->Tgoto('cm', 0, $i, *STDOUT);
>     print chr(186);
>     $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
>     print chr(186);
>   }
>   $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
>   print $bottom;
> -- snip --
> 
> Where $termCols and $termRows are the current numbers of terminal columns and lines.
> 
> Problem:
> Due to the encoding to latin-1 charset I didn't get the expected
> frame-symbols but some other accentuated(?) chars.

The first thing to say is that if you want to mess with encodings,
upgrade to perl 5.8: it supports Unicode properly, and through that all
other encodings. The encoding pragma you mention only works in 5.8 (and
doesn't do what I think you think it does: it changes the encoding your
*program source* is considered to be in, i.e. the encoding of string
literals in the source).

There are, potentially, three encodings in use here: the one perl uses
to convert the numbers in your source into characters, the one perl uses
to convert the characters back to numbers again to send to the terminal,
and the one the terminal uses to decide which glyph to draw.
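To make the three stages concrete, here is a small sketch using the core Encode module (Perl 5.8 or later, an assumption given the OP's 5.6). It shows why chr(201) comes out as an accented letter: in Perl it is U+00C9, LATIN CAPITAL LETTER E WITH ACUTE, while the box corner the OP wanted is a different character, U+2554. Stage two is where a character becomes bytes, and the bytes differ per encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stage 1: numbers in the source become characters. chr(201) is the
# character U+00C9 (LATIN CAPITAL LETTER E WITH ACUTE), not a box corner.
my $what_the_op_wrote = chr(201);

# The corner the OP wanted is U+2554 (BOX DRAWINGS DOUBLE DOWN AND RIGHT).
my $corner = "\x{2554}";

# Stage 2: characters become bytes. An :encoding(...) layer does this on
# print; encode() does the same thing explicitly.
my $as_utf8  = encode('UTF-8', $corner);        # three bytes
my $as_cp437 = encode('cp437', $corner);        # one byte, 0xC9 (= 201, the OP's number)
my $as_latin = encode('iso-8859-1', $corner);   # not in Latin-1: '?' is substituted

# Stage 3 happens in the terminal, which turns the bytes into glyphs
# according to whatever encoding it has been set up to expect.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

printf "%-10s %s\n", 'UTF-8',      hex_bytes($as_utf8);
printf "%-10s %s\n", 'cp437',      hex_bytes($as_cp437);
printf "%-10s %s\n", 'iso-8859-1', hex_bytes($as_latin);
```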

An easy and straightforward way to get rid of the first is to use
"\N{...}" (with 'use charnames ":full";' in 5.8) instead of chr, and look
up the correct characters in the Big Ol' Unicode Character List
<http://www.unicode.org/charts/>. You control the second using the
:encoding layer on filehandles: see perldoc -f binmode and perldoc
PerlIO::encoding.
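A minimal sketch of both suggestions, assuming Perl 5.8: the OP's six frame characters written with \N{...} names, and an :encoding layer choosing the bytes. It writes to an in-memory filehandle instead of STDOUT only so the resulting bytes can be inspected; the %box hash is illustrative.

```perl
use strict;
use warnings;
use charnames ':full';   # required for \N{...} in 5.8 (loaded automatically since 5.16)

# The OP's six frame characters, written by Unicode name instead of number.
my %box = (
    top_left     => "\N{BOX DRAWINGS DOUBLE DOWN AND RIGHT}",
    top_right    => "\N{BOX DRAWINGS DOUBLE DOWN AND LEFT}",
    bottom_left  => "\N{BOX DRAWINGS DOUBLE UP AND RIGHT}",
    bottom_right => "\N{BOX DRAWINGS DOUBLE UP AND LEFT}",
    horizontal   => "\N{BOX DRAWINGS DOUBLE HORIZONTAL}",
    vertical     => "\N{BOX DRAWINGS DOUBLE VERTICAL}",
);

# The layer, not the program, decides which bytes are written.
# With a real terminal this would be binmode(STDOUT, ':encoding(UTF-8)').
open my $fh, '>', \my $buf or die "open: $!";
binmode $fh, ':encoding(UTF-8)';
print {$fh} $box{top_left}, $box{horizontal} x 3, $box{top_right};
close $fh;
```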

The third is, I think, your problem here: your terminal is expecting
Latin-1 (entirely usual in the Unix world), and there are no box-drawing
characters in Latin-1. Your best answer is to persuade your terminal to
expect UTF-8 instead (unicode_start on the console, xterm -u8; most other
terminal emulators support it with an option); then you can call
binmode STDOUT, ':utf8' and use the Unicode box-drawing characters.
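Once the terminal expects UTF-8, the frame can be built from Unicode characters directly. A sketch assuming a UTF-8 terminal and Perl 5.8; frame_rows is a hypothetical helper that returns whole lines, so it draws a small box at the current cursor position rather than replacing the OP's full-screen Tgoto loop.

```perl
use strict;
use warnings;

# Hypothetical helper: every line of a $cols x $rows double-line frame,
# as strings of Unicode characters (U+2550..U+255D box drawings).
sub frame_rows {
    my ($cols, $rows) = @_;
    my $h   = "\x{2550}" x ($cols - 2);                  # double horizontal
    my @out = ("\x{2554}$h\x{2557}");                    # top edge
    push @out, "\x{2551}" . (' ' x ($cols - 2)) . "\x{2551}"
        for 1 .. $rows - 2;                              # sides
    push @out, "\x{255A}$h\x{255D}";                     # bottom edge
    return @out;
}

# The :utf8 layer turns the characters into UTF-8 bytes on output.
binmode STDOUT, ':utf8';
my @frame = frame_rows(20, 5);
print "$_\n" for @frame;
```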

Ben

-- 
$.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
$x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
{$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t    # ben@morrow.me.uk
$J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.
Reply Ben 6/18/2004 12:59:05 PM


Michael Krueger wrote:

> Hi,
> I have a text based application and want to draw some kind of frame, on
> the screen. OS is Debian/Linux using Perl 5.6
> [...]
> How can I change the encoding that I can use the extended ASCII set, which
> is referred often as the most common e.g. on www.asciitable.com, which
> contains these frame-symbols?

The code set is probably "Code Page 437" variously referred to as 
"cp437", "IBM437", "437" etc. There are national variants too which have 
some or all of the same line-draw characters but include a few accented 
characters or national currency symbols in place of some US characters.

All those line-drawing characters are also in Unicode; Unicode with UTF-8
may be a better option.

See http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
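If you want to keep the OP's numbers, the core Encode module (Perl 5.8) ships a cp437 table, so each number can be decoded into the Unicode character it stands for and then printed through a UTF-8 layer. A sketch with a hypothetical cp437_char helper:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat a number as a CP437 byte value and return
# the Unicode character it maps to (per the CP437.TXT mapping table).
sub cp437_char {
    my ($n) = @_;
    return decode('cp437', chr $n);
}

# The OP's six frame characters, keyed by their CP437 numbers.
my %frame = map { ($_ => cp437_char($_)) } 201, 205, 187, 186, 200, 188;

printf "%3d -> U+%04X\n", $_, ord $frame{$_} for sort { $a <=> $b } keys %frame;
```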

Vim supports editing Unicode characters in UTF-8 files; e.g. ISTR
Control-K dr produces a top-left corner (mnemonic: down, right),
Control-K vv produces a vertical line, and so on.

> I'm aware of 'use encoding "..";' but I just can't find the correct table. :(
> 

Googling reveals snippets such as
   binmode (STDOUT, ':encoding(cp437)');
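What that snippet does, sketched against an in-memory handle so the output bytes are visible: the program prints Unicode box characters and the :encoding(cp437) layer turns them back into the CP437 byte values the OP started with. This only helps if the display really expects CP437 (for example a console with a CP437 font loaded), which is an assumption here.

```perl
use strict;
use warnings;

# Unicode characters in, CP437 bytes out.
open my $out, '>', \my $bytes or die "open: $!";
binmode $out, ':encoding(cp437)';
print {$out} "\x{2554}", "\x{2550}" x 3, "\x{2557}";   # top edge of a frame
close $out;

# In a real program: binmode(STDOUT, ':encoding(cp437)');
```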

You need to match encodings with your display device. On a Linux console
you probably need to check the locale settings (LANG etc.) and a few
other things, such as the console font.
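Checking the locale can be scripted. A rough sketch (a heuristic, not a full locale parser) that follows the usual POSIX precedence of LC_ALL, then LC_CTYPE, then LANG, and treats a codeset named UTF-8 or utf8 as a sign that UTF-8 output is expected; locale_wants_utf8 is a made-up name. The locale describes what programs should emit, not what the terminal is actually set to, so the two can still disagree.

```perl
use strict;
use warnings;

# Hypothetical helper: does the effective character-type locale look
# like a UTF-8 one? Takes an environment hash (normally %ENV).
sub locale_wants_utf8 {
    my (%env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $val = $env{$var};
        next unless defined $val && length $val;   # unset or empty: try the next one
        return $val =~ /\.utf-?8\b/i ? 1 : 0;       # first one set decides
    }
    return 0;    # nothing set: the "C" locale, plain ASCII
}

# E.g. pick Unicode box characters or a '+', '-', '|' fallback.
my $glyph_set = locale_wants_utf8(%ENV) ? 'unicode' : 'ascii';
```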

If using a terminal emulator you need to choose an appropriate font. On 
Windows that might be "Terminal" for IBM437 or "Courier New" for Unicode.

Reply Ian 6/18/2004 3:32:58 PM

On Fri, 18 Jun 2004, Ian Wilson wrote:

> The code set is probably "Code Page 437" variously referred to as
> "cp437", "IBM437", "437" etc. There are national variants too

Er, excuse me, but cp437 -is- the national (USA) variant.  The Latin
multilingual codepage is cp850.

> All those line-draw characters are also in Unicode - this and UTF-8 may
> be a better option.

By now I'm sure that's the best advice, unless there are some special
factors involved.

Reply Alan 6/18/2004 3:58:56 PM

Alan J. Flavell wrote:

> On Fri, 18 Jun 2004, Ian Wilson wrote:
> 
> 
>>The code set is probably "Code Page 437" variously referred to as
>>"cp437", "IBM437", "437" etc. There are national variants too
> 
> 
> Er, excuse me, but cp437 -is- the national (USA) variant.  

Picky, but also wrong :-)
in my post s/there are national/there are other national/

in your post s/the national variant/a national variant/
(at least from where I'm standing, YMMV)


> The Latin multilingual codepage is cp850.

All right, but the OP referred to http://www.asciitable.com/, which shows
CP437.

I haven't checked every code point in the bit described as "Extended
ASCII", but code point 184 looks to me like 437 rather than 850. I can't
say I like that page much anyhow.
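Ian's check can be repeated with the core Encode module (Perl 5.8), which ships cp437 and cp850 tables. A sketch with a hypothetical char_in helper: byte 184 is a box-drawing character in cp437 but the copyright sign in cp850, while the six double-line frame characters the OP uses sit at the same positions in both.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: the Unicode character a byte value stands for
# in a given 8-bit encoding.
sub char_in {
    my ($enc, $n) = @_;
    return decode($enc, chr $n);
}

# Compare what byte 184 means in a few common 8-bit tables.
for my $enc (qw(cp437 cp850 iso-8859-1)) {
    printf "%-10s 184 -> U+%04X\n", $enc, ord char_in($enc, 184);
}
```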



Reply Ian 6/18/2004 4:41:01 PM

On Fri, 18 Jun 2004, Ian Wilson wrote:

> >>The code set is probably "Code Page 437" variously referred to as
> >>"cp437", "IBM437", "437" etc. There are national variants too
> >
> > Er, excuse me, but cp437 -is- the national (USA) variant.
>
> Picky, but also wrong :-)
> in my post s/there are national/there are other national/

Fine, I'll go with that...

> in your post s/the national variant/a national variant/

Rather, s/the national (USA) variant/the USA national variant/, to
address your nitpick in the way that I had intended.

Way back (e.g. this old MS-DOS 5 manual which I have on the shelf),
cp437 was advertised as the "English" code page; but already by the
time of the public release of Win95 (as opposed to the beta, where I
had chosen to change the codepage to 850 for myself, despite the dire
warnings in the covering notes), MS were setting the DOS codepage as
cp850 for Latin-based locales.  As far as I know (though I could be
wrong) they were still setting cp437 in the USA, though.

> > The Latin multilingual codepage is cp850.
>
> Alright but the OP referred to http://www.asciitable.com/ which shows
> CP437.

Sure, I wasn't arguing about that part of the posting.

> I haven't checked every codepoint in the bit described as "Extended
> ASCII"

...a term which always sets off the bogosity alarms.  There are
*numerous* 8-bit character codings which contain ASCII as their first
half.

> but point 184 looks to me like 437 rather than 850.

Indeed.  The "Extended ASCII" bogon *does* usually refer to cp437 in
my experience.

> I can't say I like that page much anyhow.

Me too neither.  For one thing, its claim that "it took a while to get
a single standard for these extra characters" is complete nonsense.

all the best

Reply Alan 6/18/2004 5:25:52 PM


On Fri, 18 Jun 2004, Ben Morrow wrote:

>
> [...] Your best answer is to persuade your terminal to
> want utf8 instead (unicode_start on the console, xterm -u8, most other
> terminal emulators will support it with an option); then you can call
> binmode STDOUT, ':utf8' and use the Unicode box-drawing characters.
>
> Ben

Hi,
thx for your fast reply; this really helped me a lot.
I'll try it with Unicode then.

Just want to draw those darn boxes ; )

michael

Reply Michael 6/18/2004 10:53:52 PM

Michael Krueger <kruger@math.fu-berlin.de> wrote:


> On Fri, 18 Jun 2004, Ben Morrow wrote:
> thx for your fast reply; this really helped me a lot.
> I'll try it with Unicode then.

> Just want to draw those darn boxes ; )

He gave poor advice, however.  Most of the interesting terminals support
line drawing (the termcap "as", "ae" and "ac" alternate-character-set
capabilities), which any termcap interface (such as Perl's Term::Cap)
can use.
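A sketch of what Thomas describes, under stated assumptions: the letters l, k, m, j, q and x are the usual VT100 alternate-character-set codes for the corners and lines, and a real program should confirm them against the terminal's "ac" string (or let curses do it). The as/ae strings normally come from the termcap entry via Term::Cap; acs_edges and acs_strings_from_termcap are hypothetical names, and the example uses the ESC ( 0 / ESC ( B pair that xterm's entry defines.

```perl
use strict;
use warnings;

# Usual VT100 alternate-character-set letters (assumed; check "ac").
my %acs = (
    top_left    => 'l', top_right    => 'k',
    bottom_left => 'm', bottom_right => 'j',
    horiz       => 'q', vert         => 'x',
);

# Hypothetical helper: top and bottom edges wrapped in the terminal's
# "start/end alternate charset" strings (termcap "as" / "ae").
sub acs_edges {
    my ($cols, $as, $ae) = @_;
    my $h = $acs{horiz} x ($cols - 2);
    return ("$as$acs{top_left}$h$acs{top_right}$ae",
            "$as$acs{bottom_left}$h$acs{bottom_right}$ae");
}

# Hypothetical helper: fetch as/ae for $ENV{TERM} with core Term::Cap.
# Tgetent dies for an unknown terminal type; not called in this sketch.
sub acs_strings_from_termcap {
    require Term::Cap;
    my $term = Term::Cap->Tgetent({ TERM => undef, OSPEED => 9600 });
    $term->Trequire(qw(as ae));
    return ($term->Tputs('as', 1), $term->Tputs('ae', 1));
}

# Example with the strings xterm's entry uses:
my ($top, $bottom) = acs_edges(10, "\e(0", "\e(B");
```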

The current version of ncurses is 5.4 (20040208).
There's a FAQ at
	http://invisible-island.net/ncurses/ncurses.faq.html

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
Reply Thomas 6/19/2004 2:19:43 PM

Quoth Thomas Dickey <dickey@saltmine.radix.net>:
> Michael Krueger <kruger@math.fu-berlin.de> wrote:
> 
> 
> > On Fri, 18 Jun 2004, Ben Morrow wrote:
> > thx for your fast reply; this really helped me a lot.
> > I'll try it with Unicode then.
> 
> > Just want to draw those darn boxes ; )
> 
> He gave poor advice however.  Most of the interesting terminals support
> line-drawing, which any termcap interface (such as the one in Perl) can
> support.

Ah, I didn't know that... filed for future reference. Thank you.

FWIW, I always do boxes just with '+', '-' and '|'...
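For completeness, Ben's pure-ASCII approach, which needs no encoding or termcap support at all. A sketch with a hypothetical ascii_frame helper that returns whole lines:

```perl
use strict;
use warnings;

# Hypothetical helper: a $cols x $rows frame from '+', '-' and '|'.
# Works in any terminal and any 8-bit encoding that contains ASCII.
sub ascii_frame {
    my ($cols, $rows) = @_;
    my $edge = '+' . ('-' x ($cols - 2)) . '+';
    my $side = '|' . (' ' x ($cols - 2)) . '|';
    return ($edge, ($side) x ($rows - 2), $edge);
}

print "$_\n" for ascii_frame(12, 4);
```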

Ben

-- 
perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
2047502190/'                                                 # ben@morrow.me.uk
Reply Ben 6/19/2004 2:24:19 PM
