I am looking for a regexp that matches the ANSI terminal escape sequences
(ESC [ ...) (for xterm), or alternatively for a tool (Linux) that replaces
ANSI terminal sequences with an arbitrarily chosen fixed replacement. Thanks.
Janis
Janis | 5/25/2010 10:43:55 AM

Janis Papanagnou wrote:
> I am looking for a regexp that matches the ANSI terminal escape sequences
> (ESC [ ...) (for xterm), or alternatively for a tool (Linux) that replaces
> ANSI terminal sequences by an arbitrary chosen fixed replacement. Thanks.
>
> Janis
Are these sequences "hardwired" into an application, or is the application
using curses? If the latter, you should be able to fudge a terminfo entry
to produce the required sequences. See terminfo(5).
Andrew
Andrew | 5/25/2010 12:13:52 PM

Janis Papanagnou wrote:
> I am looking for a regexp that matches the ANSI terminal escape sequences
> (ESC [ ...) (for xterm), or alternatively for a tool (Linux) that replaces
> ANSI terminal sequences by an arbitrary chosen fixed replacement. Thanks.
I've never done that, but I suppose any regex flavor that can match the
escape character would do, so for example with GNU sed's ERE to match
coloring sequences:
\x1b\[[0-9]+;[0-9]+m
or something similar.
$ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
$ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
FOO - FOO
Apologies if I didn't understand correctly what you're after.
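The pattern above requires exactly two numeric parameters, so it would miss sequences such as ESC [ 0 m or ESC [ m. A minimal Python sketch of a slightly looser match for coloring (SGR) sequences might look like this; the name replace_sgr and the RESET and BARE sample strings are illustrative assumptions, not part of the original post.

```python
import re

# SGR (coloring) sequences: ESC [ params m, where params is zero or more
# decimal numbers separated by semicolons.
SGR = re.compile(r"\x1b\[[0-9;]*m")

GREEN = "\x1b[01;32m"
YELLOW = "\x1b[01;33m"
RESET = "\x1b[0m"   # one parameter only
BARE = "\x1b[m"     # no parameters at all


def replace_sgr(text, replacement="FOO"):
    """Replace every SGR sequence in text with a fixed string."""
    return SGR.sub(replacement, text)


print(replace_sgr(GREEN + " - " + YELLOW))
```

This only covers sequences ending in m; cursor movement and other control sequences need a more general pattern.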
pk | 5/25/2010 12:24:08 PM

Andrew McDermott wrote:
> Janis Papanagnou wrote:
>
>> I am looking for a regexp that matches the ANSI terminal escape sequences
>> (ESC [ ...) (for xterm), or alternatively for a tool (Linux) that replaces
>> ANSI terminal sequences by an arbitrary chosen fixed replacement. Thanks.
>
> Are these sequences "hardwired" into an application, or is the application
> using curses? If the latter, you should be able to fudge a terminfo entry
> to produce the required sequences. See terminfo(5).
I am telnet'ing to a server that emits those ANSI sequences in addition
to the data I am interested in. It's not specified what that server will
actually emit, therefore I am looking for a "universal" regexp for those
sequences. Probably something like \033[[]\([0-9]*;\)+[A-Za-z0-9] or so.
Since I might easily make mistakes when defining this, and since it is
just as likely that someone else has already invented that wheel, I am
asking.
Janis
Janis | 5/25/2010 1:15:21 PM

pk wrote:
> Janis Papanagnou wrote:
>
>> I am looking for a regexp that matches the ANSI terminal escape sequences
>> (ESC [ ...) (for xterm), or alternatively for a tool (Linux) that replaces
>> ANSI terminal sequences by an arbitrary chosen fixed replacement. Thanks.
>
> I've never done that, but I suppose any regex flavor that can match the
> escape character would do, so for example with GNU sed's ERE to match
> coloring sequences:
>
> \x1b\[[0-9]+;[0-9]+m
>
> or something similar.
>
> $ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
> $ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
> FOO - FOO
>
> Apologies if I didn't understand correctly what you're after.
Sorry for having been unclear.
I know that I just need some BRE/ERE tool, like sed, to substitute the
actual ANSI codes. I was interested in a single regexp that covers all
ANSI sequences because, actually, I don't know what
the telnet server will emit. (Please see also my response to Andrew.)
Janis
Janis | 5/25/2010 1:19:11 PM

Janis Papanagnou wrote:
> pk wrote:
>> Janis Papanagnou wrote:
>>
>>> I am looking for a regexp that matches the ANSI terminal escape
>>> sequences (ESC [ ...) (for xterm), or alternatively for a tool (Linux)
>>> that replaces ANSI terminal sequences by an arbitrary chosen fixed
>>> replacement. Thanks.
>>
>> I've never done that, but I suppose any regex flavor that can match the
>> escape character would do, so for example with GNU sed's ERE to match
>> coloring sequences:
>>
>> \x1b\[[0-9]+;[0-9]+m
>>
>> or something similar.
>>
>> $ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
>> $ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
>> FOO - FOO
>>
>> Apologies if I didn't understand correctly what you're after.
>
> Sorry for having been unclear.
>
> I know that I just need some BRE/ERE tool, like sed, to substitute the
> actual ANSI codes. I was interested in a regexp that covers all ANSI
> sequences in one regexp expression because, actually, I don't know what
> the telnet server will emit. (Please see also my response to Andrew.)
See if this expect tip helps:
http://wiki.tcl.tk/9673
pk | 5/25/2010 1:32:54 PM

pk wrote:
> Janis Papanagnou wrote:
>
>> pk wrote:
>>> Janis Papanagnou wrote:
>>>
>>>> I am looking for a regexp that matches the ANSI terminal escape
>>>> sequences (ESC [ ...) (for xterm), or alternatively for a tool (Linux)
>>>> that replaces ANSI terminal sequences by an arbitrary chosen fixed
>>>> replacement. Thanks.
>>> I've never done that, but I suppose any regex flavor that can match the
>>> escape character would do, so for example with GNU sed's ERE to match
>>> coloring sequences:
>>>
>>> \x1b\[[0-9]+;[0-9]+m
>>>
>>> or something similar.
>>>
>>> $ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
>>> $ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
>>> FOO - FOO
>>>
>>> Apologies if I didn't understand correctly what you're after.
>> Sorry for having been unclear.
>>
>> I know that I just need some BRE/ERE tool, like sed, to substitute the
>> actual ANSI codes. I was interested in a regexp that covers all ANSI
>> sequences in one regexp expression because, actually, I don't know what
>> the telnet server will emit. (Please see also my response to Andrew.)
>
> See if this expect tip helps:
>
> http://wiki.tcl.tk/9673
Not sure. Quoting from the link (first example)...
regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
It seems that ANSI sequences can terminate in a digit. How could one
distinguish, in a sequence like, say, \x1b[0A, whether the A is part of
the ANSI sequence or part of the subsequent data?
Janis
Janis | 5/25/2010 2:39:29 PM

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> pk wrote:
<snip>
>> See if this expect tip helps:
>>
>> http://wiki.tcl.tk/9673
>
> Not sure. Quoting from the link (first example)...
>
> regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
>
> It seems that ANSI sequences can terminate in a digit.
A quick scan of some online documents suggests that this is not so. All
the sequences I've seen end in a letter. Wikipedia suggests the last byte
must be between ASCII @ and ~ inclusive.
If you are prepared to use a very general regexp that will strip out
ill-formed escape sequences you could start with
\x1b\[[^@-~]*[@-~]
You then need to catch the two-byte sequences:
\x1b\[[^@-~]*[@-~]|\x1b[@-~]
This will go wrong for those sequences that can include quoted strings
like those that set key mappings. Maybe you can ignore these.
There is also a one-byte alternative to \x1b[ which is \x9b so you might
want to try:
(\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]
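For anyone who wants to try the expression outside sed, here is a minimal Python sketch. It assumes the telnet data is available as raw bytes; the name replace_ansi and the sample string are made up for illustration.

```python
import re

# The expression above, compiled over bytes so that \x9b means the single
# byte 0x9B. The group is made non-capturing; otherwise it is unchanged.
ANSI = re.compile(rb"(?:\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]")


def replace_ansi(data: bytes, replacement: bytes = b"") -> bytes:
    """Replace each matched sequence with a fixed replacement."""
    return ANSI.sub(replacement, data)


sample = b"\x1b[2J\x1b[1;1Hlogin: \x1b[01;32mok\x1b[0m \x9b5Cdone\x1bM"
print(replace_ansi(sample))
print(replace_ansi(sample, b"<ESC>"))
```

With this pattern the A in \x1b[0A is consumed as the final byte of the sequence, so only what follows it is treated as data.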
--
Ben.
Ben | 5/25/2010 4:43:48 PM

Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>> pk wrote:
> <snip>
>>> See if this expect tip helps:
>>>
>>> http://wiki.tcl.tk/9673
>>
>> Not sure. Quoting from the link (first example)...
>>
>> regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
>>
>> It seems that ANSI sequences can terminate in a digit.
>
> A quick scan of some online documents suggest that this is not so. All
> the sequences I've see end in a letter. Wikipedia suggest the last byte
> must be between ASCII @ and ~ inclusive.
>
> If you are prepared to use a very general regexp that will strip out
> ill-formed escape sequences you could start with
>
> \x1b\[[^@-~]*[@-~]
>
> You then need to catch the two-byte sequences:
>
> \x1b\[[^@-~]*[@-~]|\x1b[@-~]
>
> This will go wrong for those sequences that can include quoted strings
> like those that set key mappings. Maybe you can ignore these.
>
> There is also a one-byte alternative to \x1b[ which is \x9b so you might
> want to try:
>
> (\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]
For reference, here are some tables with most ANSI escape sequences:
http://isthe.com/chongo/tech/comp/ansi_escapes.html
http://ascii-table.com/ansi-escape-sequences.php
pk | 5/25/2010 4:49:38 PM

Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>> pk wrote:
> <snip>
>>> See if this expect tip helps:
>>>
>>> http://wiki.tcl.tk/9673
>> Not sure. Quoting from the link (first example)...
>>
>> regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
>>
>> It seems that ANSI sequences can terminate in a digit.
>
> A quick scan of some online documents suggest that this is not so. All
> the sequences I've see end in a letter. Wikipedia suggest the last byte
> must be between ASCII @ and ~ inclusive.
>
> If you are prepared to use a very general regexp that will strip out
> ill-formed escape sequences you could start with
>
> \x1b\[[^@-~]*[@-~]
>
> You then need to catch the two-byte sequences:
>
> \x1b\[[^@-~]*[@-~]|\x1b[@-~]
>
> This will go wrong for those sequences that can include quoted strings
> like those that set key mappings. Maybe you can ignore these.
Yes, I think I can ignore those.
>
> There is also a one-byte alternative to \x1b[ which is \x9b so you might
> want to try:
>
> (\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]
>
Looks good, and seems to work. Thanks, Ben. Thanks also to Andrew and
pk.
Just an additional note for those who try that expression and observe
problems; setting LANG=C might fix some issues in non-C locales.
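A related byte-level pitfall, sketched in Python under the assumption that the stream is UTF-8 (the thread does not say which encoding the server uses): the byte 0x9B also occurs as the second byte of some multibyte characters, so treating it as a one-byte CSI in a byte-oriented match can damage ordinary text. The names ANSI_BYTES and ANSI_7BIT are illustrative.

```python
import re

# Full expression, including the one-byte CSI (0x9B), matched bytewise.
ANSI_BYTES = re.compile(rb"(?:\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]")
# Same expression without the 0x9B alternative.
ANSI_7BIT = re.compile(rb"\x1b\[[^@-~]*[@-~]|\x1b[@-~]")

# The Polish word for "world"; its first letter encodes as the bytes C5 9B.
word = "świat".encode("utf-8")

damaged = ANSI_BYTES.sub(b"", word)   # 0x9B plus the next byte get removed
intact = ANSI_7BIT.sub(b"", word)     # nothing matches, text is unchanged

print(damaged, intact)
```

If the server only ever sends the 7-bit ESC [ form, dropping the 0x9B alternative avoids this.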
Janis
Janis | 5/25/2010 5:52:57 PM

In article <htgnf1$s5d$1@news.m-online.net>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>It seems that ANSI sequences can terminate in a digit. How could one
>distinguish in a sequence like, say, \x1b[0A whether the A is part of
>the ANSI sequence or part of the subsequent data.
No, I don't think they can. The patterns I've used in the past for excising
ANSI sequences:
gsub(/\033\[[^a-zA-Z]*./, "")
gsub(/\033./, "")
Apparently the terminating character can actually be any of characters 64
through 126 ('@' through '~'), not just letters, though I haven't seen that.
And of course you may also encounter the single-character CSI, character 155,
in place of \033[.
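The same two-pass idea can be sketched in Python; this is a rough equivalent of the awk calls above, not a drop-in replacement. The function name excise is an assumption, and \x1b is the same character as \033.

```python
import re

# Pass 1: ESC [ followed by non-letters and one terminating character.
CSI_SEQ = re.compile(r"\x1b\[[^a-zA-Z]*.")
# Pass 2: any remaining ESC plus the single character after it.
ESC_PAIR = re.compile(r"\x1b.")


def excise(text):
    """Strip sequences in two passes; the order of the passes matters."""
    text = CSI_SEQ.sub("", text)
    return ESC_PAIR.sub("", text)


print(excise("\x1b[31mred\x1b[0m\x1b="))
# Running the second pattern first would leave the parameters behind:
print(ESC_PAIR.sub("", "\x1b[31mred"))
```

Because the first pattern stops only at a letter, a non-letter final byte such as @ makes it run on into the text that follows, up to and including the next letter.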
John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
spcecdt | 5/25/2010 6:41:56 PM

pk <pk@pk.invalid> writes:
<snip>
> For reference, here are some tables with most ANSI escape sequences:
>
> http://isthe.com/chongo/tech/comp/ansi_escapes.html
> http://ascii-table.com/ansi-escape-sequences.php
Yes, I found both of those but they seem less than comprehensive (my
test being if they tell you about \e[J and \e[1J as well as \e[2J).
ECMA-48 seems to be the most definitive reference I can find online. It
gives a more restrictive pattern:
(\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]
In fact, trailing bytes in the range \x70-\x7e ('p' to '~' in ASCII) are
reserved for private or experimental use so this could be made even more
restricted.
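As a rough illustration of that structure, here is a Python sketch that splits each control sequence into its parts and flags the private-use final bytes. Note one addition that is not in the pattern above: ECMA-48 also allows intermediate bytes (\x20-\x2f) between the parameters and the final byte, so the sketch includes them. The name parse_csi is hypothetical.

```python
import re

CSI = re.compile(
    rb"(?:\x1b\[|\x9b)"   # CSI: 7-bit ESC [ or the single byte 0x9B
    rb"([\x30-\x3f]*)"    # parameter bytes: 0-9 : ; < = > ?
    rb"([\x20-\x2f]*)"    # intermediate bytes: space through /
    rb"([\x40-\x7e])"     # final byte: @ through ~
)


def parse_csi(data: bytes):
    """Yield (params, intermediates, final, is_private) per control sequence."""
    for m in CSI.finditer(data):
        params, inter, final = m.groups()
        # Final bytes 0x70-0x7E are reserved for private or experimental use.
        yield params, inter, final, final >= b"\x70"


for parts in parse_csi(b"\x1b[1;31mhi\x1b[?25l\x1b[2 q"):
    print(parts)
```

A stricter filter could drop or keep sequences based on the is_private flag.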
--
Ben.
Ben | 5/25/2010 9:15:57 PM

Ben Bacarisse wrote:
> pk <pk@pk.invalid> writes:
> <snip>
>> For reference, here are some tables with most ANSI escape sequences:
>>
>> http://isthe.com/chongo/tech/comp/ansi_escapes.html
>> http://ascii-table.com/ansi-escape-sequences.php
>
> Yes, I found both of those but they seem less that comprehensive (my
> test being if they tell you about \e[J and \e[1J as well as \e2J).
>
> ECMA-48 seems to be the most definitive reference I can find online. It
> gives a more restrictive pattern:
>
> (\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]
I wonder, though, why, e.g.,
ESC ( B
ESC =
ESC >
(which, incidentally, are all in the data that I parse) are not covered
by the pattern that you've found in the ECMA-48 reference.
> In fact, trailing bytes in the range \x70-\7e ('p' to '~' in ASCII) are
> reserved for private or experimental use so this could be made even more
> restricted.
>
BTW, in one of the references there are also escape sequences that seem
to be terminated by a digit; ESC 7 and ESC 8, for example.
Janis
Janis | 5/25/2010 10:25:24 PM

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> Ben Bacarisse wrote:
>> pk <pk@pk.invalid> writes:
>> <snip>
>>> For reference, here are some tables with most ANSI escape sequences:
>>>
>>> http://isthe.com/chongo/tech/comp/ansi_escapes.html
>>> http://ascii-table.com/ansi-escape-sequences.php
>>
>> Yes, I found both of those but they seem less that comprehensive (my
>> test being if they tell you about \e[J and \e[1J as well as \e2J).
>>
>> ECMA-48 seems to be the most definitive reference I can find online. It
>> gives a more restrictive pattern:
>>
>> (\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]
>
> I wonder, though, why, e.g.,
>
> ESC ( B
> ESC =
> ESC >
>
> (which, incidentally, are all in the data that I parse) are not covered
> by the pattern that you've found in the ECMA-48 reference.
What I quoted was a pattern for what ECMA-48 calls control sequences.
There are four other categories (the C0 set, the C1 set, independent
control functions and control strings) and I have not gone through and
worked them all out. I think there is a lot of history being codified
here.
>> In fact, trailing bytes in the range \x70-\7e ('p' to '~' in ASCII) are
>> reserved for private or experimental use so this could be made even more
>> restricted.
>>
>
> BTW, in one of the references there are also escape sequences that seems
> to be terminated by a digit; ESC 7 and ESC 8, for example.
That may well be possible. I was only describing "control sequences" --
those that start with CSI (the Control Sequence Introducer) \e[.
There ought to be an ANSI document, of course, but they are not always
easily available. It might be easier to read, though, than ECMA-48, which
is rather hard going.
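For the sequences outside the control-sequence category, such as ESC ( B, ESC =, ESC >, ESC 7 and ESC 8, one possible approach is to add the general escape-sequence shape: ESC, optional intermediate bytes \x20-\x2f, then one final byte \x30-\x7e. The Python sketch below combines that with the control-sequence pattern. It is an untested-in-the-field assumption rather than a complete solution; control strings (for example sequences carrying quoted text) are not handled, and the name strip_escapes is hypothetical.

```python
import re

ESCAPES = re.compile(
    # Control sequences: CSI, parameters, intermediates, final byte.
    rb"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
    # Other escape sequences: ESC, intermediates, final byte.
    # This alternative must come second, because ESC [ alone also fits it.
    rb"|\x1b[\x20-\x2f]*[\x30-\x7e]"
)


def strip_escapes(data: bytes) -> bytes:
    """Remove control sequences and plain escape sequences from data."""
    return ESCAPES.sub(b"", data)


print(strip_escapes(b"\x1b(B\x1b=\x1b>\x1b7text\x1b8\x1b[0m"))
```

The order of the two alternatives matters because Python tries them left to right and takes the first that matches.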
--
Ben.
Ben | 5/25/2010 11:06:07 PM

Janis Papanagnou wrote:
> Ben Bacarisse wrote:
>> pk <pk@pk.invalid> writes:
>> <snip>
>>> For reference, here are some tables with most ANSI escape sequences:
>>>
>>> http://isthe.com/chongo/tech/comp/ansi_escapes.html
>>> http://ascii-table.com/ansi-escape-sequences.php
>>
>> Yes, I found both of those but they seem less that comprehensive (my
>> test being if they tell you about \e[J and \e[1J as well as \e2J).
>>
>> ECMA-48 seems to be the most definitive reference I can find online. It
>> gives a more restrictive pattern:
>>
>> (\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]
>
> I wonder, though, why, e.g.,
>
> ESC ( B
> ESC =
> ESC >
I don't know of a handy online reference but I have an old copy of an
actual VT100 user guide with a pretty good description that seems
comprehensive. For example
ESC ( B is shown as the ANSI SCS (Select Character Set) control, which
designates the US ASCII set as the G0 char set.
ESC = is shown as DECKPAM Keypad App Mode (DEC private)
ESC > is shown as DECKPNM Keypad Numeric Mode (DEC private)
> (which, incidentally, are all in the data that I parse) are not covered
> by the pattern that you've found in the ECMA-48 reference.
>
>> In fact, trailing bytes in the range \x70-\7e ('p' to '~' in ASCII) are
>> reserved for private or experimental use so this could be made even more
>> restricted.
>>
> BTW, in one of the references there are also escape sequences that seems
> to be terminated by a digit; ESC 7 and ESC 8, for example.
Ok, I'm back and it seems there is a copy at:
www.piesoftwareinc.co.uk/textonly/VT100_User_Guide.pdf
I don't know if it helps but it has a lot of pages :)
stan | 5/26/2010 9:23:35 PM