Experimental gawk version, *with debugger*, now available


	awk 'BEGIN { print "hello, word" }'

I am pleased to announce the availability of a test version of gawk.
This version uses a byte-code execution engine, and most importantly,
it includes a debugger that works at the level of awk statements! The
distribution is available at

	http://www.skeeve.com/gawk/gawk-3.1.7-bc-d.tar.gz

This version is the same as 3.1.7, but with a new execution engine and
a debugging version of gawk named, rather imaginatively, "dgawk".

There is a story here.  Circa 2003, a gentleman by the name of Jon Haque
developed the byte-code execution engine and debugger, in the context
of a development gawk version, somewhere between 3.1.3 and 3.1.4.

I never integrated the changes as they were massive and I was busy,
and I wasn't able to review them.

The changes languished, and Jon disappeared.

Last fall, Stephen Davies, one of my portability team members, agreed to
take on the task of bringing the code into the present.  With modest help
from me, he succeeded.  We then went through additional work to get this
version portable to some of the more esoteric systems that gawk supports
(64 bit Linux, z/OS and VMS).

I thought it was ready for release at the end of December, until another
one of my testers found a severe memory leak in the byte code version.
It was a bear to track down, and once again Stephen came through.

The debugger uses the readline library, and it is purposely similar
to GDB.  There is only minimal documentation on the debugger; I'd love
to have someone volunteer to write a chapter for the gawk manual that
explains it fully.

In addition, this version supports a file-inclusion mechanism, but it
is quite likely to change once I review the awk.info poll results and
make a final decision (within a week, I hope!).

Per the gawk roadmap I announced a while back, the plan is as follows:

1. Make a 3.1.8 release based on the original execution engine so that
   OS distributions have a stable version to ship.

2. Merge the 3.1.8 changes into the 'gawk-devel' tree.

3. Merge the byte-code changes into 3.1.8 to give us an up-to-date
   set of byte-code changes.

4. Merge the byte-code changes into the devel tree. Development will
   then continue in the devel tree towards a 4.0 release.

PLEASE try this out on your programs.  Please also play with the
debugger.

If you think this is a major step forward, send Stephen a note thanking
him - his email address is in the dist. (I don't want to put it into
the news article so he won't get spam.)

If you are interested in contributing, please contact me directly, not
by posting in the news group.

Thanks!

Arnold
-- 
Aharon (Arnold) Robbins 				arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold 2/4/2010 6:45:16 AM

On Feb 4, 1:45 am, arn...@skeeve.com (Aharon Robbins) wrote:
>         awk 'BEGIN { print "hello, word" }'
>
> I am pleased to announce the availability of a test version of gawk.
> This version uses a byte-code execution engine, and most importantly,
> it includes a debugger that works at the level of awk statements!

great news. very exciting!

any comments on the performance hit/improvement of running as byte
codes?

t
Reply Tim 2/4/2010 1:32:58 PM


Aharon Robbins <arnold@skeeve.com> wrote:
: [...]

: PLEASE try this out on your programs.  Please also play with the
: debugger.

: [...]

Hmmm...

so I built gawk-3.1.7-bc-d on FreeBSD6.0/i386, and after two abortive
tries, both of which had me quite close to sending error reports (would
it hurt to make --enable-switch the default by now?), it seems to work.
Nicely done.

Since the code I tested this on with some frustration turned out to be
fine in the end, I thought I'd share:

-----isbn.awk
# isbn.awk -- ISBN checksum and conversion functions
#
# Sebastian F. Mix, charon@furry.de, Public Domain
# November 2009

BEGIN{
	isbn_errbase=10
}

#strip unnecessary syntactic sugar and whitespace before computation
function isbn_sanitize(s){
	gsub("-","",s);gsub(" ","",s);gsub("\t","",s);gsub("x","X",s)
	return s
}

#compute ISBN-10 checksum
#/^[0-9]{9}/ --> /^[0-9X]$/
function isbn_chksum10(s,	i, cs, wgt){
	wgt=10
	for(i=1;i<10;i++)
		cs+=strtonum(substr(s,i,1))*wgt--
	cs=11-cs%11
	return (cs==10)?"X":((cs==11)?"0":cs "")
}

#compute ISBN-13 checksum
#/^[0-9]{12}/ --> /^[0-9]$/
function isbn_chksum13(s,	i, cs){
	for(i=1;i<13;i++)
		cs+=strtonum(substr(s,i,1))*(i%2?1:3)
	cs=10-cs%10
	#a weighted sum divisible by 10 gives check digit "0"; ISBN-13 never uses "X"
	return (cs==10)?"0":cs ""
}

#convert an ISBN13 with 978 prefix to an ISBN10
#preserve original formatting (via "-")
function isbn_13to10(s,		origs){
	if((s=isbn_sanitize(origs=s)) !~ /^[0-9]+[0-9X]$/){
		print "Junk characters in ISBN." > "/dev/stderr"
		exit(isbn_errbase+0)
	}
	if(length(s)!=13){
		print "The ISBN doesn't contain 13 characters." > "/dev/stderr"
		exit(isbn_errbase+1)
	}
	if(s !~ /^978/){
		print "Only a 978-prefix ISBN-13 can be converted to ISBN10." > "/dev/stderr"
		exit(isbn_errbase+2)
	}
	if(!isbn_check(s)){
		print "Invalid checksum." > "/dev/stderr"
		exit(isbn_errbase+3)
	}
	#skip "978" (plus its "-", if any) and drop the old check digit; keep a trailing "-"
	return substr(origs,4+(origs ~ /^978-/),length(origs)-4-(origs ~ /^978-/)) ((origs ~ /-.$/)?"":"-") isbn_chksum10(substr(s,4,9))
}

#convert an ISBN10 to a 978-prefix ISBN13
#preserve original formatting (via "-")
function isbn_10to13(s,		origs){
	if((s=isbn_sanitize(origs=s)) !~ /^[0-9]+[0-9X]$/){
		print "Junk characters in ISBN." > "/dev/stderr"
		exit(isbn_errbase+0)
	}
	if(length(s)!=10){
		print "The ISBN doesn't contain 10 characters." > "/dev/stderr"
		exit(isbn_errbase+1)
	}
	if(!isbn_check(s)){
		print "Invalid checksum." > "/dev/stderr"
		exit(isbn_errbase+3)
	}
	return "978-" substr(origs,1,length(origs)-1) ((origs ~ /-.$/)?"":"-") isbn_chksum13("978" substr(s,1,9))
}

#what kind of ISBN is this?
#/^[0-9Xx -]{9,?}/ --> 10,13,0(error)
function isbn_id(s){
	if((s=isbn_sanitize(s)) !~ /^[0-9]+[0-9X]$/)
		return 0
	switch(length(s)){
		case 10: return 10
		case 13: return 13
		default: return 0
	}
}

#check an ISBN(10 or 13) for validity. whitespace and "-" get ignored.
#/^[0-9Xx -]{13,?}/ --> 1 (valid) or 0 (invalid)
function isbn_check(s){
	if((s=isbn_sanitize(s)) !~ /^[0-9]+[0-9X]$/)
		return 0
	switch(length(s)){
		case 10: return isbn_chksum10(s)==substr(s,10,1)
		case 13: return isbn_chksum13(s)==substr(s,13,1)
		default: return 0
	}
}
-----isbn.awk

and

-----isbnconv.awk
#!/usr/home/charon/usr/local/bin/gawk-3.1.7-bc-d -f
@sourcefile "isbn.awk"
BEGIN{	print "Just enter either isbn10 or isbn13, as many as you want to,"
	print "one per line. End on EOF, end, exit or quit."
}
/^#/{next}
/^[QqEe][UuNnXx][IiDd]/{exit(0)}
isbn_id($0)==10{print isbn_10to13($0);next}
isbn_id($0)==13{print isbn_13to10($0);next}
{print "Not a valid ISBN." > "/dev/stderr"}
-----isbnconv.awk

Example:

-----
charon@achernar:~/src/awklib> ./isbnconv.awk
Just enter either isbn10 or isbn13, as many as you want to,
one per line. End on EOF, end, exit or quit.
123456789x
978-123456789-7
978-1-59582-204-8
1-59582-204-6
quit
-----
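
The check digits in the session above can be recomputed independently of
isbn.awk. The following is only a minimal sketch in plain awk (no
strtonum() or switch, so it should not need --enable-switch); the names
isbn_digits, chk10 and chk13 are made up for this illustration and are
not part of the posted library.

```sh
# Recompute the check digits from the sample session.
# isbn_digits, chk10 and chk13 are illustrative names only.
isbn_digits() {
awk '
function chk10(s,    i, cs) {
	for (i = 1; i <= 9; i++)
		cs += substr(s, i, 1) * (11 - i)	# weights 10, 9, ..., 2
	cs = (11 - cs % 11) % 11			# a result of 11 wraps to 0
	return (cs == 10) ? "X" : cs ""
}
function chk13(s,    i, cs) {
	for (i = 1; i <= 12; i++)
		cs += substr(s, i, 1) * (i % 2 ? 1 : 3)	# weights 1, 3, 1, 3, ...
	return ((10 - cs % 10) % 10) ""			# a result of 10 wraps to 0
}
BEGIN {
	print chk10("123456789")	# first nine digits of 123456789x
	print chk13("978123456789")	# the same digits behind a 978 prefix
	print chk10("159582204")	# first nine digits of 1-59582-204-6
	print chk13("978159582204")	# first twelve digits of 978-1-59582-204-8
}'
}
isbn_digits
```

Run as is, this should print X, 7, 6 and 8, one per line, which agrees
with the four check digits shown in the session.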

HTH

- --------------------------------chelImQo'----------------------------------- -
Sebastian F. Mix, Irenenstrasse 21a, D-10317 Berlin, Tel: ++4930 521 1034   /(a\
charon@cs.tu-berlin.de <-no NeXTmail GCode3.12 GCS/S d?- s+:- a E--- C+(+)  \p)/
USX+ P- L- W++ N+++ w--- M- !V PS+++ Y+ PGP+ 5+ X++ R-- b++(+) e+ h+ r-- y*
Reply charon 2/4/2010 1:51:07 PM

In article <3d0b51b1-6c7e-4128-8817-48d7bdc198dc@m4g2000vbn.googlegroups.com>,
Tim Menzies  <menzies.tim@gmail.com> wrote:
>On Feb 4, 1:45 am, arn...@skeeve.com (Aharon Robbins) wrote:
>>         awk 'BEGIN { print "hello, word" }'
>>
>> I am pleased to announce the availability of a test version of gawk.
>> This version uses a byte-code execution engine, and most importantly,
>> it includes a debugger that works at the level of awk statements!
>
>great news. very exciting!
>
>any comments on the performance hit/improvement of running as byte
>codes?

No comments - some things seem faster, some slower. It seems to be a wash
overall, performance wise. But the gain of a debugger makes it worthwhile.
-- 
Aharon (Arnold) Robbins 				arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold 2/4/2010 3:56:03 PM

In article <7t01mbFlfaU1@mid.dfncis.de>,  <charon@cs.tu-berlin.de> wrote:
>(would it hurt to make --enable-switch the default by now?)

It will be the default for the next major version.

But it should work in the experimental version if you enable it.

Thanks,

Arnold
-- 
Aharon (Arnold) Robbins 				arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold 2/4/2010 3:57:18 PM

In article <hkeqmi$ahl$1@news.bytemine.net>,
Aharon Robbins <arnold@skeeve.com> wrote:
....
>>great news. very exciting!
>>
>>any comments on the performance hit/improvement of running as byte
>>codes?
>
>No comments - some things seem faster, some slower. It seems to be a wash
>overall, performance wise. But the gain of a debugger makes it worthwhile.

For the benefit of those of us in the cheap seats, could you say a bit
about what this "byte code interpreter" is?  I guess most of us just
assumed it was a speedup (more efficient way of doing things), but this
seems not to be the case.  What you say above implies that the BCI is a
necessary change in order for the debugger concept to work.  Is that
correct?

Reply gazelle 2/4/2010 4:08:45 PM

In article <hkered$87o$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>In article <hkeqmi$ahl$1@news.bytemine.net>,
>Aharon Robbins <arnold@skeeve.com> wrote:
>...
>>>great news. very exciting!
>>>
>>>any comments on the performance hit/improvement of running as byte
>>>codes?
>>
>>No comments - some things seem faster, some slower. It seems to be a wash
>>overall, performance wise. But the gain of a debugger makes it worthwhile.
>
>For the benefit of those of us in the cheap seats, could you say a bit
>about what this "byte code interpreter" is?  I guess most of us just
>assumed it was a speedup (more efficient way of doing things), but this
>seems not to be the case.  What you say above implies that the BCI is a
>necessary change in order for the debugger concept to work.  Is that
>correct?

Very good questions.

Production gawk builds a parse tree at parse time and then recursively
evaluates it to execute the program.  This is fine, but there are lots
of function calls.

The byte code version generates a linked list of "instructions" that
is iterated over in a loop for execution.  Byte code interpreters in
general can be faster than recursive evaluators when done really well;
this is the general case for mawk. (Not surprising, either; Mike Brennan
is a brilliant programmer.)
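
To make the difference concrete, here is a toy sketch of the two styles,
written in awk itself. It is emphatically not gawk's internal code: the
node layout, the opcode names (push, add, mul) and the function names
(bc_demo, eval, run) are all invented for the illustration, and an
indexed array stands in for the linked list. Both halves evaluate
2 + 3 * 4.

```sh
# Toy comparison of tree walking and instruction-list execution.
# All names and data layouts here are assumptions made for illustration.
bc_demo() {
awk '
# Style 1: recursive evaluation of a parse tree (one call per node).
function eval(n) {
	if (kind[n] == "num") return val[n]
	if (kind[n] == "+")   return eval(lhs[n]) + eval(rhs[n])
	if (kind[n] == "*")   return eval(lhs[n]) * eval(rhs[n])
}
# Style 2: a flat instruction list run by a single loop with a value stack.
function run(ninstr,    pc, sp, stack, b) {
	for (pc = 1; pc <= ninstr; pc++) {
		if (code[pc] == "push") {
			stack[++sp] = arg[pc]
			continue
		}
		b = stack[sp--]			# pop the right operand
		if (code[pc] == "add") stack[sp] += b
		if (code[pc] == "mul") stack[sp] *= b
	}
	return stack[sp]
}
BEGIN {
	# Parse tree for 2 + 3 * 4; node 1 is the root.
	kind[1] = "+";   lhs[1] = 2; rhs[1] = 3
	kind[2] = "num"; val[2] = 2
	kind[3] = "*";   lhs[3] = 4; rhs[3] = 5
	kind[4] = "num"; val[4] = 3
	kind[5] = "num"; val[5] = 4
	print eval(1)

	# The same expression as postfix instructions.
	code[1] = "push"; arg[1] = 2
	code[2] = "push"; arg[2] = 3
	code[3] = "push"; arg[3] = 4
	code[4] = "mul"
	code[5] = "add"
	print run(5)
}'
}
bc_demo
```

Both print statements should give 14. The tree walker makes one function
call per node, while the loop makes none, which is where a carefully
built instruction-list engine can save time.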

The person who did the byte code engine at the same time integrated the
hooks and wrote a parser and executor for an awk-level debugger.

There is no theoretical requirement to have a byte code engine in order
to provide a debugger, it is just that all the code came together as
a package.

We have not investigated performance in the byte code version *at all*.
All our efforts up to this point have been focused on getting the code
integrated into the current code base such that it passes the test suite
and ports to all the systems regular gawk ports to.

Two significant bugs have been reported to me in the past 24 hours where
gawk-bc differs from regular gawk.  I'm working on them.

Thanks,

Arnold
-- 
Aharon (Arnold) Robbins 				arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold 2/5/2010 9:20:37 AM
