AWK internal design

I think that AWK, being one of the fastest (if not THE fastest) generic
text-processing tools on earth, gives an example of how good
engineering plus careful programming and profiling can produce
outstanding results.

I am interested in efficient text processing (gigabytes of text),
mostly indexing and matching.

I was wondering if I can learn the efficient processing techniques from
the awk experience.  It is possible (but hard) to do it from the source
code, but I was wondering if there are any documents (articles, blogs,
group posts) which explain how AWK is designed internally.

Do you know of any such documents?

Thanks.

Denis

Reply denis.mikhalkin (1) 6/16/2006 5:07:10 AM

denis.mikhalkin@csiro.au wrote:

> I think that AWK, being one of the fastest (if not THE fastest) generic
> text-processing tools on earth, gives an example of how good
> engineering plus careful programming and profiling can produce
> outstanding results.

There are many different implementations and
these implementations differ significantly in
their efficiency.
 
> I was wondering if I can learn the efficient processing techniques from
> the awk experience.  It is possible (but hard) to do it from the source
> code, but I was wondering if there are any documents (articles, blogs,
> group posts) which explain how AWK is designed internally.
> 
> Do you know of any such documents?

The only doc I know is Arnold Robbins' GAWK manual.
But the chapter on internals is rather short:

  http://www.gnu.org/software/gawk/manual/html_node/Internals.html#Internals

There are two projects trying to re-implement AWK's
execution model in Java. Maybe you can learn more
from their source code documentation:

  http://jawk.sourceforge.net/
  http://jakarta.apache.org/oro/api/org/apache/oro/text/awk/AwkCompiler.html
Reply Jürgen Kahrs 6/16/2006 6:47:09 PM


Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
> denis.mikhalkin@csiro.au wrote:
> 
>> I think that AWK, being one of the fastest (if not THE fastest) generic
>> text-processing tools on earth, gives an example of how good
>> engineering plus careful programming and profiling can produce
>> outstanding results.
> 
> There are many different implementations and
> these implementations differ significantly in
> their efficiency.

What is the fastest version of awk extant?

Thanks,
Dave Feustel
-- 
Using OpenBSD with or without X & KDE?
See Dave's OpenBSD | X | KDE corner at
http://dfeustel.home.mindspring.com !!!
Reply dfeustel 6/16/2006 8:32:32 PM

dfeustel@mindspring.com wrote:
> Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
> > denis.mikhalkin@csiro.au wrote:
> >
> >> I think that AWK, being one of the fastest (if not THE fastest) generic
> >> text-processing tools on earth, gives an example of how good
> >> engineering plus careful programming and profiling can produce
> >> outstanding results.
> >
> > There are many different implementations and
> > these implementations differ significantly in
> > their efficiency.
>
> What is the fastest version of awk extant?
>
> Thanks,
> Dave Feustel
> --
> Using OpenBSD with or without X & KDE?
> See Dave's OpenBSD | X | KDE corner at
> http://dfeustel.home.mindspring.com !!!

Besides tawk (which is not free), mawk is the fastest.

Reply William 6/16/2006 8:51:26 PM

William James <w_a_x_man@yahoo.com> wrote:
> 
> dfeustel@mindspring.com wrote:
>> Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>> > denis.mikhalkin@csiro.au wrote:
>> >
>> >> I think that AWK, being one of the fastest (if not THE fastest) generic
>> >> text-processing tools on earth, gives an example of how good
>> >> engineering plus careful programming and profiling can produce
>> >> outstanding results.
>> >
>> > There are many different implementations and
>> > these implementations differ significantly in
>> > their efficiency.
>>
>> What is the fastest version of awk extant?
>>
>> Thanks,
>> Dave Feustel
>> --
>> Using OpenBSD with or without X & KDE?
>> See Dave's OpenBSD | X | KDE corner at
>> http://dfeustel.home.mindspring.com !!!
> 
> Besides tawk (which is not free), mawk is the fastest.
> 
Mawk is on my system. Thanks! 
Reply dfeustel 6/16/2006 10:27:15 PM

Jürgen Kahrs wrote:
> denis.mikhalkin@csiro.au wrote:
>
> > I think that AWK, being one of the fastest (if not THE fastest) generic
> > text-processing tools on earth, gives an example of how good
> > engineering plus careful programming and profiling can produce
> > outstanding results.
>
> There are many different implementations and
> these implementations differ significantly in
> their efficiency.
>
> > I was wondering if I can learn the efficient processing techniques from
> > the awk experience.  It is possible (but hard) to do it from the source
> > code, but I was wondering if there are any documents (articles, blogs,
> > group posts) which explain how AWK is designed internally.
> >
> > Do you know of any such documents?
>
> The only doc I know is Arnold Robbins' GAWK manual.
> But the chapter on internals is rather short:
>
>   http://www.gnu.org/software/gawk/manual/html_node/Internals.html#Internals
>
> There are two projects trying to re-implement AWK's
> execution model in Java. Maybe you can learn more
> from their source code documentation:
>
>   http://jawk.sourceforge.net/
>   http://jakarta.apache.org/oro/api/org/apache/oro/text/awk/AwkCompiler.html

Alright, thank you for the pointers.

Denis

Reply denis 6/19/2006 10:00:27 AM
