Attn: GAWK developers! Here's a nice thing to add to GAWK ([g]subs).

One feature of TAWK that would be nice to have in GAWK is the subs()
function. It works just like sub(), except that the first argument is not
treated as a regular expression. This is useful when you want to do a
straight string replacement without worrying about regular expression
"magic" characters appearing in the "target" string. Effectively, subs() is
the same as sub() with every "magic" character in the target
backslash-escaped.

The need for this is not very common, but when it comes up, it can bite
you. I had this happen to me recently in a script I was writing, where
there happened to be square brackets ([]) in the string I was trying to
replace. And yes, we all know that passing an arbitrary string as the
first arg to sub() is not safe, but we've all done it from time to time.
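
To make the square-bracket problem concrete, here is a small sketch (a shell script that runs awk; the sample line is made up for illustration). It repeats the insert() and subs() helpers from the user-space implementation in this post and compares them with plain sub():

```shell
#!/bin/sh
# awk parses the string "a[1]" as the regex "a followed by the character 1",
# so sub() skips the literal text a[1] and replaces the later "a1" instead.
# The index()-based subs() matches the literal text.
result=$(awk '
function insert(str, spos, len, newstr) {
    return substr(str, 1, spos - 1) newstr substr(str, spos + len)
}
function subs(s, r, str,    t) {
    if (str == "") str = $0
    return (t = index(str, s)) ? insert(str, t, length(s), r) : str
}
BEGIN {
    line = "x = a[1] + a1"
    copy = line
    sub("a[1]", "b[1]", copy)            # regex match: replaces "a1"
    print copy
    print subs("a[1]", "b[1]", line)     # literal match: replaces "a[1]"
}')
echo "$result"
```

The first line printed is "x = a[1] + b[1]" (the wrong occurrence was changed); the second is "x = b[1] + a1".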

As a side comment, one could well say that whenever you see anything other
than an RE constant (or a clearly well-constructed string) as the first arg
to sub() (or gsub()), it looks suspicious, in much the same way that in C
code, seeing anything other than a string constant as the format argument
of printf() (or any other member of the printf() family) looks suspicious,
like a bug waiting to happen. The point is that, in both cases, you know
that the programmer is just relying on luck to ensure that there aren't any
"magic" characters (% signs in the case of printf) in the arbitrary string
they are passing to the function.

Anyway, that all said, it seems to me that this needs to be a builtin,
because there's no way, in either user space (i.e., a function written in
AWK) or extension-library space (i.e., a function written in C), to define
a GAWK function that modifies one of its parameters (the exception being
when the passed argument is an array). Doing so requires special support in
the language itself; to the best of my knowledge, sub() and gsub() are the
only functions in the AWK language that do this. Furthermore, it has been
made clear many times in this newsgroup that it won't do any good for me to
write such functions for the core GAWK interpreter, as any changes that I
make will not be propagated into the core language. I may yet do so, for my
own entertainment, but for it to make it into the core language, one of you
guys (the official developers) will have to take this ball and run with it.
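
For anyone who hasn't run into the by-value/by-reference distinction, this minimal sketch shows it (the function names set_scalar() and set_element() are made up for illustration): a scalar parameter is a local copy, while an array parameter refers to the caller's array.

```shell
#!/bin/sh
# Scalars are passed to awk functions by value, arrays by reference:
# assigning to a scalar parameter has no effect on the caller, while
# assigning to an element of an array parameter does.
result=$(awk '
function set_scalar(x)  { x = "changed" }
function set_element(a) { a["k"] = "changed" }
BEGIN {
    s = "orig"
    set_scalar(s)
    arr["k"] = "orig"
    set_element(arr)
    print s, arr["k"]      # prints: orig changed
}')
echo "$result"
```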

Finally, here is a user-space implementation that shows what I'm talking
about. Note that it does work, but it doesn't mirror the sub()/gsub()
semantics completely, because the target string is not modified by the
call; as with gensub(), the new value is returned as the value of the
function call. To my mind, this is less good than the sub()/gsub() model,
since sub() and gsub() are free to return a count, which is usually quite
useful. Having to return the modified string as the function's return value
rules out returning the count. This is not good.

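# A helper function: replace the len characters of str starting at
# position spos with newstr
function insert(str, spos, len, newstr) {
    return substr(str, 1, spos-1) newstr substr(str, spos+len)
}

# subs is like sub, except that no regular expression handling is done:
# the first occurrence of the literal string s in str (default $0) is
# replaced by r, and the resulting string is returned
function subs(s, r, str,    t) {
    if (str == "") str = $0
    return (t = index(str, s)) ? insert(str, t, length(s), r) : str
}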

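One possible workaround, offered only as a sketch and not as anything TAWK or GAWK actually provides: since an array is the one kind of argument a user-space function can modify, the string can be passed as an array element, leaving the return value free for the count. The name gsubs() and its (s, r, arr, key) signature are hypothetical, and like subs() above it treats r literally, with no special meaning for &.

```shell
#!/bin/sh
result=$(awk '
# Hypothetical gsubs(): replace every non-overlapping occurrence of the
# literal string s in arr[key] with the literal string r (no regex, no &
# handling), store the result back in arr[key], and return the count.
function gsubs(s, r, arr, key,    str, out, t, n) {
    if (s == "") return 0              # treat an empty target as no match
    str = arr[key]
    out = ""
    n = 0
    while ((t = index(str, s)) > 0) {
        out = out substr(str, 1, t - 1) r
        str = substr(str, t + length(s))
        n++
    }
    arr[key] = out str
    return n
}
BEGIN {
    buf["line"] = "a[1] + a[1] - a1"
    n = gsubs("a[1]", "b", buf, "line")
    print n, buf["line"]               # prints: 2 b + b - a1
}')
echo "$result"
```

The price is an awkward calling convention (the caller has to park the string in an array first), which is one more argument for a builtin.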
Marshall: 10/22/51
Jessica: 4/4/79
12/24/2016 5:42:38 PM
comp.lang.awk

On Saturday, December 24, 2016 at 12:42:39 PM UTC-5, Kenny McCormack wrote:
> The need for this is not very common

Yes, I've needed this before. There are ways around it, but it's a matter of convenience and peace of mind when dealing with messy data and literal strings.

I made a less elegant non-regex version of sub(), but in that case I had to be careful to skip the substitution if there was more than one match, so I first did a non-regex count of occurrences:


I guess this is the problem you're talking about: if subs() returned the count, it could provide both abilities, counting and substitution, in one function.
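
The count-then-substitute approach described above might look something like this minimal sketch. It is a hypothetical reconstruction of the idea, not the poster's code; count_str() is an invented name, and the subs() here is the one from the original post with insert() inlined.

```shell
#!/bin/sh
result=$(awk '
# Hypothetical count_str(): number of non-overlapping occurrences of the
# literal string s in str, found with index() so no regex is involved.
function count_str(s, str,    n, t) {
    if (s == "") return 0
    n = 0
    while ((t = index(str, s)) > 0) {
        n++
        str = substr(str, t + length(s))
    }
    return n
}
# subs() from the original post, with insert() inlined
function subs(s, r, str,    t) {
    if (str == "") str = $0
    return (t = index(str, s)) ? substr(str, 1, t - 1) r substr(str, t + length(s)) : str
}
BEGIN {
    lines[1] = "x a.b y"
    lines[2] = "a.b and a.b"
    for (i = 1; i <= 2; i++) {
        c = count_str("a.b", lines[i])
        if (c == 1)                    # substitute only when unambiguous
            lines[i] = subs("a.b", "Z", lines[i])
        print c, lines[i]
    }
}')
echo "$result"
```

This prints "1 x Z y" and then "2 a.b and a.b": the second line is left alone because the match is ambiguous. A subs() that returned the count would need one call instead of two scans.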

12/25/2016 4:28:02 AM