Hi,

while writing a longer awk script, I stumbled across the following problem:

I get a field, let it be $8 for example, containing a string of unknown
length, which itself contains varying numbers of tabs and spaces:
leading, trailing and between the "words" of that string.

I need to coalesce each combination of multiple whitespaces to a single
space, e.g.

test<space><tab><tab>testing<tab>testing<space><space><space>word<tab>

should give

test<space>testing<space>testing<space>word

Any good ideas? I don't need a finished solution, just a good hint as a
starting point to get a grip on the problem.

Best regards
.... Ralph ...
Ralph | 11/22/2003 1:39:49 PM

Ralph Graulich wrote:
> I need to coalesce each combination of multiple whitespaces to a single
> space [...]
I was surprised to find that this works:

    printf 'hi \t \t\t there\n' |
    awk '{split($1,x," ");print x[2]}' FS=":"

I thought you'd have to use the "[:space:]" RE (but that didn't work for
me; a character class has to sit inside a bracket expression, as in
/[[:space:]]+/) or store the default value of FS in a variable before
reassigning it and use that variable as the third argument to split, e.g.:

    printf 'hi \t \t\t there\n' |
    awk 'BEGIN{fs=FS; FS=":"}{c=split($1,x,fs);print x[2]}'
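
Building on the split idea, here is a minimal sketch of how the pieces could be rejoined with single spaces. The names `squeeze_field2` and `squeeze`, and the ':'-separated sample record, are assumptions made up for illustration. Passing the string " " as the separator makes split() use the default whitespace splitting, which also discards leading and trailing blanks.

```sh
# squeeze_field2: hypothetical helper; normalizes whitespace in field 2
# of ':'-separated input read from stdin
squeeze_field2() {
    awk -F: '
    # Split s on runs of blanks and tabs (default splitting),
    # then rejoin the pieces with exactly one space between them.
    function squeeze(s,    parts, n, i, out) {
        n = split(s, parts, " ")
        out = ""
        for (i = 1; i <= n; i++)
            out = out (i > 1 ? " " : "") parts[i]
        return out
    }
    { print squeeze($2) }
    '
}

printf 'a: hi \t \t\t there \t:c\n' | squeeze_field2   # prints "hi there"
```

Because the function works on a copy of the field, $0 is left untouched, which may matter if the rest of the script still needs the original record.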
Regards,
Ed.
Ed | 11/22/2003 3:22:31 PM

In article <3FBF6725.2090106@shauny.de>,
Ralph Graulich <maillist@shauny.de> wrote:
> I need to coalesce each combination of multiple whitespaces to a single
> space [...]
awk '
{
    gsub(/[ \t]+/, " ", $8)     # each run of spaces/tabs becomes one space
    sub(/^ /, "", $8)           # remove leading space
    sub(/ $/, "", $8)           # remove trailing space
    print
}
'

or you could try

awk '
{
    gsub(/[ \t]+/, " ", $8)     # each run of spaces/tabs becomes one space
    gsub(/(^ )|( $)/, "", $8)   # remove leading/trailing space
    print
}
'
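
One thing to watch with the gsub approach: a field can only contain tabs and spaces if FS is something other than the default, and assigning to a field makes awk rebuild $0 with OFS, which is a single space unless set otherwise. The sketch below assumes ':'-separated input and a hypothetical wrapper named `clean_field`; it sets OFS to match FS so the separators survive the rebuild.

```sh
# clean_field: hypothetical wrapper; $1 is the field number to clean,
# input on stdin is assumed to be ':'-separated
clean_field() {
    awk -F: -v OFS=: -v n="$1" '
    {
        gsub(/[ \t]+/, " ", $n)     # each run of spaces/tabs becomes one space
        gsub(/(^ )|( $)/, "", $n)   # drop the one leading/trailing space left over
        print                       # $0 was rebuilt with OFS, so ":" is kept
    }
    '
}

printf 'a: test \t\ttesting\ttesting   word\t:c\n' | clean_field 2
# prints "a:test testing testing word:c"
```

Without `-v OFS=:` the same record would come out with spaces between the fields instead of colons.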
Bob Harris
Bob | 11/22/2003 3:35:46 PM

Hi Bob, hi Ed,

many thanks for your prompt answers. Combining both of your ideas
(gsub and POSIX character classes) helped me solve my specific
problem, with some variations.

One of the main things I didn't catch at first was that "gsub" returns
the number of substitutions made, not the string with the substitutions
applied; the target variable or field is modified in place.

I have written many awk scripts up to now, but never actually had any
need for gsub. Learnt something new today!
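
To make the point about the return value concrete, a small sketch (the helper name `count_and_squeeze` is made up for illustration): the count goes into one variable while the string being edited is changed in place.

```sh
# count_and_squeeze: hypothetical helper; reads lines on stdin
count_and_squeeze() {
    awk '
    {
        s = $0
        n = gsub(/[ \t]+/, " ", s)   # s is changed in place; n is the number of substitutions
        print n ": " s
    }
    '
}

printf 'a \t b\t\tc\n' | count_and_squeeze   # prints "2: a b c"
```

Writing `s = gsub(...)` by mistake would therefore store the count, not the cleaned-up string.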
Best regards
.... Ralph ...
Ralph | 11/22/2003 4:38:04 PM

On Sat, 22 Nov 2003 14:39:49 +0100, Ralph Graulich
<maillist@shauny.de> wrote:
>I need to coalesce each combination of multiple whitespaces to a single
>space [...]
Append a null string ("") to one of the fields, e.g. $1 = $1 "", then
print $0. Assigning to a field, even if nothing actually changes, causes
$0 to be rebuilt with the output field separator OFS, which defaults to
a single space.
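
A short sketch of what the rebuild does and does not cover; the two helper names and the sample records are assumptions for illustration. With the default FS the whitespace is the field separator, so every run collapses. With FS=":" the blanks sit inside the fields and are left alone, so this trick does not help for whitespace within a single field such as $8.

```sh
# rebuild_default: hypothetical helper; default FS, so whitespace separates fields
rebuild_default() { awk '{ $1 = $1 ""; print }'; }

# rebuild_colon: hypothetical helper; FS is ":", so blanks stay inside the fields
rebuild_colon() { awk -F: '{ $1 = $1 ""; print }'; }

# Every run of blanks/tabs collapses, leading and trailing ones disappear.
printf '  test \t\ttesting\ttesting   word\t\n' | rebuild_default
# prints "test testing testing word"

# Only the ":" separators are replaced by OFS; the tab and spaces in field 2 remain.
printf 'a: x \t y :c\n' | rebuild_colon
```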
T.E.D. (tdavis@gearbox.maem.umr.edu - e-mail must contain "T.E.D." or my .sig in the body)
Ted | 11/22/2003 6:07:49 PM

In article <3FBF6725.2090106@shauny.de>,
Ralph Graulich <maillist@shauny.de> wrote:
>I need to coalesce each combination of multiple whitespaces to a single
>space [...]
awk '{ $1 = $1 } 1' infile
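
A note on how this one-liner is spelled. When the assignment is used as a bare pattern (`$1=$1`), its value doubles as the condition, so an empty line is not printed, and in the awks I have tried neither is a line whose first field is 0. The sketch below contrasts that with an explicit action followed by an always-true pattern; the helper names and sample input are made up for illustration.

```sh
# squeeze_bare: hypothetical helper; the value assigned to $1 is also the print condition
squeeze_bare() { awk '$1=$1'; }

# squeeze_safe: hypothetical helper; rebuild every record, then always print
squeeze_safe() { awk '{ $1 = $1 } 1'; }

# Three input lines: "a   b", an empty line, and "0   c".
printf 'a   b\n\n0   c\n' | squeeze_bare   # the empty line is dropped
printf 'a   b\n\n0   c\n' | squeeze_safe   # all three lines are printed
```

Both forms only squeeze whitespace that acts as the field separator, so they suit whole records rather than a single field like $8.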
Chuck Demas
--
Eat Healthy | _ _ | Nothing would be done at all,
Stay Fit | @ @ | If a man waited to do it so well,
Die Anyway | v | That no one could find fault with it.
demas@theworld.com | \___/ | http://world.std.com/~cpd
demas | 11/22/2003 9:03:59 PM