Coalescing multiple whitespace characters into a single space



Hi,

while writing a longer awk script, I stumbled across the following problem:

I get a field, let it be $8 for example, containing a string of unknown 
length, which itself contains various amounts of tabs and spaces, 
leading, trailing and between the "words" of that string.

I need to coalesce each run of whitespace into a single space and drop
the leading and trailing whitespace, e.g.

test<space><tab><tab>testing<tab>testing<space><space><space>word<tab>

should give

test<space>testing<space>testing<space>word

Any good ideas? I don't need a finished solution, just a good hint as a
starting point to get a grip on the problem.


Best regards
.... Ralph ...

Reply Ralph 11/22/2003 1:39:49 PM


Ralph Graulich wrote:
> I need to coalesce each combination of multiple whitespaces to a single
> space [...]

I was surprised to find that this works:

printf 'hi \t  \t\t there\n' |
     awk '{split($1,x," ");print x[2]}' FS=":"

I thought you'd have to use the "[[:space:]]" RE (a bare "[:space:]" didn't
work for me; the class name is only valid inside a bracket expression) or
store the value of FS in a variable before re-assigning it and use that
variable as the third argument for split, e.g.:

printf 'hi \t  \t\t there\n' |
     awk 'BEGIN{fs=FS; FS=":"}{c=split($1,x,fs);print x[2]}'
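
To get from the split pieces back to a cleaned-up field, the words can be rejoined with single spaces. A minimal sketch, assuming ":"-separated input with the messy string in $2; the names squeeze_field2 and squeeze are made up for this example and are not part of awk:

```shell
# squeeze_field2 is a made-up name; the ":"-separated sample line is an assumption.
squeeze_field2() {
    awk 'BEGIN { FS = OFS = ":" }
    function squeeze(s,    parts, n, i, out) {
        n = split(s, parts, " ")          # " " splits on runs of blanks and tabs
        out = ""
        for (i = 1; i <= n; i++)          # rejoin the words with single spaces
            out = out (i > 1 ? " " : "") parts[i]
        return out
    }
    { $2 = squeeze($2); print }'
}

printf 'x:  test \t\ttesting\ttesting   word\t:y\n' | squeeze_field2
# prints: x:test testing testing word:y
```

Because split() with " " ignores leading and trailing whitespace, no separate trimming step is needed here.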

Regards,

	Ed.



Reply Ed 11/22/2003 3:22:31 PM


In article <3FBF6725.2090106@shauny.de>,
 Ralph Graulich <maillist@shauny.de> wrote:

X I need to coalesce each combination of multiple whitespaces to a single
X space [...]

awk '
    {
        gsub(/[ \t]+/," ",$8)   # multiple space/tab to 1 space
        sub(/^ /,"",$8)         # remove leading space
        sub(/ $/,"",$8)         # remove trailing space
        print
    }
'

or you could try

awk '
    {
        gsub(/[ \t]+/," ",$8)   # multiple space/tab to 1 space
        gsub(/(^ )|( $)/,"",$8) # remove leading/trailing space
        print
    }
'
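
One thing to watch: assigning to $8 makes awk rebuild $0 with OFS, so if the input uses a non-default separator, OFS should be set to match or the other separators turn into spaces. A runnable sketch, assuming ":"-separated input (the separator and the name clean_field8 are assumptions for illustration):

```shell
# clean_field8 is a hypothetical name; ":" as the field separator is an assumption.
clean_field8() {
    awk 'BEGIN { FS = OFS = ":" }       # keep ":" when $0 is rebuilt
    {
        gsub(/[ \t]+/, " ", $8)         # multiple space/tab to 1 space
        gsub(/(^ )|( $)/, "", $8)       # remove leading/trailing space
        print
    }'
}

printf 'a:b:c:d:e:f:g: test \t\ttesting\ttesting   word\t:i\n' | clean_field8
# prints: a:b:c:d:e:f:g:test testing testing word:i
```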

                                        Bob Harris
Reply Bob 11/22/2003 3:35:46 PM

Hi Bob, hi Ed,

Many thanks for your quick answers. Combining the ideas from both of
you (gsub and POSIX character classes) helped me solve my specific
problem, with some variations.

One of the main things I didn't catch at first was that "gsub" returns
the number of substitutions made, not the string with the substitutions
applied. I have written many awk scripts up to now, but never actually
needed gsub - learnt something new today!
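
For anyone else tripping over this, a small sketch of the difference (gsub_demo is just a name I made up): gsub changes its third argument in place and hands back the count.

```shell
# gsub_demo is a made-up name for this illustration.
gsub_demo() {
    awk '{
        s = $0
        n = gsub(/[ \t]+/, " ", s)   # n = number of substitutions, s is modified in place
        print n ": [" s "]"
    }'
}

printf 'test \t\ttesting\ttesting   word\t\n' | gsub_demo
# prints: 4: [test testing testing word ]
```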

Best regards
.... Ralph ...

Reply Ralph 11/22/2003 4:38:04 PM

On Sat, 22 Nov 2003 14:39:49 +0100, Ralph Graulich
<maillist@shauny.de> wrote:

>I need to coalesce each combination of multiple whitespaces to a single
>space [...]

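Add a null ("") to one of the fields, then print $0. Assigning to a
field, even if its value does not actually change, causes $0 to be
rebuilt with the fields joined by the output field separator (OFS),
which defaults to a single space. With the default FS, leading and
trailing whitespace is dropped as well.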
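
A quick sketch of what I mean, assuming the whole record is the string to clean and FS is left at its default (rebuild_record is a made-up name). It does not help if the string sits inside one field of a record with a custom FS.

```shell
# rebuild_record is a made-up name; assumes default FS and OFS.
rebuild_record() {
    awk '{ $1 = $1 ""; print }'   # assigning to any field makes awk rebuild $0 using OFS
}

printf '  test \t\ttesting\ttesting   word\t\n' | rebuild_record
# prints: test testing testing word
```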


T.E.D. (tdavis@gearbox.maem.umr.edu - e-mail must contain "T.E.D." or my .sig in the body)
Reply Ted 11/22/2003 6:07:49 PM

In article <3FBF6725.2090106@shauny.de>,
Ralph Graulich  <maillist@shauny.de> wrote:
>I need to coalesce each combination of multiple whitespaces to a single
>space [...]

awk '$1=$1' infile 
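
A caveat with this idiom: the assignment is also the pattern, so a record is printed only when the new value of $1 is true. Lines that are empty, or whose first field is 0, are silently skipped. A sketch of the difference, with a variant that assigns in an action and prints every record (both function names are made up for this example):

```shell
# The one-liner above: the assignment doubles as the pattern.
squeeze_pattern() { awk '$1=$1'; }

# Variant: assign in an action, then print every record unconditionally.
squeeze_always() { awk '{ $1 = $1 } 1'; }

printf '0   zero\none \t two\n' | squeeze_pattern   # prints only: one two
printf '0   zero\none \t two\n' | squeeze_always    # prints: 0 zero, then: one two
```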


Chuck Demas

-- 
  Eat Healthy        |   _ _   | Nothing would be done at all,
  Stay Fit           |   @ @   | If a man waited to do it so well,
  Die Anyway         |    v    | That no one could find fault with it.
  demas@theworld.com |  \___/  | http://world.std.com/~cpd
Reply demas 11/22/2003 9:03:59 PM
