Using regexes in Awk across line breaks

Hello All,

I'm a newbie to Awk, though I am somewhat familiar with regular
expressions.  Would like to use Awk to extract data from a large text
file.  Sometimes the data and the flag for the data are on different
lines, such as the case below (it would look better in a monospaced
font):

NUMBER OF SUBSURFACES

         EXTERIOR           INTERIOR
TOTAL    WINDOWS    DOORS    WINDOWS

    6          6        0          0

The data sequence shown above might occur hundreds of times in a
particular file, representing the physical space definition of the
building.  So to count the number of doors in a particular building, I
'simply' need to accumulate values from the third parameter on the
second line following the string "TOTAL    WINDOWS    DOORS".   But
the 'simple' nature of this task so far hasn't helped me to figure it out.

Have searched the archives in this board and other resources and
cannot find an explicit example of how to do something like this.

Any suggestions and all examples most appreciated!

Regards
Brando
Reply Brando 12/17/2007 11:53:27 PM



If the lines of data are the only lines that contain nothing but digits and
white space, as in the sample input above, all you need is:

awk '/^[[:digit:][:space:]]+$/{c+=$3}END{print c}' file
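
To check which lines a pattern like that actually selects before trusting the sum, a small sketch along these lines may help. The sample data and the variable names are made up for illustration, and the bracket expression is written as [0-9 \t] instead of the POSIX classes, on the assumption that the input is plain ASCII, so that it also runs under awks without class support.

```shell
# Hypothetical sample: two SUBSURFACES blocks with 2 and 3 doors.
sample=$(printf '%s\n' \
    'NUMBER OF SUBSURFACES' \
    '' \
    '         EXTERIOR           INTERIOR' \
    'TOTAL    WINDOWS    DOORS    WINDOWS' \
    '' \
    '    6          4        2          0' \
    'TOTAL    WINDOWS    DOORS    WINDOWS' \
    '' \
    '    5          2        3          0')

# List every line made up only of digits, spaces and tabs, with its line number.
matched=$(printf '%s\n' "$sample" | awk '/^[0-9 \t]+$/ { print NR ": " $0 }')
printf '%s\n' "$matched"

# Sum field 3 of those same lines; "+ 0" prints 0 when nothing matched.
total=$(printf '%s\n' "$sample" | awk '/^[0-9 \t]+$/ { c += $3 } END { print c + 0 }')
printf 'doors: %s\n' "$total"
```

If line numbers show up in the listing that are not door rows, those are the lines distorting the total.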

	Ed.

Reply Ed 12/17/2007 11:57:07 PM


On Dec 17, 3:57 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> If the lines of data are the only lines where there's just numbers and white
> space as in the sample input above, all you need is:
>
> awk '/^[[:digit:][:space:]]+$/{c+=$3}END{print c}' file

Thanks Ed,

The data lines shown are just a snippet of a thousand page or so data
dump.  Here they are in context:


DATA FOR SPACE    E1 Level P2 Parking Spc           IN FLOOR    E1 Level P2 Parking Flr


LOCATION OF ORIGIN IN
BUILDING COORDINATES               SPACE
                                 AZIMUTH
SPACE*FLOOR              HEIGHT                AREA
VOLUME
 XB (FT)  YB (FT)  ZB (FT)         (DEG)
MULTIPLIER                (FT)             (SQFT )             (CUFT )

   25.60   -37.35     0.00         90.00
1.0               10.00            52160.97           521609.69


      TOTAL    NUMBER OF    NUMBER OF      NUMBER OF
     NUMBER     EXTERIOR     INTERIOR    UNDERGROUND
OF SURFACES     SURFACES     SURFACES       SURFACES    DAYLIGHTING    SUNSPACE

         10            0            1              9             NO          NO


NUMBER OF SUBSURFACES

         EXTERIOR           INTERIOR
TOTAL    WINDOWS    DOORS    WINDOWS

    0          0        0          0


                             CALCULATION
FLOOR WEIGHT                 TEMPERATURE
 (LB/SQFT  )                     (F    )

         0.0                        70.0


PEOPLE

 
AREA PER              PEOPLE              PEOPLE
 
PERSON            SENSIBLE              LATENT
    SCHEDULE                                        NUMBER
(SQFT )           (BTU/HR )           (BTU/HR )

    E1 Bldg Occup Sch
52.2              1000.0               274.3               468.9


LIGHTING

 
LOAD                                FRACTION
                                      LIGHTING
(WATTS/                LOAD             OF LOAD
    SCHEDULE
TYPE                              SQFT )
(KW)            TO SPACE

    E1 Bldg InsLt Sch                 SUS-
FLUOR                           0.20
10.43                1.00


INTERIOR SURFACES (U-VALUE INCLUDES BOTH AIR FILMS)

 
AREA                                               U-VALUE
    SURFACE                             (SQFT )
CONSTRUCTION              (BTU/HR-SQFT-F)

    E1 Flr (UB.W2.I1)                  52160.97             E1 IFlr
Construction                0.357


So the example you provided doesn't seem to work; while I am using
gawk I don't think that's the issue.  Seems like I need to flag the
data two lines before it appears -- that's what has me stumped at the
moment.  Maybe regexes aren't the way to go.

Perhaps pseudocode something like:

WHILE NOT EOF
read input line,
test for data_flag
 IF data_flag THEN
  throw input line away,
  read and discard next line,
  read next line and accumulate third parameter
 ELSE
  throw input line away
 ENDIF
LOOP

Can anyone translate this brain-flatulence into AWK?

Thanks
Brando
Reply Brando 12/18/2007 6:36:16 PM

A brute-force solution using Ed's suggestion:

awk '/TOTAL +WINDOWS/{go=1}/^[[:digit:][:space:]]+$/&&go{c+=$3;go=0}END{print c}' file
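
If the data row always sits exactly two lines below the header, as in the posted sample, counting lines may be more dependable than recognising the row by its content: a separator line that holds stray spaces would also pass the digits-and-whitespace test and clear the flag early. A sketch under that assumption (the sample data and the names n and doors are illustrative only):

```shell
# Hypothetical input: the first separator line holds spaces, the second is empty.
sample=$(printf '%s\n' \
    'TOTAL    WINDOWS    DOORS    WINDOWS' \
    '   ' \
    '    6          4        2          0' \
    'TOTAL    WINDOWS    DOORS    WINDOWS' \
    '' \
    '    5          2        3          0')

doors=$(printf '%s\n' "$sample" | awk '
    # Header seen: the row we want is two lines further down.
    /TOTAL +WINDOWS +DOORS/ { n = 2; next }
    # Count down on each following line; act only when n reaches 0.
    n && !--n { doors += $3 }
    END { print doors + 0 }
')
printf 'doors: %s\n' "$doors"
```

Changing n = 2 to another offset is all it takes if the row turns out to sit further from the header in the real file.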

-- 
(^\pop/^)
I Stopped to think but forgot to start again.
--
Reply pop 12/18/2007 8:14:57 PM


Perhaps along the lines of:
awk '/^$/{next}; /TOTAL    WINDOWS    DOORS/ {++df}; \
	df>0 && /[[:digit:]]+/ {doors += $3; df=0}; \
	END{print doors}' datafile
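
The pseudocode from the question can also be followed almost step for step with getline, which replaces the current record with the next input line. A minimal sketch, assuming the data row is always the second line after the header; the sample data and the names doors and total are illustrative:

```shell
# Hypothetical sample: two SUBSURFACES blocks with 2 and 3 doors.
sample=$(printf '%s\n' \
    'NUMBER OF SUBSURFACES' \
    '' \
    '         EXTERIOR           INTERIOR' \
    'TOTAL    WINDOWS    DOORS    WINDOWS' \
    '' \
    '    6          4        2          0' \
    'TOTAL    WINDOWS    DOORS    WINDOWS' \
    '' \
    '    5          2        3          0')

total=$(printf '%s\n' "$sample" | awk '
    /TOTAL +WINDOWS +DOORS/ {
        # Read and discard the next line, then read the data row.
        # getline returns 1 on success, 0 at end of input, -1 on error.
        if ((getline) > 0 && (getline) > 0)
            doors += $3
    }
    END { print doors + 0 }
')
printf 'doors: %s\n' "$total"
```

Lines that are not headers need no rule of their own: awk simply moves on when no pattern matches, which covers the ELSE branch of the pseudocode.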

Grant.
-- 
http://bugsplatter.mine.nu/
Reply Grant 12/18/2007 8:25:22 PM

Thanks all,

Seem to have it figured out. Link to the post with the implemented solution:

http://elcca-exchange.blogspot.com/2007/12/extracting-door-data-from-sim-files.html

Many more data extraction examples (and questions) to follow...

Brando
Reply Brando 12/19/2007 1:55:18 AM
