Hello All,
I'm a newbie to Awk, though I am somewhat familiar with regular
expressions. Would like to use Awk to extract data from a large text
file. Sometimes the data and the flag for the data are on different
lines, such as the case below (it would look better in a monospaced
font):

NUMBER OF SUBSURFACES

EXTERIOR INTERIOR
TOTAL WINDOWS DOORS WINDOWS

6 6 0 0

The data sequence shown above might occur hundreds of times in a
particular file, representing the physical space definition of the
building. So to count the number of doors in a particular building, I
'simply' need to accumulate values from the third parameter on the
second line following the string "TOTAL WINDOWS DOORS". But
the 'simple' nature of this task so far hasn't helped me to figure it out.
Have searched the archives in this board and other resources and
cannot find an explicit example of how to do something like this.
Any suggestions and all examples most appreciated!
Regards
Brando
Posted by Brando, 12/17/2007 11:53:27 PM

If the lines of data are the only lines where there's just numbers and white
space as in the sample input above, all you need is:
awk '/^[[:digit:][:space:]]+$/{c+=$3}END{print c}' file
Ed.
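
A quick way to see what this one-liner does is to run it on a couple of blocks shaped like the sample in the question. The input below is made up for illustration (the second block's values are hypothetical), and the awk program is the same one, only spread out with comments. It is a sketch under the stated assumption that data lines are the only lines made up purely of digits and white space.

```shell
# Hypothetical input: two blocks in the layout shown in the question.
sample='NUMBER OF SUBSURFACES

EXTERIOR INTERIOR
TOTAL WINDOWS DOORS WINDOWS

6 6 0 0

NUMBER OF SUBSURFACES

EXTERIOR INTERIOR
TOTAL WINDOWS DOORS WINDOWS

5 3 2 0
'

doors=$(printf '%s' "$sample" | awk '
    # A line made up only of digits and white space is treated as data.
    /^[[:digit:][:space:]]+$/ { c += $3 }   # add the third field (DOORS)
    END { print c }
')
echo "$doors"
```

Blank lines do not match because the `+` requires at least one character, so only the two numeric lines contribute.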
Posted by Ed, 12/17/2007 11:57:07 PM

On Dec 17, 3:57 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> If the lines of data are the only lines where there's just numbers and white
> space as in the sample input above, all you need is:
>
> awk '/^[[:digit:][:space:]]+$/{c+=$3}END{print c}' file

Thanks Ed,
The data lines shown are just a snippet of a thousand page or so data
dump. Here they are in context:

DATA FOR SPACE E1 Level P2 Parking Spc IN FLOOR E1
Level P2 Parking Flr


LOCATION OF ORIGIN IN
BUILDING COORDINATES SPACE
AZIMUTH
SPACE*FLOOR HEIGHT AREA
VOLUME
XB (FT) YB (FT) ZB (FT) (DEG)
MULTIPLIER (FT) (SQFT ) (CUFT )

25.60 -37.35 0.00 90.00
1.0 10.00 52160.97 521609.69


TOTAL NUMBER OF NUMBER OF NUMBER OF
NUMBER EXTERIOR INTERIOR UNDERGROUND
OF SURFACES SURFACES SURFACES SURFACES
DAYLIGHTING SUNSPACE

10 0 1 9
NO NO


NUMBER OF SUBSURFACES

EXTERIOR INTERIOR
TOTAL WINDOWS DOORS WINDOWS

0 0 0 0


CALCULATION
FLOOR WEIGHT TEMPERATURE
(LB/SQFT ) (F )

0.0 70.0


PEOPLE


AREA PER PEOPLE PEOPLE

PERSON SENSIBLE LATENT
SCHEDULE NUMBER
(SQFT ) (BTU/HR ) (BTU/HR )

E1 Bldg Occup Sch
52.2 1000.0 274.3 468.9


LIGHTING


LOAD FRACTION
LIGHTING
(WATTS/ LOAD OF LOAD
SCHEDULE
TYPE SQFT )
(KW) TO SPACE

E1 Bldg InsLt Sch SUS-FLUOR 0.20
10.43 1.00


INTERIOR SURFACES (U-VALUE INCLUDES BOTH AIR FILMS)


AREA U-VALUE
SURFACE (SQFT )
CONSTRUCTION (BTU/HR-SQFT-F)

E1 Flr (UB.W2.I1) 52160.97 E1 IFlr
Construction 0.357

So the example you provided doesn't seem to work; while I am using
gawk I don't think that's the issue. Seems like I need to flag the
data two lines before it appears -- that's what has me stumped at the
moment. Maybe regexes aren't the way to go.
Perhaps pseudocode something like:

WHILE NOT EOF
    read input line,
    test for data_flag
    IF data_flag THEN
        throw input line away,
        read and discard next line,
        read next line and accumulate third parameter
    ELSE
        throw input line away
    ENDIF
LOOP

Can anyone translate this brain-flatulence into AWK?
Thanks
Brando
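
The pseudocode above maps fairly directly onto awk, because awk already supplies the WHILE NOT EOF loop: it reads each line and runs the pattern-action rules against it. A minimal sketch of a literal translation, using `getline` for the two extra reads, might look like the following. It assumes the data line is always exactly two lines below the header, as described. The sample input and the variable name `doors` are made up for illustration.

```shell
# Hypothetical input: header, one blank line, data; plus a numeric line
# from some other table that should not be counted.
sample='EXTERIOR INTERIOR
TOTAL WINDOWS DOORS WINDOWS

4 1 3 0
10 0 1 9
TOTAL WINDOWS DOORS WINDOWS

2 0 2 0
'

doors=$(printf '%s' "$sample" | awk '
    /TOTAL +WINDOWS +DOORS/ {       # test for data_flag
        getline                     # read and discard the next line
        if ((getline) > 0)          # read the line after that, if any
            doors += $3             # accumulate third parameter
    }
    END { print doors + 0 }         # +0 prints 0 rather than an empty line
')
echo "$doors"
```

Lines that do not match the header pattern simply fall through with no action, which is the ELSE branch of the pseudocode.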
Posted by Brando, 12/18/2007 6:36:16 PM

Brando said the following on 12/18/2007 12:36 PM:
> Can anyone translate this brain-flatulence into AWK?

A brute-force solution using Ed's suggestion:

awk '/TOTAL +WINDOWS/{go=1}/^[[:digit:][:space:]]+$/&&go{c+=$3;go=0}END{print c}' file

--
(^\pop/^)
I Stopped to think but forgot to start again.
--
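
To see what the `go` flag buys, here is a small sketch comparing the unflagged and flagged patterns on an excerpt where a digits-only line from a different table comes before the block of interest. The excerpt is hypothetical (modelled on the context dump, with an invented doors value of 2), and whether this is the actual reason the first one-liner failed on the full file is an assumption, since the full file is not shown.

```shell
# Hypothetical excerpt: a digits-only line from another table precedes
# the subsurfaces block.
sample='OF SURFACES SURFACES SURFACES SURFACES

10 0 1 9

EXTERIOR INTERIOR
TOTAL WINDOWS DOORS WINDOWS

6 6 2 0
'

# Unflagged: every digits-only line contributes its third field.
unflagged=$(printf '%s' "$sample" |
    awk '/^[[:digit:][:space:]]+$/ { c += $3 } END { print c + 0 }')

# Flagged: only the first digits-only line after the header is counted.
flagged=$(printf '%s' "$sample" | awk '
    /TOTAL +WINDOWS/ { go = 1 }                          # header seen, arm the flag
    go && /^[[:digit:][:space:]]+$/ { c += $3; go = 0 }  # count once, then disarm
    END { print c + 0 }
')
echo "unflagged=$unflagged flagged=$flagged"
```

The unflagged version also picks up the 1 from the `10 0 1 9` line, while the flagged version counts only the line that follows the header.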
Posted by pop, 12/18/2007 8:14:57 PM

On Tue, 18 Dec 2007 10:36:16 -0800 (PST), Brando <bwnichols@gmail.com> wrote:
>Can anyone translate this brain-flatulence into AWK?

Perhaps along the lines of:
awk '/^$/{next}; /TOTAL WINDOWS DOORS/ {++df}; \
df>0 && /[[:digit:]]+/ {doors += $3; df=0}; \
END{print doors}' datafile
Grant.
--
http://bugsplatter.mine.nu/
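
If the data line always sits a fixed number of lines below the header (two, going by the original description), another option is to remember the target line number instead of keeping a flag. This is only a sketch under that assumption. The variable names `want` and `doors` and the sample input are illustrative and not taken from the real file.

```shell
# Hypothetical input: two blocks, with an unrelated numeric line in between.
sample='TOTAL WINDOWS DOORS WINDOWS

0 0 0 0
52.2 1000.0 274.3 468.9
TOTAL WINDOWS DOORS WINDOWS

3 2 1 0
'

doors=$(printf '%s' "$sample" | awk '
    /TOTAL +WINDOWS +DOORS/ { want = NR + 2 }   # data is two lines below the header
    NR == want              { doors += $3 }     # third field is the DOORS column
    END                     { print doors + 0 }
')
echo "$doors"
```

Unlike the version that skips blank lines with `/^$/{next}`, this one counts them, so the offset has to include any blank line between the header and the data.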
Posted by Grant, 12/18/2007 8:25:22 PM

Thanks all,
Seem to have it figured out. Link to the post with the implemented solution:
http://elcca-exchange.blogspot.com/2007/12/extracting-door-data-from-sim-files.html
Many more data extraction examples (and questions) to follow...
Brando
Posted by Brando, 12/19/2007 1:55:18 AM