Fixed-length records containing 2 different record types with fixed field widths

This is a fun one. I am beginning to receive fixed-length record ASCII
files, but they contain 2 different record types.

- All records are 290 characters long but have different field widths.
- The field widths for each of the 2 record types are fixed.
- One record type always has an "A" in position 73.
- I need to parse the records into 2 new comma-separated text files.

I have bought 2 books and researched and still cannot get the syntax
even close.
I need to do something like this:

(Please forgive the sloppy syntax, I'm a rookie)

gawk {If  ".{75}"= A}
{ print $1, $2, $3, $4, $5}" FIELDWIDTHS="75 45 50 75 45" OFS=,
RECTA.txt;
else
{ print $1, $2, $3, $4, $5}" FIELDWIDTHS="45 50 45 75 75" OFS=,
>RECTB.txt;
REC290.txt  <<<< That's my ASCII file

In other words, I want to parse REC290.txt into RECTA.txt for the
record type with the "A" in position 73, and parse the other record type
(which has no distinguishing characteristics) into RECTB.txt.
Any help would be gratefully appreciated.

Reply mis_pro (3) 1/4/2005 11:37:31 PM

In article <1104881851.636834.302920@z14g2000cwz.googlegroups.com>,
Rookie Card <mis_pro@yahoo.com> wrote:
>This is a fun one. I am beginning to receive fixed-length record ASCII
>files, but they contain 2 different record types.
>
>- All records are 290 characters long but have different field widths.
>- The field widths for each of the 2 record types are fixed.
>- One record type always has an "A" in position 73.
>- I need to parse the records into 2 new comma-separated text files.

I would imagine something like:

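BEGIN	{
	fw[0] = "45 50 45 75 75"	# widths for the other record type
	fw[1] = "75 45 50 75 45"	# widths for records with "A" in column 73
	OFS = ","
	}
# The comparison yields 1 or 0, which picks the width list;
# assigning $0 to itself makes gawk re-split the record with the new widths.
{ FIELDWIDTHS = fw[substr($0,73,1) == "A"]; $0 = $0 }
{ ... rest of program goes here ... }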
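
For anyone wanting the whole thing end to end, here is a minimal sketch of the two-file split. It is only an illustration: it uses substr() in place of gawk's FIELDWIDTHS so that it runs in any POSIX awk, the helper name cut_fields is made up, and the two sample records are placeholder padding that exists only to make the example run. The widths and the file names REC290.txt, RECTA.txt and RECTB.txt are the ones from the original post.

```shell
# Build a tiny stand-in for REC290.txt: two 290-character records,
# the first with "A" in column 73, the second without.
printf '%-72sA%-217s\n' 'type-A record' 'tail' >  REC290.txt
printf '%-290s\n'       'type-B record'        >> REC290.txt

awk '
# cut_fields is a hypothetical helper: slice rec by a space-separated
# list of widths and join the slices with commas.
function cut_fields(rec, widths,    n, w, i, pos, out) {
    n = split(widths, w, " ")
    pos = 1
    out = ""
    for (i = 1; i <= n; i++) {
        out = out (i > 1 ? "," : "") substr(rec, pos, w[i])
        pos += w[i]
    }
    return out
}
# Records with "A" in column 73 go to RECTA.txt, everything else to RECTB.txt.
substr($0, 73, 1) == "A" { print cut_fields($0, "75 45 50 75 45") > "RECTA.txt"; next }
                         { print cut_fields($0, "45 50 45 75 75") > "RECTB.txt" }
' REC290.txt
```

The padding is kept inside each field and nothing is quoted, so fields that contain commas would need extra handling.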

Reply gazelle 1/4/2005 11:49:02 PM


Thanks Kenny,
That puts me in a different direction. I'm still fighting syntax errors but
slowly making progress.
I will post the final code when I get it figured out. If I get stuck,
which I probably will, I'll post the code and detail where it is
choking.

Reply Rookie 1/5/2005 6:04:50 PM

I need to back up a bit. My client keeps changing their requirements (6
new files with 14 record types).
Let's forget the 2 different record types going to two different files for
now.
I've got the parsing and field splitting figured out. (Thanks to John's,
Janis's and Jim's posts.)

I just can't seem to figure out how to filter records based on the
position of a character in the record using awk.

Here's the example:
The only output I want is the records with an "A" in position
18.

The input file would look like this (let's call it REC18.txt):
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
The output I want would look like this (let's call it RECA.txt):
20040911324834736A90028
20041112534129983A93065
I tried:

gawk 'BEGIN { print substr($0,18)  == "A"}' REC18.txt >RECA.txt
Output was:
1

And tried:
gawk ' { print substr(18,1) =="A"}' REC18.txt >RECA.txt
Output was:
0
0
0
0
0
gawk 'BEGIN { print substr($0,18,1)  == "A"}' REC18.txt >RECA.txt
Output was:
1
among many other tries.

I RTFM'd, debugged for hours and searched newsgroups. It's got to be there,
right?
I thank you in advance for your help.
Gary / Rookie Card

Reply Rookie 1/6/2005 5:21:52 AM

Rookie Card wrote:
> The only output I want is the records with an "A" in position
> 18.
>
> The input file would look like this (let's call it REC18.txt):
> 20040911324834736A90028
> CLIENTID000VNI112B92658
> CLIENTID000VNI118S98271
> 20041112534129983A93065
> The output I want would look like this (let's call it RECA.txt):
> 20040911324834736A90028
> 20041112534129983A93065
> 
  
awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt

Reply William 1/6/2005 7:26:19 AM

William - Thank you for replying
I tried
awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
Output was Null

Then I tried:
awk '{ print "A"==substr($0,18,1) }' REC18.txt >RECA.txt
Output was this:
0
0
0
0
I get a 0 in the output file for every record in the input file.
Strange.
Getting close, I just can't see what I'm doing wrong.
Gary / Rookie Card

Reply Rookie 1/6/2005 3:34:31 PM

Rookie Card wrote:
> William - Thank you for replying
> I tried
> awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
> Output was Null
>
> Then I tried:
> awk '{ print "A"==substr($0,18,1) }' REC18.txt >RECA.txt
> Output was this:
> 0
> 0
> 0
> 0
> I get a 0 in the output file for every record in the input file.
> Strange.
> Getting close, I just can't see what I'm doing wrong.
> Gary / Rookie Card

I tested it with the data you supplied and it produced the correct
results.  Do the lines of the file have any leading spaces that need to
be stripped?

Try this:
awk '/A/' REC18.txt

Reply William 1/6/2005 3:56:51 PM


Rookie Card wrote:
> William - Thank you for replying
> I tried
> awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
> Output was Null
> 
> Then I tried:
> awk '{ print "A"==substr($0,18,1) }' REC18.txt >RECA.txt
> Output was this:
> 0
> 0
> 0
> 0
> I get a 0 in the output file for every record in the input file.

The above are both telling you that none of the records in your input 
file have an A in the 18th column. Do this:

awk '{ print substr($0,18,1) }' REC18.txt

to see what's really in the 18th column.

	Ed.

> Strange.
> Getting close, I just can't see what I'm doing wrong.
> Gary / Rookie Card
> 
Reply Ed 1/6/2005 4:08:50 PM

William -
The files don't have leading spaces but do have spaces in the middle
Like this:

200409113  834736A90028
CLIENTID 00VNI112B92658
CLIE  ID000VNI118S98271
200411   34129983A93065

The spaces are never consistent, as they pad fixed-length fields
and the data is not always the same length. The position of
the record identifier ( A ) is always the same, though, if you count the
spaces. (Fixed-length records with fixed field widths.)
I also tried awk '/A/' REC18.txt
and the output returned all records.

Reply Rookie 1/6/2005 4:24:42 PM


Rookie Card wrote:
> William -
> The files don't have leading spaces but do have spaces in the middle
> Like this:
> 
> 200409113  834736A90028
> CLIENTID 00VNI112B92658
> CLIE  ID000VNI118S98271
> 200411   34129983A93065
> 
> The spaces are never consistent, as they pad fixed-length fields
> and the data is not always the same length. The position of
> the record identifier ( A ) is always the same, though, if you count the
> spaces. (Fixed-length records with fixed field widths.)
> I also tried awk '/A/' REC18.txt
> and the output returned all records.

That's impossible given the input file you showed above.

	Ed.
Reply Ed 1/6/2005 4:35:49 PM

Ed -
awk '{ print substr($0,18,1) }' REC18.txt
A
B
S
A

Then I try
awk '{ print "A"==substr($0,18,1) }' REC18.txt
0
0
0
0
This is with the example data: REC18.txt
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
Thanks
Gary / Rookie Card

Reply Rookie 1/6/2005 4:56:02 PM

In article <crjpdc$it6@netnews.proxy.lucent.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
....
>> The spaces are never consistent, as they pad fixed-length fields
>> and the data is not always the same length. The position of
>> the record identifier ( A ) is always the same, though, if you count the
>> spaces. (Fixed-length records with fixed field widths.)
>> I also tried awk '/A/' REC18.txt
>> and the output returned all records.
>
>That's impossible given the input file you showed above.
>
>	Ed.

There is some kind of fundamental miscommunication going on in this thread.

For one, it is clear that the OP hasn't a clue about how AWK works.
He should acquire and read thoroughly some basic texts.

Second, I think there may be some kind of shell-quoting issue going on --
which is a nice way of saying, "I think he is using MS DOS w/o telling us
and we are, quite naturally, assuming Unix."

Reply gazelle 1/6/2005 5:13:22 PM

>The position of
>the record identifier ( A ) is always the same, though, if you count the
>spaces.

In
200409113 834736A90028
A is the 17th character and is the 7th character in the
space-delimited field.
In
200411   34129983A93065
A is the 18th character and is the 9th character in the field.
Try this:
awk '$2 ~ /[0-9]A[0-9]/'  REC18.txt
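
A quick way to settle where the "A" really sits is to let awk do the counting. A minimal sketch, assuming the four sample lines are saved with their runs of spaces intact in a file hypothetically named spaced.txt; index() returns the position of the first match, or 0 if there is none:

```shell
# Recreate the sample lines exactly, including the runs of spaces.
cat > spaced.txt <<'EOF'
200409113  834736A90028
CLIENTID 00VNI112B92658
CLIE  ID000VNI118S98271
200411   34129983A93065
EOF

# Print the column of the first "A" on each line (0 means no "A" at all).
awk '{ print index($0, "A") }' spaced.txt
```

With the spacing intact this prints 18 for the first and last lines and 0 for the two in between. If a newsreader or web page has collapsed the spaces, the numbers shift.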

Reply William 1/6/2005 5:44:50 PM

Kenny - you're right.
- I have very limited knowledge of awk
- I am using gawk 3.1.3 on the win32 platform ( I thought I could just
substitute ' for " )
OK, now I feel like a real dumbass. Sorry if I have wasted everyone's
time.
Although, I have read through
http://www.gnu.org/software/gawk/manual/gawk.html
and done many of the example exercises.
- Can you suggest a good book on awk?
I don't mean to seem so lame. I'm learning, and have learned a lot from
this newsgroup.
Maybe I should change my handle from "Rookie Card" to "Lame Newbie"

Reply Rookie 1/6/2005 5:50:54 PM


William James wrote:

>>The position of
>>the record identifier ( A ) is always the same, though, if you count the
>>spaces.
> 
> 
> In
> 200409113 834736A90028
> A is the 17th character and is the 7th character in the
> space-delimited field.

That's true, but he never posted that text as one of his input lines. 
His had 2 spaces between the 2 fields:

200409113  834736A90028

Regards,

	Ed.
Reply Ed 1/6/2005 6:04:45 PM

Rookie Card wrote:
> Kenny - you're right.
> - I have very limited knowledge of awk
> - I am using gawk 3.1.3 on the win32 platform ( I thought I could just
> substitute ' for " )

It should work under win32; that's what I'm using although
the commands I posted were shown unix-style.

Reply William 1/6/2005 6:11:18 PM


Rookie Card wrote:

> Ed -
> awk '{ print substr($0,18,1) }' REC18.txt
> A
> B
> S
> A
> 
> Then I try
> awk '{ print "A"==substr($0,18,1) }' REC18.txt
> 0
> 0
> 0
> 0
> This is with the example data: REC18.txt
> 20040911324834736A90028
> CLIENTID000VNI112B92658
> CLIENTID000VNI118S98271
> 20041112534129983A93065
> Thanks
> Gary / Rookie Card
> 

Since neither oawk nor nawk would accept that last command, I assume 
you're running gawk. Here's what happens when I use gawk version 3.0.4 
on Solaris:

PS1> gawk '{ print substr($0,18,1) }' REC18.txt
A
B
S
A
PS1> gawk '{ print "A"==substr($0,18,1) }' REC18.txt
1
0
0
1

Regards,

	Ed.
Reply Ed 1/6/2005 6:12:28 PM

When I copy "200409113 834736A90028" and paste it into
a text editor, it shows 1 space.

G@@gle strikes again, it seems.

Reply William 1/6/2005 6:16:48 PM


William James wrote:

> When I copy "200409113 834736A90028" and paste it into
> a text editor, it shows 1 space.
> 
> G@@gle strikes again, it seems.
> 

It also doesn't post your responses in the right thread-order for 
Netscape (and other newsreaders?), but always just sticks them at the 
bottom of the original thread. I've no idea what controls that but I've 
only seen this problem with postings from you and "Rookie Card". It 
would be helpful if you could include some more context in your 
responses so we can tell which specific post you're responding to.

Thanks,

	Ed.
Reply Ed 1/6/2005 6:28:09 PM

> It also doesn't post your responses in the right thread-order for
> Netscape (and other newsreaders?), but always just sticks them at the
> bottom of the original thread. I've no idea what controls that but I've
> only seen this problem with postings from you and "Rookie Card". It
> would be helpful if you could include some more context in your
> responses so we can tell which specific post you're responding to.
>
> Thanks,
>
> 	Ed.

Ed - oh, I can be sure it's something that I am doing. I am using the
Google website as my reader/poster. The thread looks correct from my
side. I'll configure a real newsreader. I'm a newbie and don't want to
tick people off any more than I already have.
Thanks,
Gary / Rookie Card aka Lame Newbie

Reply Rookie 1/6/2005 6:46:23 PM

William,
What version of gawk are you using? I am using GNU gawk 3.1.3.
We used the same script and the same data.
It filtered the records perfectly for you and not for me.
Then the only difference would be the executable.
Right? I could be mistaken. I do that a lot.

Well, there is always the X factor. My problem may be between the chair
and the keyboard.
If you could tell me what version of gawk you are using, that would be very
helpful.
Gary

Reply Rookie 1/6/2005 7:07:07 PM

I'm getting closer (this is gawk 3.1.3 / MS-DOS)
I ran:
gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
Output was:
A90028

A93065

It now filters correctly but only prints the last 6 characters.
Now I just need to figure out how to get it to print the whole record,
which in the awk language is $0, right?

Reply Rookie 1/6/2005 7:23:05 PM

Rookie Card wrote:
> William,
> What version of gawk are you using? I am using GNU gawk 3.1.3.
> We used the same script and the same data.
> It filtered the records perfectly for you and not for me.
> Then the only difference would be the executable.
> Right? I could be mistaken. I do that a lot.
>
> Well, there is always the X factor. My problem may be between the chair
> and the keyboard.
> If you could tell me what version of gawk you are using, that would be very
> helpful.
> Gary

GNU Awk 3.0.3 and Kernighan's awk and mawk.

I've converted spaces to commas and saved this in file "data":

200409113,,834736A90028
CLIENTID,00VNI112B92658
CLIE,,ID000VNI118S98271
200411,,,34129983A93065

This command line
awk "\"A\"==substr($0,18,1)" data

produces this output, using any of those awks:
200409113,,834736A90028
200411,,,34129983A93065

Reply William 1/6/2005 7:24:20 PM


Rookie Card wrote:
> I'm getting closer (this is gawk 3.1.3 / MS-DOS)
> I ran:
> gawk "{ print "A"=substr($0,18,$0)}" REC18.txt

The above is doing the following things wrong:

1) Passing a string as the third argument for substr()
2) Trying to assign the result of substr() to a string
3) Not escaping the double-quotes within the script.

Try this:

gawk "{ print \"A\"==substr($0,18,1)}" REC18.txt

and next time please post what you're actually using!

	Ed.

> Output was:
> A90028
> 
> A93065
> 
> It now filters correctly but only prints the last 6 characters.
> Now I just need to figure out how to get it to print the whole record,
> which in the awk language is $0, right?
> 
Reply Ed 1/6/2005 7:28:07 PM

In article <crk3gf$n83@netnews.proxy.lucent.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>
>
>Rookie Card wrote:
>> I'm getting closer (this is gawk 3.1.3 / MS-DOS)
>> I ran:
>> gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
>
>The above is doing the following things wrong:
>
>1) Passing a string as the third argument for substr()

Perfectly legal.  The value of $0 is converted to an integer and used as
the length of the string to extract.

>2) Trying to assign the result of substr() to a string

That's what I thought at first glance - and I assumed (correctly, as it
turns out) that GAWK (or any AWK, for that matter) would flag it as an
error.  However, look beneath the surface.

In the crazy, mixed up world of MS command interpreters, this parses as:

{ print A=something }

which is perfectly legal.

>3) Not escaping the double-quotes within the script.

See above comments about the "crazy, mixed up world".

Nothing to do with AWK, of course.
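
One way to sidestep the command-line quoting differences altogether is to keep the awk program in a file and run it with -f, so that neither a Unix shell nor the MS command interpreter ever sees the quotes. A minimal sketch, using the sample data from earlier in the thread; filterA.awk is a made-up file name, and the here-documents are just a Unix-shell convenience for creating the two files:

```shell
# Sample data as posted earlier in the thread.
cat > REC18.txt <<'EOF'
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
EOF

# filterA.awk (hypothetical name) holds the one-line pattern posted above.
cat > filterA.awk <<'EOF'
# Print only records whose 18th character is "A".
substr($0, 18, 1) == "A"
EOF

# The command line itself now contains nothing that needs quoting.
awk -f filterA.awk REC18.txt > RECA.txt
cat RECA.txt
```

On Windows the two files could be created in any text editor instead; the last command should work unchanged with gawk or Kernighan's awk.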

Reply gazelle 1/6/2005 8:04:44 PM

Don't feel too bad, Rookie Card.  We all make embarrassing mistakes.

>I ran:
>gawk "{ print "A"=substr($0,18,$0)}" REC18.txt

In the future, don't retype the posted code; copy and paste it.
That way you won't make any typing errors.

Reply William 1/6/2005 8:08:30 PM

William,
That worked!
I used the Kernighan version of awk for win32.
awk "\"A\"==substr($0,18,1)" REC18.txt >RECA.txt
The problem was gnu version 3.1.3 of gawk.exe

Reply Rookie 1/6/2005 8:12:46 PM


Rookie Card wrote:
> William,
> That worked!
> I used the Kernighan version of awk for win32.
> awk "\"A\"==substr($0,18,1)" REC18.txt >RECA.txt
> The problem was gnu version 3.1.3 of gawk.exe
> 

Given your other postings where you didn't use the above syntax, I doubt 
if gawk was really the problem. Try it again using exactly the syntax above.

	Ed.
Reply Ed 1/6/2005 8:16:41 PM

Ed Morton wrote:
>Rookie Card wrote:
>
>> William,
>> That worked!
>> I used the Kernighan version of awk for win32.
>> awk "\"A\"==substr($0,18,1)" REC18.txt >RECA.txt
>> The problem was gnu version 3.1.3 of gawk.exe
>
>
>Given your other postings where you didn't use the above syntax, I doubt
>if gawk was really the problem. Try it again using exactly the syntax above.

Rookie, it would be incredible if gawk was the problem.  The older version
worked for me and the newer version almost certainly will.

Reply William 1/6/2005 8:27:50 PM

William
Kernighan's awk worked perfectly. Your code and Ed's code all worked
fine in Kernighan's awk.
gawk "{ print "\"A\"==substr($0,18,1)}" REC18.txt
Output was:
20040911324834736A90028
20041112534129983A93065

That's what I've been looking for!
Besides the problem between the chair and keyboard, I also had big
problems with the GNU version 3.1.3 for win32.
I want to thank you, Ed and Kenny. I know I have been a bit annoying.
- Also, can anyone suggest a good book on the awk language, as I will be
using it a lot this year?

Gary / Rookie Card / Annoying Lame Newbie

Reply Rookie 1/6/2005 8:43:16 PM


Rookie Card wrote:
> William
> Kernighan's awk worked perfectly. Your code and Ed's code all worked
> fine in Kernighan's awk.
> gawk "{ print "\"A\"==substr($0,18,1)}" REC18.txt
> Output was:
> 20040911324834736A90028
> 20041112534129983A93065
> 
> That's what I've been looking for!
> Besides the problem between the chair and keyboard, I also had big
> problems with the GNU version 3.1.3 for win32.
> I want to thank you, Ed and Kenny. I know I have been a bit annoying.
> - Also, can anyone suggest a good book on the awk language, as I will be
> using it a lot this year?

I think the on-line GNU text you mentioned elsethread 
(http://www.gnu.org/software/gawk/manual/gawk.html) is the best. The 
only other one I've looked through is "The AWK Programming Language" by 
Aho et al. It's OK as an introduction to awk in general, but obviously 
doesn't cover the very useful GNU awk extensions. I get most of my awk 
education in this NG and at comp.unix.shell.

Glad to hear things are working for you now.

	Ed.

> Gary / Rookie Card / Annoying Lame Newbie
> 
Reply Ed 1/6/2005 10:36:11 PM
