This is a fun one. I am beginning to receive fixed length record ASCII
files but they contain 2 different record types.
- All records are 290 characters long but have different field widths.
- The field widths for each of the 2 record types are fixed.
- One record type always has an "A" in position 73.
- I need to parse the records into 2 new comma-separated text files.
I have bought 2 books and researched and still cannot get the syntax
even close.
I need to do something like this:
(Please forgive the sloppy syntax, I'm a rookie)
gawk {If ".{75}"= A}
{ print $1, $2, $3, $4, $5}" FIELDWIDTHS="75 45 50 75 45" OFS=,
RECTA.txt;
else
{ print $1, $2, $3, $4, $5}" FIELDWIDTHS="45 50 45 75 75" OFS=,
>RECTB.txt;
REC290.txt <<<< That's my ASCII file
In other words, I want to parse REC290.txt to RECTA.txt for (the
"A" in the 73 position) record type and parse the other record type
(Which has no distinguishing characteristics) to RECTB.txt
Any help would be very gratefully appreciated.
-- mis_pro, 1/4/2005 11:37:31 PM

In article <1104881851.636834.302920@z14g2000cwz.googlegroups.com>,
Rookie Card <mis_pro@yahoo.com> wrote:
>This is a fun one. I am beginning to receive fixed length record ASCII
>files but they contain 2 different record types.
>
>- All records are 290 in length but have different field widths.
>- The field widths for each of the 2 record types is fixed
>- One record type always begins with an "A" in the 73 position.
>- I need to parse the records to 2 new comma separated text files.
I would imagine something like:
BEGIN {
fw[0]="45 50 45 75 75"
fw[1]="75 45 50 75 45"
OFS=","
}
{FIELDWIDTHS = fw[substr($0,73,1) == "A"];$0=$0}
{ ... rest of program goes here ... }
-- gazelle, 1/4/2005 11:49:02 PM

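The skeleton in the reply above leaves the output step open. As a minimal sketch of one way to finish the job, the script below routes each record to RECTA.txt or RECTB.txt and joins the fields with commas. It slices the fields with substr() rather than gawk's FIELDWIDTHS, so it should run in any POSIX awk. The cut() helper and the two-record demo input are assumptions made for illustration; the widths are the ones given in the original question.

```shell
# Demo input (invented): two 290-character records, one of each type.
awk 'BEGIN {
    printf "%-290s\n", sprintf("%-72s", "TYPEA-REC") "A"   # "A" lands in column 73
    printf "%-290s\n", "TYPEB-REC"
}' > REC290.txt

# Split by record type. cut() is a hypothetical helper, not an awk built-in:
# it slices $0 into fixed-width fields and joins them with commas.
awk '
function cut(widths,    n, w, i, pos, out) {
    n = split(widths, w, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        out = out (i > 1 ? "," : "") substr($0, pos, w[i])
        pos += w[i]
    }
    return out
}
substr($0, 73, 1) == "A" { print cut("75 45 50 75 45") > "RECTA.txt"; next }
                         { print cut("45 50 45 75 75") > "RECTB.txt" }
' REC290.txt
```

Fields keep their padding blanks, and a comma inside the data would need quoting; neither is handled in this sketch.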
Thanks Kenny,
That points me in a different direction. I'm still fighting syntax errors but
slowly making progress.
I will post the final code when I get it figured out. If I get stuck,
which I probably will, I'll post the code and detail where it is
choking.
-- Rookie, 1/5/2005 6:04:50 PM

I need to back up a bit. My client keeps changing their requirements. (6
new files with 14 record types)
Let's forget the 2 different record types to two different files for
now.
I've got the parsing and field splitting figured out. (Thanks to John,
Janis and Jim's posts)
I just can't seem to figure out how to filter records based on the
position of a character in the record using awk.
Here's the example:
The only output I want is the records with an "A" in
position 18.
The input file would look like this: (let's call it REC18.txt)
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
The output I want would look like this: (let's call it RECA.txt)
20040911324834736A90028
20041112534129983A93065
I tried:
gawk 'BEGIN { print substr($0,18) == "A"}' REC18.txt >RECA.txt
Output was:
1
And tried:
gawk ' { print substr(18,1) =="A"}' REC18.txt >RECA.txt
Output was:
0
0
0
0
0
gawk 'BEGIN { print substr($0,18,1) == "A"}' REC18.txt >RECA.txt
Output was:
1
among many other tries.
I RTFM'd, debugged for hours and searched newsgroups. It's got to be there.
Right?
I thank you in advance for your help.
Gary / Rookie Card
-- Rookie, 1/6/2005 5:21:52 AM

Rookie Card wrote:
> The only output I would want would be records with an "A" in the 18
> position
>
> input file would look like this: (let call it REC18.txt)
> 20040911324834736A90028
> CLIENTID000VNI112B92658
> CLIENTID000VNI118S98271
> 20041112534129983A93065
> The output I want would look like this: (Lets call it RECA.txt)
> 20040911324834736A90028
> 20041112534129983A93065
>
awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
-- William, 1/6/2005 7:26:19 AM

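For reference, here is how that one-liner behaves on the sample data under a Unix shell. A pattern with no action prints every record for which the pattern is true. A BEGIN block, by contrast, runs before any input is read, so $0 is still empty there, and print with a comparison emits 1 or 0 rather than the record. REC18.txt and RECA.txt follow the thread; flags.txt is a name assumed for the demo.

```shell
# Sample data as posted in the thread (no embedded spaces).
cat > REC18.txt <<'EOF'
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
EOF

# Pattern only: print the records whose 18th character is "A".
awk 'substr($0, 18, 1) == "A"' REC18.txt > RECA.txt

# Same comparison inside print: emits the truth value, one per record.
awk '{ print (substr($0, 18, 1) == "A") }' REC18.txt > flags.txt

cat RECA.txt flags.txt
```

The parentheses around the comparison in the second command are there because not every awk accepts a bare comparison after print.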
William - Thank you for replying
I tried
awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
Output was Null
Then I tried:
awk '{ print "A"==substr($0,18,1) }' REC18.txt >RECA.txt
Output was this:
0
0
0
0
I get a 0 in the output file for every record in the input file.
Strange.
Getting close, I just can't see what I'm doing wrong.
Gary / Rookie Card
-- Rookie, 1/6/2005 3:34:31 PM

Rookie Card wrote:
> William - Thank you for replying
> I tried
> awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
> Output was Null
>
> Then I tried:
> awk '{ print "A"==substr($0,18,1) }' REC18.txt >RECA.txt
> Output was this:
> 0
> 0
> 0
> 0
> I get a 0 in the output file for every record in the input file.
> Strange.
> Getting close, I just can't see what I'm doing wrong.
> Gary / Rookie Card
I tested it with the data you supplied and it produced the correct
results. Do the lines of the file have any leading spaces that need to
be stripped?
Try this:
awk '/A/' REC18.txt
-- William, 1/6/2005 3:56:51 PM

Rookie Card wrote:
> William - Thank you for replying
> I tried
> awk '"A"==substr($0,18,1)' REC18.txt >RECA.txt
> Output was Null
>
> Then I tried:
> awk '{ print "A"==substr($0,18,1) }' REC18.txt >RECA.txt
> Output was this:
> 0
> 0
> 0
> 0
> I get a 0 in the output file for every record in the input file.
The above are both telling you that none of the records in your input
file have an A in the 18th column. Do this:
awk '{ print substr($0,18,1) }' REC18.txt
to see what's really in the 18th column.
Ed.
> Strange.
> Getting close, I just can't see what I'm doing wrong.
> Gary / Rookie Card
>
-- Ed, 1/6/2005 4:08:50 PM

William -
The files don't have leading spaces but do have spaces in the middle.
Like this:
200409113  834736A90028
CLIENTID 00VNI112B92658
CLIE  ID000VNI118S98271
200411   34129983A93065
The spaces are never consistent, as they represent a fixed-length field
and the data is not always the same length. Although the position of
the record identifier ( A ) is always the same if you count the
spaces. (Fixed-length records with fixed field widths)
I also tried awk '/A/' REC18.txt
and the output returned all records.
-- Rookie, 1/6/2005 4:24:42 PM

Rookie Card wrote:
> William -
> The files don't have leading spaces but do have spaces in the middle.
> Like this:
>
> 200409113  834736A90028
> CLIENTID 00VNI112B92658
> CLIE  ID000VNI118S98271
> 200411   34129983A93065
>
> The spaces are never consistent, as they represent a fixed-length field
> and the data is not always the same length. Although the position of
> the record identifier ( A ) is always the same if you count the
> spaces. (Fixed-length records with fixed field widths)
> I also tried awk '/A/' REC18.txt
> and the output returned all records.
That's impossible given the input file you showed above.
Ed.
-- Ed, 1/6/2005 4:35:49 PM

Ed -
awk '{ print substr($0,18,1) }' REC18.txt
A
B
S
A
Then I try
awk '{ print "A"==substr($0,18,1) }' REC18.txt
0
0
0
0
This is with the example data: REC18.txt
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
Thanks
Gary / Rookie Card
-- Rookie, 1/6/2005 4:56:02 PM

In article <crjpdc$it6@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>> The spaces are never consistent, as they represent a fixed-length field
>> and the data is not always the same length. Although the position of
>> the record identifier ( A ) is always the same if you count the
>> spaces. (Fixed-length records with fixed field widths)
>> I also tried awk '/A/' REC18.txt
>> and the output returned all records.
>
>That's impossible given the input file you showed above.
>
> Ed.
There is some kind of fundamental miscommunication going on in this thread.
For one, it is clear that the OP hasn't a clue about how AWK works.
He should acquire and read thoroughly some basic texts.
Second, I think there may be some kind of shell-quoting issue going on --
which is a nice way of saying, "I think he is using MS DOS w/o telling us
and we are, quite naturally, assuming Unix."
-- gazelle, 1/6/2005 5:13:22 PM

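If shell quoting is the culprit, one way to take the shell out of the picture is to keep the awk program in a file and run it with -f, which behaves the same under a Unix shell and the Windows command prompt. A minimal sketch follows; filterA.awk is a hypothetical file name, and the heredocs merely stand in for creating the two files in an editor.

```shell
# Sample data as posted in the thread.
cat > REC18.txt <<'EOF'
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
EOF

# The awk program lives in its own file, so no shell quoting is involved.
cat > filterA.awk <<'EOF'
# Keep only the records whose 18th character is "A".
substr($0, 18, 1) == "A"
EOF

awk -f filterA.awk REC18.txt > RECA.txt
cat RECA.txt
```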
>Although the position of
>the record identifier ( A ) is always the same if you count the
>spaces.
In
200409113 834736A90028
A is the 17th character and is the 7th character in the
space-delimited field.
In
200411   34129983A93065
A is the 18th character and is the 9th character in the field.
Try this:
awk '$2 ~ /[0-9]A[0-9]/' REC18.txt
-- William, 1/6/2005 5:44:50 PM

Kenny - you're right.
- I have very limited knowledge of awk
- I am using gawk 3.1.3 on the win32 platform ( I thought I could just
substitute ' for " )
OK, now I feel like a real dumbass. Sorry if I have wasted everyone's
time.
Although, I have read through
http://www.gnu.org/software/gawk/manual/gawk.html
and done many of the example exercises.
- Can you suggest a good book on awk?
I don't mean to seem so lame, I'm learning and have learned a lot from
this newsgroup.
Maybe I should change my handle from "Rookie Card" to "Lame Newbie"
-- Rookie, 1/6/2005 5:50:54 PM

William James wrote:
>>Although the position of
>>the record identifier ( A ) is always the same if you count the
>>spaces.
>
>
> In
> 200409113 834736A90028
> A is the 17th character and is the 7th character in the
> space-delimited field.
That's true, but he never posted that text as one of his input lines.
His had 2 spaces between the 2 fields:
200409113  834736A90028
Regards,
Ed.
-- Ed, 1/6/2005 6:04:45 PM

Rookie Card wrote:
> Kenny - you're right.
> - I have very limited knowledge of awk
> - I am using gawk 3.1.3 on the win32 platform ( I thought I could just
> substitute ' for " )
It should work under win32; that's what I'm using although
the commands I posted were shown unix-style.
-- William, 1/6/2005 6:11:18 PM

Rookie Card wrote:
> Ed -
> awk '{ print substr($0,18,1) }' REC18.txt
> A
> B
> S
> A
>
> Then I try
> awk '{ print "A"==substr($0,18,1) }' REC18.txt
> 0
> 0
> 0
> 0
> This is with the example data: REC18.txt
> 20040911324834736A90028
> CLIENTID000VNI112B92658
> CLIENTID000VNI118S98271
> 20041112534129983A93065
> Thanks
> Gary / Rookie Card
>
Since neither oawk nor nawk would accept that last command, I assume
you're running gawk. Here's what happens when I use gawk version 3.0.4
on Solaris:
PS1> gawk '{ print substr($0,18,1) }' REC18.txt
A
B
S
A
PS1> gawk '{ print "A"==substr($0,18,1) }' REC18.txt
1
0
0
1
Regards,
Ed.
-- Ed, 1/6/2005 6:12:28 PM

When I copy "200409113 834736A90028" and paste it into
a text editor, it shows 1 space.
G@@gle strikes again, it seems.
-- William, 1/6/2005 6:16:48 PM

William James wrote:
> When I copy "200409113 834736A90028" and paste it into
> a text editor, it shows 1 space.
>
> G@@gle strikes again, it seems.
>
It also doesn't post your responses in the right thread-order for
Netscape (and other newsreaders?), but always just sticks them at the
bottom of the original thread. I've no idea what controls that but I've
only seen this problem with postings from you and "Rookie Card". It
would be helpful if you could include some more context in your
responses so we can tell which specific post you're responding to.
Thanks,
Ed.
-- Ed, 1/6/2005 6:28:09 PM

> It also doesn't post your responses in the right thread-order for
> Netscape (and other newsreaders?), but always just sticks them at the
> bottom of the original thread. I've no idea what controls that but I've
> only seen this problem with postings from you and "Rookie Card". It
> would be helpful if you could include some more context in your
> responses so we can tell which specific post you're responding to.
>
> Thanks,
>
> Ed.
Ed - oh, I can be sure it's something that I am doing. I am using the
Google website as my reader/poster. The thread looks correct from my
side. I'll configure a real newsreader. I'm a newbie and don't want to
tick people off any more than I already have.
Thanks,
Gary / Rookie Card aka Lame Newbie
-- Rookie, 1/6/2005 6:46:23 PM

William,
What version of gawk are you using? I am using gnu gawk 3.1.3
We used the same script and the same data.
It filtered the records perfectly for you and not for me.
Then the only difference would be the executable.
Right? I could be mistaken. I do that a lot.
Well, there is always the X factor. My problem may be between the Chair
and the Keyboard.
If you could tell what version of gawk you are using that would be very
helpful.
Gary
-- Rookie, 1/6/2005 7:07:07 PM

I'm getting closer (this is gawk 3.1.3 / MSDOS)
I ran:
gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
Output was:
A90028
A93065
It now filters correctly but only prints the last 6 characters.
Now I just need to figure out how to get it to print the whole record,
which in the awk language is ($0). Right?
-- Rookie, 1/6/2005 7:23:05 PM

Rookie Card wrote:
> William,
> What version of gawk are you using? I am using gnu gawk 3.1.3
> We used the same script and the same data.
> It filtered the records perfectly for you and not for me.
> Then the only difference would be the executable.
> Right? I could be mistaken. I do that a lot.
>
> Well, there is always the X factor. My problem may be between the Chair
> and the Keyboard.
> If you could tell what version of gawk you are using that would be very
> helpful.
> Gary
GNU Awk 3.0.3 and Kernighan's awk and mawk.
I've converted spaces to commas and saved this in file "data":
200409113,,834736A90028
CLIENTID,00VNI112B92658
CLIE,,ID000VNI118S98271
200411,,,34129983A93065
This command line
awk "\"A\"==substr($0,18,1)" data
produces this output, using any of those awks:
200409113,,834736A90028
200411,,,34129983A93065
-- William, 1/6/2005 7:24:20 PM

Rookie Card wrote:
> I'm getting closer (this is gawk 3.1.3 / MSDOS)
> I ran:
> gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
The above is doing the following things wrong:
1) Passing a string as the third argument for substr()
2) Trying to assign the result of substr() to a string
3) Not escaping the double-quotes within the script.
Try this:
gawk "{ print \"A\"==substr($0,18,1)}" REC18.txt
and next time please post what you're actually using!
Ed.
> Output was:
> A90028
>
> A93065
>
> It now filters correctly but only prints the last 6 characters.
> Now I just need to figure out how to get it to print the whole record,
> which in the awk language is ($0). Right?
>
-- Ed, 1/6/2005 7:28:07 PM

In article <crk3gf$n83@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>Rookie Card wrote:
>> I'm getting closer (this is gawk 3.1.3 / MSDOS)
>> I ran:
>> gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
>
>The above is doing the following things wrong:
>
>1) Passing a string as the third argument for substr()
Perfectly legal. The value of $0 is converted to an integer and used as
the length of the string to extract.
>2) Trying to assign the result of substr() to a string
That's what I thought at first glance - and I assumed (correctly, as it
turns out) that GAWK (or any AWK, for that matter) would flag it as an
error. However, look beneath the surface.
In the crazy, mixed up world of MS command interpreters, this parses as:
{ print A=something }
which is perfectly legal.
>3) Not escaping the double-quotes within the script.
See above comments about the "crazy, mixed up world".
Nothing to do with AWK, of course.
-- gazelle, 1/6/2005 8:04:44 PM

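That reading can be checked from a Unix shell by handing awk the program as it would look once the inner double quotes have been eaten. A sketch, assuming the unspaced sample data from earlier in the thread; here A is just an uninitialized awk variable that receives the substring, and out.txt is a name assumed for the demo.

```shell
# Sample data as posted in the thread.
cat > REC18.txt <<'EOF'
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
EOF

# What awk effectively received: assign the substring to variable A, then print it.
# The third argument, $0, is converted to a number and used as the length.
awk '{ print A = substr($0, 18, $0) }' REC18.txt > out.txt
cat out.txt
```

On the two records that start with digits, $0 converts to a very large number, so substr() runs to the end of the line and yields A90028 and A93065. On the CLIENTID records $0 converts to 0, the substring is empty, and a blank line is printed. The result only looks like a filter.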
Don't feel too bad, Rookie Card. We all make embarrassing mistakes.
>I ran:
>gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
In the future, don't retype the posted code; copy and paste it.
That way you won't make any typing errors.
-- William, 1/6/2005 8:08:30 PM

William,
That worked!
I used the Kernighan version of awk for win32.
awk "\"A\"==substr($0,18,1)" REC18.txt >RECA.txt
The problem was gnu version 3.1.3 of gawk.exe
-- Rookie, 1/6/2005 8:12:46 PM

Rookie Card wrote:
> William,
> That worked!
> I used the Kernighan version of awk for win32.
> awk "\"A\"==substr($0,18,1)" REC18.txt >RECA.txt
> The problem was gnu version 3.1.3 of gawk.exe
>
Given your other postings where you didn't use the above syntax, I doubt
if gawk was really the problem. Try it again using exactly the syntax above.
Ed.
-- Ed, 1/6/2005 8:16:41 PM

Ed Morton wrote:
>Rookie Card wrote:
>
>> William,
>> That worked!
>> I used the Kernighan version of awk for win32.
>> awk "\"A\"==substr($0,18,1)" REC18.txt >RECA.txt
>> The problem was gnu version 3.1.3 of gawk.exe
>
>
>Given your other postings where you didn't use the above syntax, I doubt
>if gawk was really the problem. Try it again using exactly the syntax above.
Rookie, it would be incredible if gawk was the problem. The older version
worked for me and the newer version almost certainly will.
-- William, 1/6/2005 8:27:50 PM

William
Kernighan's awk worked perfectly. Your code and Ed's code all worked
fine in Kernighan's awk.
gawk "{ print "\"A\"==substr($0,18,1)}" REC18.txt
Output was:
20040911324834736A90028
20041112534129983A93065
That's what I've been looking for!
Besides the problem between the Chair and Keyboard I also had big
problems with the gnu version 3.1.3 for win32.
I want to thank you, Ed and Kenny. I know I have been a bit annoying.
- Also, can anyone suggest a good book on the awk language, as I will be
using it a lot this year?
Gary / Rookie Card / Annoying Lame Newbie
-- Rookie, 1/6/2005 8:43:16 PM

Rookie Card wrote:
> William
> Kernighan's awk worked perfectly. Your code and Ed's code all worked
> fine in Kernighan's awk.
> gawk "{ print "\"A\"==substr($0,18,1)}" REC18.txt
> Output was:
> 20040911324834736A90028
> 20041112534129983A93065
>
> That's what I've been looking for!
> Besides the problem between the Chair and Keyboard I also had big
> problems with the gnu version 3.1.3 for win32.
> I want to thank you, Ed and Kenny. I know I have been a bit annoying.
> - Also, can anyone suggest a good book on the awk language, as I will be
> using it a lot this year?
I think the on-line GNU text you mentioned elsethread
(http://www.gnu.org/software/gawk/manual/gawk.html) is the best. The
only other one I've looked through is "The AWK Programming Language" by
Aho et al. It's OK as an introduction to awk in general, but obviously
doesn't cover the very useful GNU awk extensions. I get most of my awk
education in this NG and at comp.unix.shell.
Glad to hear things are working for you now.
Ed.
> Gary / Rookie Card / Annoying Lame Newbie
>
-- Ed, 1/6/2005 10:36:11 PM