Truncate only first and last lines of a test file.

I would appreciate any pointers on how to do this.

I need to truncate the first line of a file to 10 characters, for
example, and the last to 20 characters.  The lines between are OK and
are the correct length, but may be padded with spaces to that length
as per the interface spec.

This has arisen due to the way an old system that should have been
smashed with a big hammer produces data, i.e. all files are the same
length.  The MPE gurus (that's how they describe themselves - it
says it all really) state that that's the way it is, so it's down to
me to sort out the corrupted headers and footers on the Unix side,
before transmission to a client.

Any help, or a really big hammer in the post would be appreciated.

Rob.B
Reply rob.bradford (3) 4/26/2010 10:28:36 AM

If I understand you correctly, you need to turn for example

longlonglonglonglonglonglongline
abc
....many lines here...
xxx
anotherlonglonglonglonglonglonglongline

into


longlonglo
abc
....many lines here...
xxx
anotherlonglonglongl

If that's correct, try this:

awk 'p{print p} {p = NR>1?$0:substr($0,1,10)}END{print substr(p,1,20)}' file

Reply pk 4/26/2010 10:41:35 AM


Thanks for the line of script, that is what I wanted. I could do
either the first or last line, but couldn't string them together.

I'm now off to smash that HP3000!!!!!!!!!!!!!!

Rob.
Reply Rob 4/26/2010 12:16:40 PM

I'd make that first test be for "NR>1{...}" rather than "p{...}" so
the script doesn't discard lines that are either all-blanks or zero.

   Ed.
Reply Ed 4/26/2010 4:57:19 PM
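
For reference, the suggested change applied to pk's one-liner might look
like the sketch below. The sample data and the names tmp and result are
made up for illustration. The empty line and the line holding only 0 are
included because awk treats both as false, which is why a bare "p" test
would skip them. The NR guard in END is an extra precaution for empty
input that was not part of the suggestion.

```shell
# Hypothetical sample input: long first and last lines, plus an empty
# line and a line containing only 0.
tmp=$(mktemp)
printf '%s\n' 'longlonglonglonglonglonglongline' 'abc' '' '0' 'xxx' \
    'anotherlonglonglonglonglonglonglongline' > "$tmp"

# NR>1 (rather than p) decides whether a held line is waiting to be
# printed, so empty or zero-valued lines are no longer dropped.
result=$(awk '
  NR > 1 { print p }                              # emit the line held from the previous record
         { p = (NR > 1 ? $0 : substr($0, 1, 10)) } # hold this line; cut line 1 to 10 chars
  END    { if (NR) print substr(p, 1, 20) }        # the last held line is cut to 20 chars
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With this input the empty line and the 0 line both survive, and only the
first and last lines are shortened.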

Ed Morton wrote:

>> awk 'p{print p} {p = NR>1?$0:substr($0,1,10)}END{print substr(p,1,20)}' file
> 
> I'd make that first test be for "NR>1{...}" rather than "p{...}" so
> the script doesn't discard lines that are either all-blanks or zero.

Good catch, thanks.
Reply pk 4/26/2010 4:58:56 PM

I wonder if it might be more straightforward, for a beginner, and
assuming the file is not too big (these days, that means less than 10Gb),
to just do (and yes, this is OT, since I am giving a shell command, not
an AWK script):

gawk 'ARGIND == 1 {next} FNR == 1 {nr=NR-1;print substr($0,1,10);next}
    FNR == nr {print substr($0,1,20);next}1' file file

Assumes gawk, of course, but these days, if you're not using (at least)
GAWK, yer outta town!  (Yes, TAWK has this functionality, too, but the
syntax is different; such is life)

The above introduces some interesting concepts (including ARGIND, which
is a very nice feature to become familiar with), but eliminates
the need for saving the previous line in a variable - a theme which,
although I've used it several times, I've never been all that
comfortable with.  I think it is a lot better if you can avoid it - that
is, always deal with the current line as you are reading it.

Reply gazelle 4/26/2010 5:15:58 PM

Or we could go with the oft-used NR==FNR and make it non-gawk specific
since we're assuming non-empty files anyway:

awk '
  NR  == FNR { nr++; next}
  FNR == 1   { $0 = substr($0,1,10) }
  FNR == nr  { $0 = substr($0,1,20) }
  { print }
' file file

I made a couple of other tweaks, just style things that I think make
it a bit easier to read if we're catering to a beginner...

  Ed.
Reply Ed 4/26/2010 6:50:57 PM
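
The NR == FNR idiom relies on NR counting records across all input files
while FNR restarts at 1 for each file, so the two are equal only while the
first file argument is being read (which is also why it misfires when that
first file is empty). A small sketch that makes this visible; the
three-line sample file and the names tmp and trace are illustrative only:

```shell
# Hypothetical three-line sample file.
tmp=$(mktemp)
printf '%s\n' 'a' 'b' 'c' > "$tmp"

# Name the same file twice and label each record with the pass it is read in.
trace=$(awk '{ print NR, FNR, (NR == FNR ? "pass1" : "pass2") }' "$tmp" "$tmp")

printf '%s\n' "$trace"
rm -f "$tmp"
```

The first three records are labelled pass1 (NR and FNR both run 1 to 3);
on the second pass FNR starts again at 1 while NR carries on from 4.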

Y'know, this may be one of those rare occasions when a getline loop
would be appropriate to tell us how many records are in the file:

awk '
   BEGIN { while ( (getline dummy < ARGV[1]) > 0 ) nr++ }
   NR == 1   { $0 = substr($0,1,10) }
   NR == nr  { $0 = substr($0,1,20) }
   { print }
' file

rather than forcing the user to pass in the file name twice and
muddying the logic of the main body of the script.

     Ed.
Reply Ed 4/26/2010 7:46:05 PM

There's also the possibility to duplicate the ARGV element in the BEGIN
section as an alternative to providing the file name explicitly twice.
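
A minimal sketch of that idea, assuming exactly one file operand is given
on the command line (the sample data and the names tmp and result are
hypothetical):

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'longlonglonglonglonglonglongline' 'abc' 'xxx' \
    'anotherlonglonglonglonglonglonglongline' > "$tmp"

result=$(awk '
  BEGIN      { ARGV[ARGC] = ARGV[1]; ARGC++ }  # queue the same file for a second pass
  NR  == FNR { nr++; next }                    # first pass: only count the records
  FNR == 1   { $0 = substr($0, 1, 10) }        # second pass: shorten the first line
  FNR == nr  { $0 = substr($0, 1, 20) }        # second pass: shorten the last line
  { print }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

The caller names the file once; the script still reads it twice.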

I don't much like the last version with getline because at first glance
it's not apparent whether ARGV[1] will be implicitly closed after
reading EOF and re-opened a second time for the main awk loop.
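
One way to take that doubt out of the picture is to close the file
explicitly once the counting loop is done. The following is only a sketch
under that assumption, not something tested in this thread; close() is
standard awk, and the sample data and the names tmp and result are
hypothetical:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'longlonglonglonglonglonglongline' 'abc' 'xxx' \
    'anotherlonglonglonglonglonglonglongline' > "$tmp"

result=$(awk '
  BEGIN {
    # Count the records up front; "> 0" also stops the loop on a read error (-1).
    while ((getline dummy < ARGV[1]) > 0) nr++
    close(ARGV[1])   # explicitly close the stream opened by getline
  }
  NR == 1  { $0 = substr($0, 1, 10) }
  NR == nr { $0 = substr($0, 1, 20) }
  { print }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```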

Besides pk's fine one-pass solution, I'd even prefer a separate process
invocation to determine the size

awk -v nr="$( any_wc_or_awk_like_tool  file )" '
  NR == 1   { $0 = substr($0,1,10) }
  NR == nr  { $0 = substr($0,1,20) }
  { print }
' file

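where any_wc_or_awk_like_tool is either  awk 'END {print NR}'  or  wc -l
(with wc, feed the file on standard input, as in  wc -l < file , so that
only the count is printed and not the file name).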

Janis

Reply Janis 4/26/2010 10:11:55 PM
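
A concrete sketch of the separate-process variant using wc, with a few
assumptions spelled out: the file is read on standard input so wc prints
only the count, "nr + 0" forces a numeric comparison in case wc pads its
output with blanks, and the file ends with a newline (wc -l counts newline
characters, so an unterminated last line would be missed). The sample data
and the names tmp and result are hypothetical:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'longlonglonglonglonglonglongline' 'abc' 'xxx' \
    'anotherlonglonglonglonglonglonglongline' > "$tmp"

# Count the lines in a separate process, then pass the count in with -v.
result=$(awk -v nr="$(wc -l < "$tmp")" '
  NR == 1      { $0 = substr($0, 1, 10) }
  NR == nr + 0 { $0 = substr($0, 1, 20) }   # + 0: compare as a number
  { print }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```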
