Multiple printf's: Not printing properly on same line.


HI all,
I'm a bit of an awk newbie. I'm trying to use some conditional
statements to generate printf statements to print columns on a page. As
I read the printf info, it said it didn't do a linefeed till you
explicitly put one in '\n'.

Here's an example of what I'm trying to do on an input of email
addresses. I'm trying in this section to split off the username before
the @ sign. And if there are underscores (_) in the username, I find
out if they are first and middle initials and a last name...and I want
to print out in columns the email address, the first name or first
initial, the middle initial or last name, and the last name if there
are a first name/first initial and middle initial.

I'm trying to do:

BEGIN {FS="~";}
{

v_email = ""
v_name = ""
v_first_name = ""
v_first_initial = ""
v_middle_initial = ""
v_last_name = ""

  if ($10 !="" && /@/){
    v_email = $10}
  else if ($11!="" && /@/){
    v_email = $11}
  else if ($12 !="" && /@/){
    v_email = $12}


split(v_email,v_email_parts,"@")

v_name = v_email_parts[1]

#First, test for under score splitter

  if (match(v_name,/_/)){

split(v_name,v_name_parts,"_")

 printf("%s\t",v_email)

if (length(v_name_parts[1]) == 1){
  v_first_initial = v_name_parts[1]
  printf("%s\t",v_first_initial)
  }
else {
  v_first_name = v_name_parts[1]
  printf("%s\t",v_first_name)
  }

if (length(v_name_parts[2]) == 1){
  v_middle_initial = v_name_parts[2]
  printf("%s\t",v_middle_initial)
  }
else {
  v_last_name = v_name_parts[2]
 printf("%s\t",v_last_name)
 }

if (length(v_name_parts[3]) > 0){
  v_last_name = v_name_parts[3]
  printf("%s\t",v_last_name)
  }
printf("\n")

  }
}

The file picks off the email from one of three columns...and it does
this perfectly. So, let's say the input in the v_email section is like

f_d_flinstone@bedrock.com
mick_jagger@stones.com
john_d_doe@dead.zone

I'd expect the output to be

f_d_flinstone@bedrock.com     f    d    flintstone
mick_jagger@stones.com        mick jagger
john_d_doe@dead.zone          john  d   doe

But, this isn't the case...I get something like:
f_d_flind@bedrdck.cflintstone

This isn't a real example, since I don't want to publish real email
addresses here. But, it appears to be overwriting the first entry
(v_email) instead of tabbing over across the page till it hits the \n.

If I comment out each printf statement except for one, they all work
individually..just blows when run all together. Can someone give me a
hint as to what's going wrong...or links to good info on this? I can't
find any good examples on the newsgroups or books so far on this.

This is just part of a program I'm writing, I'll be parsing for all
kinds of things in the name, but, this is the first section I'm
tackling.

Thanks in advance!!

Chilecayenne
Reply chilecayenne (29) 9/1/2004 9:25:53 PM

On 1 Sep 2004 14:25:53 -0700 in comp.lang.awk, chilecayenne@yahoo.com
(cayenne) wrote:

>HI all,
>I'm a bit of an awk newbie. I'm trying to use some conditional
>statements to generate printf statements to print colums on a page. As
>I read the printf info, it said it didn't do a linefeed till you
>explicitly put one in '\n'.

Correct.

>Here's an example of what I'm trying to do on an input of email
>addresses. I'm trying in this section to split off the username before
>the @ sign. And if there are underscores (_) in the username, I find
>out if they are first and middle initials and a last name...and I want
>to print out in columns the email address, the first name or first
>initial, the middle initial or last name, and the last name if there
>are a first name/first initial and middle initial.

....

>The file picks off the email from one of three colums..and it does
>this perfectly. So, lets say the input in the v_email section is like
>
>f_d_flinstone@bedrock.com
>mick_jagger@stones.com
>john_d_doe@dead.zone
>
>I'd expect the out put to be
>
>f_d_flinstone@bedrock.com     f    d    flintstone
>mick_jagger@stones.com        mick jagger
>john_d_doe@dead.zone          john  d   doe

Exactly what gawk produces. 

>But, this isn't the case...I get something like:
>f_d_flind@bedrdck.cflintstone

Looks like your version of awk may have a problem. Need more info.
What command are you using to run awk? Which awk and version are you
using, under which shell and version, and OS and version? 

-- 
Thanks. Take care, Brian Inglis 	Calgary, Alberta, Canada

Brian.Inglis@CSi.com 	(Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
    fake address		use address above to reply
Reply Brian 9/1/2004 10:00:54 PM



cayenne wrote:
> HI all,
> I'm a bit of an awk newbie. I'm trying to use some conditional
> statements to generate printf statements to print colums on a page. As
> I read the printf info, it said it didn't do a linefeed till you
> explicitly put one in '\n'.
> 
> Here's an example of what I'm trying to do on an input of email
> addresses. I'm trying in this section to split off the username before
> the @ sign. And if there are underscores (_) in the username, I find
> out if they are first and middle initials and a last name...and I want
> to print out in columns the email address, the first name or first
> initial, the middle initial or last name, and the last name if there
> are a first name/first initial and middle initial.
> 
> I'm trying to do:
> 
> BEGIN {FS="~";}
> {
> 
> v_email = ""
> v_name = ""
> v_first_name = ""
> v_first_initial = ""
> v_middle_initial = ""
> v_last_name = ""
> 
>   if ($10 !="" && /@/){
>     v_email = $10}
>   else if ($11!="" && /@/){
>     v_email = $11}
>   else if ($12 !="" && /@/){
>     v_email = $12}
> 
> 
> split(v_email,v_email_parts,"@")
> 
> v_name = v_email_parts[1]
> 
> #First, test for under score splitter
> 
>   if (match(v_name,/_/)){
> 
> split(v_name,v_name_parts,"_")
> 
>  printf("%s\t",v_email)
> 
> if (length(v_name_parts[1]) == 1){
>   v_first_initial = v_name_parts[1]
>   printf("%s\t",v_first_initial)
>   }
> else {
>   v_first_name = v_name_parts[1]
>   printf("%s\t",v_first_name)
>   }
> 
> if (length(v_name_parts[2]) == 1){
>   v_middle_initial = v_name_parts[2]
>   printf("%s\t",v_middle_initial)
>   }
> else {
>   v_last_name = v_name_parts[2]
>  printf("%s\t",v_last_name)
>  }
> 
> if (length(v_name_parts[3]) > 0){
>   v_last_name = v_name_parts[3]
>   printf("%s\t",v_last_name)
>   }
> printf("\n")
> 
>   }
> }
> 
> The file picks off the email from one of three colums..and it does
> this perfectly. So, lets say the input in the v_email section is like
> 
> f_d_flinstone@bedrock.com
> mick_jagger@stones.com
> john_d_doe@dead.zone
> 
> I'd expect the out put to be
> 
> f_d_flinstone@bedrock.com     f    d    flintstone
> mick_jagger@stones.com        mick jagger
> john_d_doe@dead.zone          john  d   doe
> 
> But, this isn't the case...I get something like:
> f_d_flind@bedrdck.cflintstone

The above code shouldn't produce that given the input you show. Have you 
tried getting rid of some of the printfs to narrow it down to exactly 
which printf(s) cause the problem?

Two possibilities are that your actual input file either:

a) contains control characters which could cause the output to look 
jumbled, or
b) contains empty lines or others which don't contain a "@" in which 
case your initial tests for setting v_email would fail and you fall into 
the "split" with v_email set to "" and I don't know what would happen 
with the resultant invalid array accesses you do after that.

For "a", which I think is the most likely problem, you just need to 
check your input. For "b", you should really structure your code as:

BEGIN{ ... }
/@/ { ... }

rather than just:

BEGIN{ ... }
{ ... }

to make sure you're only processing lines with an "@" symbol (presumably 
email addresses).

An unrelated enhancement you might want to consider is to change this:

	  if (match(v_name,/_/)){

	split(v_name,v_name_parts,"_")

to this:
	
	num_parts = split(v_name,v_name_parts,"_")

	if (num_parts > 1){

i.e. just check the value returned from split to see if there is an "_" 
rather than having to call a separate "match" function first.

You also don't need to test for whether the last name is in the 2nd or 
3rd position because you can just do:

	v_last_name = v_name_parts[num_parts]

You should probably revisit the way you're assigning v_last_name anyway 
since your current method would, given an input address of 
"jim_bob_jones@whatever.com", set the first name to "jim" and the last 
name to "jones" but completely ignore the "bob" (actually it would save 
that as the last name then over-write it).
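
A minimal sketch of the suggestions above, not the original program: it assumes one email address per input line (the real file uses "~" as the field separator and keeps the address in field 10, 11 or 12), and the names split_names, email_parts, name_parts, first, middle and last are made up for illustration. With more than three parts it still reports only the second one as the middle.

```shell
# Hypothetical helper: reads one email address per line on stdin.
split_names() {
  awk '
    /@/ {                                      # skip empty lines and lines without "@"
      split($0, email_parts, "@")
      num_parts = split(email_parts[1], name_parts, "_")
      if (num_parts > 1) {                     # the username contains at least one "_"
        first  = name_parts[1]
        middle = (num_parts > 2) ? name_parts[2] : ""
        last   = name_parts[num_parts]         # last piece, whichever position it is in
        printf("%s\t%s\t%s\t%s\n", $0, first, middle, last)
      }
    }
  '
}

printf '%s\n' 'jim_bob_jones@whatever.com' '' 'mick_jagger@stones.com' | split_names
```

Printing all three name columns every time (with an empty middle column when there is none) keeps the last name in the same column for every row.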

Hope that helps,

	Ed.

Reply Ed 9/2/2004 1:54:55 PM

Brian Inglis <Brian.Inglis@SystematicSW.Invalid> wrote in message news:<behcj0tujq3v7f6gqr062m585833228prv@4ax.com>...
> On 1 Sep 2004 14:25:53 -0700 in comp.lang.awk, chilecayenne@yahoo.com
> (cayenne) wrote:
> 
> >HI all,
> >I'm a bit of an awk newbie. I'm trying to use some conditional
> >statements to generate printf statements to print colums on a page. As
> >I read the printf info, it said it didn't do a linefeed till you
> >explicitly put one in '\n'.
> 
> Correct.
> 
> >Here's an example of what I'm trying to do on an input of email
> >addresses. I'm trying in this section to split off the username before
> >the @ sign. And if there are underscores (_) in the username, I find
> >out if they are first and middle initials and a last name...and I want
> >to print out in columns the email address, the first name or first
> >initial, the middle initial or last name, and the last name if there
> >are a first name/first initial and middle initial.
> 
> ...
> 
> >The file picks off the email from one of three colums..and it does
> >this perfectly. So, lets say the input in the v_email section is like
> >
> >f_d_flinstone@bedrock.com
> >mick_jagger@stones.com
> >john_d_doe@dead.zone
> >
> >I'd expect the out put to be
> >
> >f_d_flinstone@bedrock.com     f    d    flintstone
> >mick_jagger@stones.com        mick jagger
> >john_d_doe@dead.zone          john  d   doe
> 
> Exactly what gawk produces. 
> 
> >But, this isn't the case...I get something like:
> >f_d_flind@bedrdck.cflintstone
> 
> Looks like your version of awk may have a problem. Need more info.
> What command are you using to run awk? Which awk and version are you
> using, under which shell and version, and OS and version?


Hi Brian, thank you very much for your reply!! I'm running Gentoo
Linux, with the gentoo sources kernel: linux-2.4.20-gentoo-r5.

awk --version gives me:
GNU Awk 3.1.3
Copyright (C) 1989, 1991-2003 Free Software Foundation.

I'm a little new to the differences with awk, gawk, and nawk...but,
just to check things a little further I did a look in /bin to find
that awk is a link to gawk on my system:

/bin/awk -> gawk-3.1.3

I'm using the following to run my script:
cat white_pages.csv | awk -f phone1.awk | more

white_pages.csv is my file I'm picking off the email addresses of, and
phone1.awk is my script file. I'm just using more to scroll down the
results to look at them onscreen for now.

Thanks for any insight and suggestions you can help me with! I really
like working with awk so far...but it is easy to stumble as you progress
to slightly more complex things.

CC
Reply chilecayenne 9/2/2004 2:28:26 PM


cayenne wrote:
<snip>
> I'm using the following to run my script:
> cat white_pages.csv | awk -f phone1.awk | more

This is commonly called "UUOC" (Useless Use Of Cat) since awk can take a 
file name argument. Do this instead:

awk -f phone1.awk white_pages.csv | more

Regards,

	Ed.

Reply Ed 9/2/2004 3:06:26 PM

In article <DcydnafNxcKtu6rcRVn-jA@comcast.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>
>
>cayenne wrote:
>> HI all,
>> I'm a bit of an awk newbie. I'm trying to use some conditional
>> statements to generate printf statements to print colums on a page. As
>> I read the printf info, it said it didn't do a linefeed till you
>> explicitly put one in '\n'.
>> 
>> Here's an example of what I'm trying to do on an input of email
>> addresses. I'm trying in this section to split off the username before
>> the @ sign. And if there are underscores (_) in the username, I find
>> out if they are first and middle initials and a last name...and I want
>> to print out in columns the email address, the first name or first
>> initial, the middle initial or last name, and the last name if there
>> are a first name/first initial and middle initial.
>> 
>> I'm trying to do:
>> 
>> BEGIN {FS="~";}
>> {

Lots of program snipped.

>>  printf("%s\t",v_email)

At this point, what if "v_email" is terminated with \r\n instead of
the Unix-style \n?

It gets printed, and the \r moves the invisible writing cursor back
to the beginning of the line.
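
If that is what is happening, one possible fix is to strip the carriage return inside the awk script before printing anything. This is only a sketch: it assumes the \r sits at the very end of each record, and the helper name strip_cr and the simulated input line are invented for the example.

```shell
# Hypothetical helper: drop a trailing carriage return from each record,
# then print the address and the part before the "@" as two columns.
strip_cr() {
  awk '{ sub(/\r$/, ""); split($0, parts, "@"); printf("%s\t%s\n", $0, parts[1]) }'
}

# Simulated DOS-style input line ending in \r\n.
printf 'mick_jagger@stones.com\r\n' | strip_cr
```

Running dos2unix on the file first would achieve the same thing without touching the script.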

>> The file picks off the email from one of three colums..and it does
>> this perfectly. So, lets say the input in the v_email section is like
>> 
>> f_d_flinstone@bedrock.com
>> mick_jagger@stones.com
>> john_d_doe@dead.zone
>> 
>> I'd expect the out put to be
>> 
>> f_d_flinstone@bedrock.com     f    d    flintstone
>> mick_jagger@stones.com        mick jagger
>> john_d_doe@dead.zone          john  d   doe
>> 
>> But, this isn't the case...I get something like:
>> f_d_flind@bedrdck.cflintstone

To be expected if there is a "carriage return" \r character
just after the ".com" in the first part of the output.

    carl
-- 
    carl lowenstein         marine physical lab     u.c. san diego
                                                 clowenst@ucsd.edu
Reply cdl 9/2/2004 6:11:53 PM

Ed Morton <morton@lsupcaemnt.com> wrote in message news:<DcydnafNxcKtu6rcRVn-jA@comcast.com>...
<snip>
> > But, this isn't the case...I get something like:
> > f_d_flind@bedrdck.cflintstone
> 
> The above code shouldn't produce that given the input you show Have you 
> tried getting rid of some of the printfs to narrow it down to exactly 
> which printf(s) cause the problem?
> 
> Two possibilites are that your actual input file either:
> 
> a) contains control characters which could cause the output to look 
> jumbled, or
> b) contains empty lines or others which don't contain a "@" in which 
> case your initial tests for setting v_email would fail and you fall into 
> the "split" with v_email set to "" and I don't know what would happen 
> with the resultant invalid array accesses you do after that.
> 
> For "a", which I think is the most likely problem, you just need to 
> check your input. For "b", you should really structure your code as:
> 
> BEGIN{ ... }
> /@/ { ... }
> 
> rather than just:
> 
> BEGIN{ ... }
> { ... }
> 
> to make sure you're only processing lines with an "@" symbol (presumably 
> email addresses).
> 
> An unrelated enhancement you might want to consider is to change this:
> 
> 	  if (match(v_name,/_/)){
> 
> 	split(v_name,v_name_parts,"_")
> 
> to this:
> 	
> 	num_parts = split(v_name,v_name_parts,"_")
> 
> 	if (num_parts > 1){
> 
> i.e. just check the value returned from split to see if there as an "_" 
> rather than having to call a separate "match" function first.
> 
> You also don't need to test for whether the last name is in the 2nd or 
> 3rd piosition becayuse you can just do:
> 
> 	v_last_name = v_name_parts[num_parts]
> 
> You should probably revisit the way you're assigning v_last_name anyway 
> since your current method would, given an input address of 
> "jim_bob_jones@whatever.com", set the first name to "jim" and the last 
> name to "jones" but completely ignore the "bob" (actually it would save 
> that as the last name then over-write it).
> 
> Hope that helps,
> 
> 	Ed.

Thanks for the reply Ed.
I've gone through and commented out all but one of the printfs...each
one by itself works just fine.

Yeah, I know I need to clean up the code, and had thought about the
middle name vs. middle initial...this is just a first run through as I
started to refine it...and got stuck with the printing problem at this
early a stage.

This file is a csv from MS Excel. I'll try to check for special
characters...maybe run a dos2unix on it...But, like I said, if I just
do one printf, it works...each one individually works...but, if I
start to use 2 or more of them to spit things out in columns, it mixes
them all into one line.

I'll check on the special characters tho...
Any other suggestions greatly appreciated!!
:-)

CC
Reply chilecayenne 9/2/2004 7:26:09 PM

In article <2deb3d1.0409021126.2c6446fc@posting.google.com>,
 chilecayenne@yahoo.com (cayenne) wrote:

> Ed Morton <morton@lsupcaemnt.com> wrote in message 
> news:<DcydnafNxcKtu6rcRVn-jA@comcast.com>...
> <snip>
> > > But, this isn't the case...I get something like:
> > > f_d_flind@bedrdck.cflintstone
> > 
> > The above code shouldn't produce that given the input you show Have you 
> > tried getting rid of some of the printfs to narrow it down to exactly 
> > which printf(s) cause the problem?
> > 
> > Two possibilites are that your actual input file either:
> > 
> > a) contains control characters which could cause the output to look 
> > jumbled, or
> > b) contains empty lines or others which don't contain a "@" in which 
> > case your initial tests for setting v_email would fail and you fall into 
> > the "split" with v_email set to "" and I don't know what would happen 
> > with the resultant invalid array accesses you do after that.
> > 
> > For "a", which I think is the most likely problem, you just need to 
> > check your input. For "b", you should really structure your code as:
> > 
> > BEGIN{ ... }
> > /@/ { ... }
> > 
> > rather than just:
> > 
> > BEGIN{ ... }
> > { ... }
> > 
> > to make sure you're only processing lines with an "@" symbol (presumably 
> > email addresses).
> > 
> > An unrelated enhancement you might want to consider is to change this:
> > 
> > 	  if (match(v_name,/_/)){
> > 
> > 	split(v_name,v_name_parts,"_")
> > 
> > to this:
> > 	
> > 	num_parts = split(v_name,v_name_parts,"_")
> > 
> > 	if (num_parts > 1){
> > 
> > i.e. just check the value returned from split to see if there as an "_" 
> > rather than having to call a separate "match" function first.
> > 
> > You also don't need to test for whether the last name is in the 2nd or 
> > 3rd piosition becayuse you can just do:
> > 
> > 	v_last_name = v_name_parts[num_parts]
> > 
> > You should probably revisit the way you're assigning v_last_name anyway 
> > since your current method would, given an input address of 
> > "jim_bob_jones@whatever.com", set the first name to "jim" and the last 
> > name to "jones" but completely ignore the "bob" (actually it would save 
> > that as the last name then over-write it).
> > 
> > Hope that helps,
> > 
> > 	Ed.
> 
> Thanks for the reply Ed.
> I've gone through and commented out all but one the printf's...each
> one by themselves works just fine.
> 
> Yeah, I know I need to clean up the code, and had thought about the
> middle name vs. middle intital...this is just a first run through as I
> started to refine it...and got stuck with the printing problem at this
> early of an stage.
> 
> This file is a csv from MS excell. I'll try to check for special
> characters...maybe run a dos2unix on it...But, like I said, if I just
> do one printf, it works...each one individually works...but, if I
> start to use 2 or more of them to spit things out in columns, it mixes
> them all into one line.
> 
> I'll check on the special characters tho...
> Any other suggestions greatly appreciated!!
> :-)
> 
> CC

Change the command line to

  awk -f phone1.awk white_pages.csv | cat -vte | more

The 'cat -vte' will tell you if there are any invisible characters and 
especially if there are <CR><LF> pairs by displaying ^M for the <CR> 
values and $ for the <LF>
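
As a small illustration of what to look for, here is a made-up line (not from the real file) with a carriage return before the tab, run through the same filter:

```shell
# Simulated awk output: address, stray carriage return, tab, name, newline.
printf 'a@b.com\r\tjoe\n' | cat -vte
# prints: a@b.com^M^Ijoe$
```

The ^I is the tab (shown because of -t); a ^M anywhere in the line is the carriage return that sends the cursor back to column one.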

                                        Bob Harris
Reply Bob 9/2/2004 8:40:34 PM

Bob Harris <harris@zk3.dec.com> wrote in message news:<harris-DF6E8A.16402002092004@cacnews.cac.cpqcorp.net>...

> 
> Change the command line to
> 
>   awk -f phone1.awk white_pages.csv | cat -vte | more
> 
> The 'cat -vte' will tell you if there are any invisible characters and 
> especially if there are <CR><LF> pairs by displaying ^M for the <CR> 
> values and $ for the <LF>
> 
>                                         Bob Harris

Bob and all the other great responders to my thread.
THANK YOU!!

It was indeed the ^M$ that was the problem. Have I mentioned lately
how much I HATE MS Windoze? Arrrgh. It is hard enough learning
something new without having to deal with the hidden crap MS puts into
what should be a plain text file...

Anyway, again thanks for the help on this, the syntax corrections, and
the new trick with cat -vte. I'd never done a man cat before...found
it does more than I'd thought.

Thanx,

CC

ps. Just curious, I've seen mentioned in responses here and other
forums where people get irritated about using cat 'too much'. Just
curious as to why?
Reply chilecayenne 9/3/2004 2:42:19 PM

In article <2deb3d1.0409030642.1a10d901@posting.google.com>,
cayenne <chilecayenne@yahoo.com> wrote:
....
>ps. Just curious, I've seen mentioned in responses here and other
>forums where people get irritated about using cat 'too much'. Just
>curious as to why?

For some reason, newbies often write:

	cat somefile | someutil ...

and this is unnecessary, and, in theory at least, wasteful.  I won't go
into the various details as to why it is unnecessary and wrong (STFW), but
I will take the opportunity to say that, many, many moons ago, I saw the
following in an MSDOS manual:

	type file | more

and so, as with most things that are wrong in computing, it is all Bill
Gates's fault.  Note that the above is particularly bad in DOS, which
doesn't have any sort of multitasking and thus has only fake pipes.

Reply gazelle 9/3/2004 2:52:19 PM


cayenne wrote:
<snip>
> ps. Just curious, I've seen mentioned in responses here and other
> forums where people get irritated about using cat 'too much'. Just
> curious as to why?

It's not so much about getting irritated, it's more helping people learn 
when they don't need to use it. The common mistake newcomers make is to use:

	cat file | some_command

when "some_command" can take a file argument and so the above could be 
written as:

	some_command file

or it could just read redirected input as:

	some_command < file

and save an external command (cat) and a pipe.
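
A quick way to convince yourself that the three forms are interchangeable, using awk as "some_command" and a throwaway file created just for the demonstration (the file contents and the line-counting one-liner are only examples):

```shell
# Hypothetical sample file, created only for this demonstration.
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

cat "$tmpfile" | awk 'END { print NR }'   # extra cat process plus a pipe
awk 'END { print NR }' "$tmpfile"         # awk opens the file itself
awk 'END { print NR }' < "$tmpfile"       # the shell redirects standard input

rm -f "$tmpfile"
```

All three print the same line count; only the first starts a second process to get there.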

Let's say you have a kid who puts on their shoes, then takes them off 
and puts on their socks then puts their shoes back on. Wouldn't you tell 
them that they don't actually need to put on their shoes the first time? 
After seeing several kids do this, wouldn't you get a tad irritated 
and wonder who the heck is out there telling kids that that's the right 
way to do things? It's kinda like that....

	Ed.

Reply Ed 9/3/2004 2:56:10 PM

Kenny McCormack wrote:

>>ps. Just curious, I've seen mentioned in responses here and other
>>forums where people get irritated about using cat 'too much'. Just
>>curious as to why?
> 
> 
> For some reason, newbies often write:
> 
> 	cat somefile | someutil ...


I often do (although I'm not a newbie), since I think in terms of 
pipelines and this way it goes nicely from left to right.

> and this is unnecessary, and, in theory at least, wasteful.  I won't go
> into the various details as to why it is unnecessary and wrong (STFW), but

On occasion, it can be better. Copying a large file from one disk to 
another is often better achieved with:

cat file1 | cat > file2

(or better still dd), since you're reading and writing in parallel like 
this.

-Ed





-- 
(You can't go wrong with psycho-rats.)       (er258)(@)(eng.cam)(.ac.uk)

/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage

Reply E 9/3/2004 5:44:48 PM


E. Rosten wrote:
> Kenny McCormack wrote:
> 
>>> ps. Just curious, I've seen mentioned in responses here and other
>>> forums where people get irritated about using cat 'too much'. Just
>>> curious as to why?
>>
>>
>>
>> For some reason, newbies often write:
>>
>>     cat somefile | someutil ...
> 
> 
> 
> I often do (although I'm not a newbie), since I think in terms of 
> pipelines and this way it goes nicely from left to right.

Then presumably writing it as:

	cat somefile | cat | someutil

is even better since it extends even further from left to right ;-). 
It's fine to think in terms of pipelines, but I can't imagine why you'd 
want to introduce commands gratuitously at the head or tail of a pipeline.

>> and this is unnecessary, and, in theory at least, wasteful.  I won't go
>> into the various details as to why it is unnecessary and wrong (STFW), 
>> but
> 
> 
> On occasion, it can be better. Copying a large file from one disk to 
> another is often better achieved with:
> 
> cat file1 | cat > file2
> 
> (or better still dd), since you're reading and writing in parallel like 
> this.

I've never come across that situation. When you say it's better - do you 
mean faster, or more reliable, or something else? Can you go into any 
more detail on why it's better? The benefits aren't intuitively obvious.

	Ed.
Reply Ed 9/3/2004 5:52:59 PM

In article <q5idnS1ImonjMqXcRVn-rQ@comcast.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>
>
>E. Rosten wrote:
>> Kenny McCormack wrote:
>> 
>>>> ps. Just curious, I've seen mentioned in responses here and other
>>>> forums where people get irritated about using cat 'too much'. Just
>>>> curious as to why?
>>>
>>>
>>>
>>> For some reason, newbies often write:
>>>
>>>     cat somefile | someutil ...
>> 
>> 
>> 
>> I often do (although I'm not a newbie), since I think in terms of 
>> pipelines and this way it goes nicely from left to right.
>
>Then presumably writing it as:
>
>	cat somefile | cat | someutil
>
>is even better since it extends even further from left to right ;-). 
>It's fine to think in terms of pipelines, but I can't imagine why you'd 
>want to introduce commands gratuitously at the head or tail of a pipeine.

Indeed.  The standard Randal Schwartz answer to "but I like to see my data
go from left to right" is:

	< file someutil

which works in any Bourne-ish shell.

>>> and this is unnecessary, and, in theory at least, wasteful.  I won't go
>>> into the various details as to why it is unnecessary and wrong (STFW), 
>>> but
>> 
>> 
>> On occasion, it can be better. Copying a large file from one disk to 
>> another is often better achieved with:
>> 
>> cat file1 | cat > file2
>> 
>> (or better still dd), since you're reading and writing in parallel like 
>> this.
>
>I've never come across that situation. When you say it's better - do you 
>mean faster, or more reliable, or something else? Can you go into any 
>more detail on why it's better as the benefits aren't intuitively obvious.

I don't think the claim holds water in any general sense.  It *might* be true on
some particular piece of hardware under some particular set of conditions.

Reply gazelle 9/3/2004 6:56:48 PM

On Fri, 03 Sep 2004 12:52:59 -0500 in comp.lang.awk, Ed Morton
<morton@lsupcaemnt.com> wrote:

>
>
>E. Rosten wrote:
>> Kenny McCormack wrote:
>> 
>>>> ps. Just curious, I've seen mentioned in responses here and other
>>>> forums where people get irritated about using cat 'too much'. Just
>>>> curious as to why?

>> On occasion, it can be better. Copying a large file from one disk to 
>> another is often better achieved with:
>> 
>> cat file1 | cat > file2
>> 
>> (or better still dd), since you're reading and writing in parallel like 
>> this.
>
>I've never come across that situation. When you say it's better - do you 
>mean faster, or more reliable, or something else? Can you go into any 
>more detail on why it's better as the benefits aren't intuitively obvious.

I can't see that ever being better than cp [-p] file1 file2, as
mmap() is often used to have the OS do most of the work, and cp
normally handles sparse files, whereas cat can't. 

To make local and remote file and directory copies retaining dates and
permissions, tar idioms are often used:

	cd fdir ; tar cf - file | ( cd tdir; tar xf - )
	cd fdir ; tar cf - . | ( cd tdir; tar xf - )
	rsh host '( cd fdir ; tar cf - . )' | ( cd tdir; tar xf - )
	cd fdir ; tar cf - . | rsh host '( cd tdir; tar xf - )'

I wonder if newbies pick this up, can't remember the tar details and
substitute cat instead? 
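
For anyone who wants to try the local directory form of the tar idiom safely, here is a sketch using two scratch directories created with mktemp (the directory variables and the sample file are invented for the demonstration). It uses && instead of ; so that tar never runs in the wrong place if a cd fails, and adds p on extraction to ask tar to keep permissions.

```shell
# Hypothetical source and target directories, created only for this demonstration.
fdir=$(mktemp -d)
tdir=$(mktemp -d)
printf 'hello\n' > "$fdir/file"

# Pack the source directory to stdout and unpack it in the target directory.
( cd "$fdir" && tar cf - . ) | ( cd "$tdir" && tar xpf - )

cmp "$fdir/file" "$tdir/file" && echo "contents match"
rm -rf "$fdir" "$tdir"
```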

-- 
Thanks. Take care, Brian Inglis 	Calgary, Alberta, Canada

Brian.Inglis@CSi.com 	(Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
    fake address		use address above to reply
Reply Brian 9/3/2004 10:07:13 PM

>> I often do (although I'm not a newbie), since I think in terms of 
>> pipelines and this way it goes nicely from left to right.
> 
> 
> Then presumably writing it as:
> 
>     cat somefile | cat | someutil
> 
> is even better since it extends even further from left to right ;-). 
> It's fine to think in terms of pipelines, but I can't imagine why you'd 
> want to introduce commands gratuitously at the head or tail of a pipeine.


I just find doing it that way more logical. If I go to the effort of 
writing a shell script (ie doing the task more than once), I'll do it 
properly, but when I'm doing a one-liner, my brain taps out cat file | 
before I've had a chance to stop it :-)

It is slightly wasteful, but not enough so to make breaking the habit 
successful.

>> On occasion, it can be better. Copying a large file from one disk to 
>> another is often better achieved with:
>>
>> cat file1 | cat > file2
>>
>> (or better still dd), since you're reading and writing in parallel 
>> like this.
> 
> 
> I've never come across that situation. When you say it's better - do you 
> mean faster, or more reliable, or something else? 

Faster.

> Can you go into any 
> more detail on why it's better as the benefits aren't intuitively obvious.

cp seems to work by copying a bunch of data from one disk then writing 
it to the other disk. At least that's what my observations of the noises 
made by the disks lead me to believe.


With cat file1 | cat > file2:

The second cat empties the data from the pipe (fast) and writes it to 
disk. As it is writing, the first cat sees that the pipeline is empty, 
so it can go and read some data from the disk in order to fill that 
pipeline. In this way, the write to the disk by the second cat happens 
simultaneously with the read made by the first cat.

This only speeds up things if the files are on different physical disks, 
or even better, on different physical channels.

But it really does work, try creating a very large file and copying it.

-Ed


-- 
(You can't go wrong with psycho-rats.)       (er258)(@)(eng.cam)(.ac.uk)

/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage

Reply E 9/6/2004 12:37:50 PM
