Hi all,
I'm a bit of an awk newbie. I'm trying to use some conditional
statements to generate printf statements to print columns on a page. As
I read the printf info, it said it doesn't do a linefeed until you
explicitly put one in with '\n'.
Here's an example of what I'm trying to do on an input of email
addresses. I'm trying in this section to split off the username before
the @ sign. And if there are underscores (_) in the username, I find
out if they are first and middle initials and a last name...and I want
to print out in columns the email address, the first name or first
initial, the middle initial or last name, and the last name if there
are a first name/first initial and middle initial.
I'm trying to do:
BEGIN {FS="~";}
{
    v_email = ""
    v_name = ""
    v_first_name = ""
    v_first_initial = ""
    v_middle_initial = ""
    v_last_name = ""

    if ($10 != "" && /@/) {
        v_email = $10
    }
    else if ($11 != "" && /@/) {
        v_email = $11
    }
    else if ($12 != "" && /@/) {
        v_email = $12
    }

    split(v_email, v_email_parts, "@")
    v_name = v_email_parts[1]

    # First, test for underscore splitter
    if (match(v_name, /_/)) {
        split(v_name, v_name_parts, "_")
        printf("%s\t", v_email)

        if (length(v_name_parts[1]) == 1) {
            v_first_initial = v_name_parts[1]
            printf("%s\t", v_first_initial)
        }
        else {
            v_first_name = v_name_parts[1]
            printf("%s\t", v_first_name)
        }

        if (length(v_name_parts[2]) == 1) {
            v_middle_initial = v_name_parts[2]
            printf("%s\t", v_middle_initial)
        }
        else {
            v_last_name = v_name_parts[2]
            printf("%s\t", v_last_name)
        }

        if (length(v_name_parts[3]) > 0) {
            v_last_name = v_name_parts[3]
            printf("%s\t", v_last_name)
        }
        printf("\n")
    }
}
The file picks off the email from one of three columns, and it does
this perfectly. So, let's say the input in the v_email section is like
f_d_flinstone@bedrock.com
mick_jagger@stones.com
john_d_doe@dead.zone
I'd expect the output to be
f_d_flinstone@bedrock.com f d flinstone
mick_jagger@stones.com mick jagger
john_d_doe@dead.zone john d doe
But, this isn't the case...I get something like:
f_d_flind@bedrdck.cflintstone
This isn't a real example, since I don't want to publish real email
addresses here. But, it appears to be overwriting the first entry
(v_email) instead of tabbing over across the page till it hits the \n.
If I comment out each printf statement except for one, they all work
individually; it just blows up when they are all run together. Can someone
give me a hint as to what's going wrong, or links to good info on this? I
can't find any good examples in the newsgroups or in books so far.
This is just part of a program I'm writing, I'll be parsing for all
kinds of things in the name, but, this is the first section I'm
tackling.
Thanks in advance!!
Chilecayenne
chilecayenne | 9/1/2004 9:25:53 PM
On 1 Sep 2004 14:25:53 -0700 in comp.lang.awk, chilecayenne@yahoo.com
(cayenne) wrote:
>HI all,
>I'm a bit of an awk newbie. I'm trying to use some conditional
>statements to generate printf statements to print colums on a page. As
>I read the printf info, it said it didn't do a linefeed till you
>explicitly put one in '\n'.
Correct.
>Here's an example of what I'm trying to do on an input of email
>addresses. I'm trying in this section to split off the username before
>the @ sign. And if there are underscores (_) in the username, I find
>out if they are first and middle initials and a last name...and I want
>to print out in columns the email address, the first name or first
>initial, the middle initial or last name, and the last name if there
>are a first name/first initial and middle initial.
....
>The file picks off the email from one of three colums..and it does
>this perfectly. So, lets say the input in the v_email section is like
>
>f_d_flinstone@bedrock.com
>mick_jagger@stones.com
>john_d_doe@dead.zone
>
>I'd expect the out put to be
>
>f_d_flinstone@bedrock.com f d flintstone
>mick_jagger@stones.com mick jagger
>john_d_doe@dead.zone john d doe
Exactly what gawk produces.
>But, this isn't the case...I get something like:
>f_d_flind@bedrdck.cflintstone
Looks like your version of awk may have a problem. Need more info.
What command are you using to run awk? Which awk and version are you
using, under which shell and version, and OS and version?
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
Brian | 9/1/2004 10:00:54 PM
cayenne wrote:
> HI all,
> I'm a bit of an awk newbie. I'm trying to use some conditional
> statements to generate printf statements to print colums on a page. As
> I read the printf info, it said it didn't do a linefeed till you
> explicitly put one in '\n'.
>
> Here's an example of what I'm trying to do on an input of email
> addresses. I'm trying in this section to split off the username before
> the @ sign. And if there are underscores (_) in the username, I find
> out if they are first and middle initials and a last name...and I want
> to print out in columns the email address, the first name or first
> initial, the middle initial or last name, and the last name if there
> are a first name/first initial and middle initial.
>
> I'm trying to do:
>
> BEGIN {FS="~";}
> {
>
> v_email = ""
> v_name = ""
> v_first_name = ""
> v_first_initial = ""
> v_middle_initial = ""
> v_last_name = ""
>
> if ($10 !="" && /@/){
> v_email = $10}
> else if ($11!="" && /@/){
> v_email = $11}
> else if ($12 !="" && /@/){
> v_email = $12}
>
>
> split(v_email,v_email_parts,"@")
>
> v_name = v_email_parts[1]
>
> #First, test for under score splitter
>
> if (match(v_name,/_/)){
>
> split(v_name,v_name_parts,"_")
>
> printf("%s\t",v_email)
>
> if (length(v_name_parts[1]) == 1){
> v_first_initial = v_name_parts[1]
> printf("%s\t",v_first_initial)
> }
> else {
> v_first_name = v_name_parts[1]
> printf("%s\t",v_first_name)
> }
>
> if (length(v_name_parts[2]) == 1){
> v_middle_initial = v_name_parts[2]
> printf("%s\t",v_middle_initial)
> }
> else {
> v_last_name = v_name_parts[2]
> printf("%s\t",v_last_name)
> }
>
> if (length(v_name_parts[3]) > 0){
> v_last_name = v_name_parts[3]
> printf("%s\t",v_last_name)
> }
> printf("\n")
>
> }
> }
>
> The file picks off the email from one of three colums..and it does
> this perfectly. So, lets say the input in the v_email section is like
>
> f_d_flinstone@bedrock.com
> mick_jagger@stones.com
> john_d_doe@dead.zone
>
> I'd expect the out put to be
>
> f_d_flinstone@bedrock.com f d flintstone
> mick_jagger@stones.com mick jagger
> john_d_doe@dead.zone john d doe
>
> But, this isn't the case...I get something like:
> f_d_flind@bedrdck.cflintstone
The above code shouldn't produce that given the input you show. Have you
tried getting rid of some of the printfs to narrow it down to exactly
which printf(s) cause the problem?
Two possibilities are that your actual input file either:
a) contains control characters which could cause the output to look
jumbled, or
b) contains empty lines or others which don't contain a "@" in which
case your initial tests for setting v_email would fail and you fall into
the "split" with v_email set to "" and I don't know what would happen
with the resultant invalid array accesses you do after that.
For "a", which I think is the most likely problem, you just need to
check your input. For "b", you should really structure your code as:
BEGIN{ ... }
/@/ { ... }
rather than just:
BEGIN{ ... }
{ ... }
to make sure you're only processing lines with an "@" symbol (presumably
email addresses).
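For case "b", a quick check suggests nothing dramatic happens: in POSIX awk, split() on an empty string returns 0, and reading an array element that was never set yields an empty string. A minimal sketch follows; the two input lines are invented for illustration, and the whole line stands in for the address field.

```shell
# Hypothetical two-line input: one line without "@", one with.
input='no address here
fred@bedrock.com'

# Without a pattern, the first line reaches split() with v_email == "".
all=$(printf '%s\n' "$input" | awk '{
    v_email = ""
    if (/@/) v_email = $0
    n = split(v_email, parts, "@")      # n is 0 when v_email is empty
    printf("%d [%s]\n", n, parts[1])    # an unset element reads as ""
}')

# With the /@/ pattern, lines without an address are skipped entirely.
only=$(printf '%s\n' "$input" | awk '/@/ {
    n = split($0, parts, "@")
    printf("%d [%s]\n", n, parts[1])
}')

printf '%s\n---\n%s\n' "$all" "$only"
```

So the pattern is mainly about not emitting junk rows for lines that carry no address, rather than about avoiding a crash.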
An unrelated enhancement you might want to consider is to change this:
if (match(v_name,/_/)){
split(v_name,v_name_parts,"_")
to this:
num_parts = split(v_name,v_name_parts,"_")
if (num_parts > 1){
i.e. just check the value returned from split to see if there is an "_"
rather than having to call a separate "match" function first.
You also don't need to test for whether the last name is in the 2nd or
3rd position because you can just do:
v_last_name = v_name_parts[num_parts]
You should probably revisit the way you're assigning v_last_name anyway
since your current method would, given an input address of
"jim_bob_jones@whatever.com", set the first name to "jim" and the last
name to "jones" but completely ignore the "bob" (actually it would save
that as the last name then over-write it).
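Put together, these suggestions might look like the sketch below. It assumes the address has already been isolated in $1 (the real script picks it from $10 to $12 of a "~"-separated file), and it simply prints whatever lies between the first and last parts as middle columns, which is one possible answer to the "jim_bob_jones" case rather than the only one.

```shell
out=$(printf '%s\n' 'f_d_flinstone@bedrock.com' 'mick_jagger@stones.com' \
                    'jim_bob_jones@whatever.com' 'nobody' |
awk '/@/ {
    split($1, v_email_parts, "@")
    num_parts = split(v_email_parts[1], v_name_parts, "_")
    if (num_parts > 1) {
        line = $1 "\t" v_name_parts[1]
        # Everything between the first and last part is treated as "middle".
        for (i = 2; i < num_parts; i++)
            line = line "\t" v_name_parts[i]
        # The last name is always the final part, whatever its position.
        print line "\t" v_name_parts[num_parts]
    }
}')
printf '%s\n' "$out"
```

Building the whole row in one string and printing it once also avoids a run of separate printf calls.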
Hope that helps,
Ed.
Ed | 9/2/2004 1:54:55 PM
Brian Inglis <Brian.Inglis@SystematicSW.Invalid> wrote in message news:<behcj0tujq3v7f6gqr062m585833228prv@4ax.com>...
> On 1 Sep 2004 14:25:53 -0700 in comp.lang.awk, chilecayenne@yahoo.com
> (cayenne) wrote:
>
> >HI all,
> >I'm a bit of an awk newbie. I'm trying to use some conditional
> >statements to generate printf statements to print colums on a page. As
> >I read the printf info, it said it didn't do a linefeed till you
> >explicitly put one in '\n'.
>
> Correct.
>
> >Here's an example of what I'm trying to do on an input of email
> >addresses. I'm trying in this section to split off the username before
> >the @ sign. And if there are underscores (_) in the username, I find
> >out if they are first and middle initials and a last name...and I want
> >to print out in columns the email address, the first name or first
> >initial, the middle initial or last name, and the last name if there
> >are a first name/first initial and middle initial.
>
> ...
>
> >The file picks off the email from one of three colums..and it does
> >this perfectly. So, lets say the input in the v_email section is like
> >
> >f_d_flinstone@bedrock.com
> >mick_jagger@stones.com
> >john_d_doe@dead.zone
> >
> >I'd expect the out put to be
> >
> >f_d_flinstone@bedrock.com f d flintstone
> >mick_jagger@stones.com mick jagger
> >john_d_doe@dead.zone john d doe
>
> Exactly what gawk produces.
>
> >But, this isn't the case...I get something like:
> >f_d_flind@bedrdck.cflintstone
>
> Looks like your version of awk may have a problem. Need more info.
> What command are you using to run awk? Which awk and version are you
> using, under which shell and version, and OS and version?
Hi Brian, thank you very much for your reply!! I'm running Gentoo
Linux, with the gentoo sources kernel: linux-2.4.20-gentoo-r5.
awk --version gives me:
GNU Awk 3.1.3
Copyright (C) 1989, 1991-2003 Free Software Foundation.
I'm a little new to the differences with awk, gawk, and nawk...but,
just to check things a little further I did a look in /bin to find
that awk is a link to gawk on my system:
/bin/awk -> gawk-3.1.3
I'm using the following to run my script:
cat white_pages.csv | awk -f phone1.awk | more
white_pages.csv is my file I'm picking off the email addresses of, and
phone1.awk is my script file. I'm just using more to scroll down the
results to look at them onscreen for now.
Thanks for any insight and suggestions you can help me with! I really
like working with awk so far...but it is easy to stumble as you progress
to slightly more complex things.
CC
chilecayenne | 9/2/2004 2:28:26 PM
cayenne wrote:
<snip>
> I'm using the following to run my script:
> cat white_pages.csv | awk -f phone1.awk | more
This is commonly called "UUOC" (Useless Use Of Cat) since awk can take a
file name argument. Do this instead:
awk -f phone1.awk white_pages.csv | more
Regards,
Ed.
Ed | 9/2/2004 3:06:26 PM
In article <DcydnafNxcKtu6rcRVn-jA@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>cayenne wrote:
>> HI all,
>> I'm a bit of an awk newbie. I'm trying to use some conditional
>> statements to generate printf statements to print colums on a page. As
>> I read the printf info, it said it didn't do a linefeed till you
>> explicitly put one in '\n'.
>>
>> Here's an example of what I'm trying to do on an input of email
>> addresses. I'm trying in this section to split off the username before
>> the @ sign. And if there are underscores (_) in the username, I find
>> out if they are first and middle initials and a last name...and I want
>> to print out in columns the email address, the first name or first
>> initial, the middle initial or last name, and the last name if there
>> are a first name/first initial and middle initial.
>>
>> I'm trying to do:
>>
>> BEGIN {FS="~";}
>> {
Lots of program snipped.
>> printf("%s\t",v_email)
At this point, what if the line holding "v_email" is terminated with \r\n
instead of the Unix-style \n?
The \r gets printed along with it, and the invisible writing cursor is
moved back to the beginning of the line.
>> The file picks off the email from one of three colums..and it does
>> this perfectly. So, lets say the input in the v_email section is like
>>
>> f_d_flinstone@bedrock.com
>> mick_jagger@stones.com
>> john_d_doe@dead.zone
>>
>> I'd expect the out put to be
>>
>> f_d_flinstone@bedrock.com f d flintstone
>> mick_jagger@stones.com mick jagger
>> john_d_doe@dead.zone john d doe
>>
>> But, this isn't the case...I get something like:
>> f_d_flind@bedrdck.cflintstone
To be expected if there is a "carriage return" \r character
just after the ".com" in the first part of the output.
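A small sketch of the effect and one possible fix. The sample record is invented; the only assumption is that the address is the last field on a line ending in \r\n, so awk strips the \n as the record terminator but leaves the \r attached to the field. Calling sub(/\r$/, "") on the record removes it before any field is used. The output is piped through cat -v here only to make the \r visible as ^M.

```shell
# A DOS-style record: the trailing \r stays attached to the last field.
raw=$(printf 'x~f_d_flinstone@bedrock.com\r\n' |
      awk -F'~' '{ printf("%s\t%s\n", $2, "flinstone") }' | cat -v)

# Same record, with the carriage return stripped first.
fixed=$(printf 'x~f_d_flinstone@bedrock.com\r\n' |
        awk -F'~' '{ sub(/\r$/, ""); printf("%s\t%s\n", $2, "flinstone") }' | cat -v)

printf '%s\n%s\n' "$raw" "$fixed"
```

On a terminal, without cat -v, the first form moves the cursor back to column one after the address, so the later columns overwrite its start.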
carl
--
carl lowenstein marine physical lab u.c. san diego
clowenst@ucsd.edu
cdl | 9/2/2004 6:11:53 PM
Ed Morton <morton@lsupcaemnt.com> wrote in message news:<DcydnafNxcKtu6rcRVn-jA@comcast.com>...
<snip>
> > But, this isn't the case...I get something like:
> > f_d_flind@bedrdck.cflintstone
>
> The above code shouldn't produce that given the input you show Have you
> tried getting rid of some of the printfs to narrow it down to exactly
> which printf(s) cause the problem?
>
> Two possibilites are that your actual input file either:
>
> a) contains control characters which could cause the output to look
> jumbled, or
> b) contains empty lines or others which don't contain a "@" in which
> case your initial tests for setting v_email would fail and you fall into
> the "split" with v_email set to "" and I don't know what would happen
> with the resultant invalid array accesses you do after that.
>
> For "a", which I think is the most likely problem, you just need to
> check your input. For "b", you should really structure your code as:
>
> BEGIN{ ... }
> /@/ { ... }
>
> rather than just:
>
> BEGIN{ ... }
> { ... }
>
> to make sure you're only processing lines with an "@" symbol (presumably
> email addresses).
>
> An unrelated enhancement you might want to consider is to change this:
>
> if (match(v_name,/_/)){
>
> split(v_name,v_name_parts,"_")
>
> to this:
>
> num_parts = split(v_name,v_name_parts,"_")
>
> if (num_parts > 1){
>
> i.e. just check the value returned from split to see if there as an "_"
> rather than having to call a separate "match" function first.
>
> You also don't need to test for whether the last name is in the 2nd or
> 3rd piosition becayuse you can just do:
>
> v_last_name = v_name_parts[num_parts]
>
> You should probably revisit the way you're assigning v_last_name anyway
> since your current method would, given an input address of
> "jim_bob_jones@whatever.com", set the first name to "jim" and the last
> name to "jones" but completely ignore the "bob" (actually it would save
> that as the last name then over-write it).
>
> Hope that helps,
>
> Ed.
Thanks for the reply Ed.
I've gone through and commented out all but one of the printfs...each
one by itself works just fine.
Yeah, I know I need to clean up the code, and had thought about the
middle name vs. middle initial...this is just a first run-through as I
started to refine it...and I got stuck with the printing problem at this
early a stage.
This file is a CSV from MS Excel. I'll try to check for special
characters...maybe run dos2unix on it...But, like I said, if I just
do one printf, it works...each one individually works...but if I
start to use 2 or more of them to spit things out in columns, it mixes
them all into one line.
I'll check on the special characters tho...
Any other suggestions greatly appreciated!!
:-)
CC
chilecayenne | 9/2/2004 7:26:09 PM
In article <2deb3d1.0409021126.2c6446fc@posting.google.com>,
chilecayenne@yahoo.com (cayenne) wrote:
> Ed Morton <morton@lsupcaemnt.com> wrote in message
> news:<DcydnafNxcKtu6rcRVn-jA@comcast.com>...
> <snip>
> > > But, this isn't the case...I get something like:
> > > f_d_flind@bedrdck.cflintstone
> >
> > The above code shouldn't produce that given the input you show Have you
> > tried getting rid of some of the printfs to narrow it down to exactly
> > which printf(s) cause the problem?
> >
> > Two possibilites are that your actual input file either:
> >
> > a) contains control characters which could cause the output to look
> > jumbled, or
> > b) contains empty lines or others which don't contain a "@" in which
> > case your initial tests for setting v_email would fail and you fall into
> > the "split" with v_email set to "" and I don't know what would happen
> > with the resultant invalid array accesses you do after that.
> >
> > For "a", which I think is the most likely problem, you just need to
> > check your input. For "b", you should really structure your code as:
> >
> > BEGIN{ ... }
> > /@/ { ... }
> >
> > rather than just:
> >
> > BEGIN{ ... }
> > { ... }
> >
> > to make sure you're only processing lines with an "@" symbol (presumably
> > email addresses).
> >
> > An unrelated enhancement you might want to consider is to change this:
> >
> > if (match(v_name,/_/)){
> >
> > split(v_name,v_name_parts,"_")
> >
> > to this:
> >
> > num_parts = split(v_name,v_name_parts,"_")
> >
> > if (num_parts > 1){
> >
> > i.e. just check the value returned from split to see if there as an "_"
> > rather than having to call a separate "match" function first.
> >
> > You also don't need to test for whether the last name is in the 2nd or
> > 3rd piosition becayuse you can just do:
> >
> > v_last_name = v_name_parts[num_parts]
> >
> > You should probably revisit the way you're assigning v_last_name anyway
> > since your current method would, given an input address of
> > "jim_bob_jones@whatever.com", set the first name to "jim" and the last
> > name to "jones" but completely ignore the "bob" (actually it would save
> > that as the last name then over-write it).
> >
> > Hope that helps,
> >
> > Ed.
>
> Thanks for the reply Ed.
> I've gone through and commented out all but one the printf's...each
> one by themselves works just fine.
>
> Yeah, I know I need to clean up the code, and had thought about the
> middle name vs. middle intital...this is just a first run through as I
> started to refine it...and got stuck with the printing problem at this
> early of an stage.
>
> This file is a csv from MS excell. I'll try to check for special
> characters...maybe run a dos2unix on it...But, like I said, if I just
> do one printf, it works...each one individually works...but, if I
> start to use 2 or more of them to spit things out in columns, it mixes
> them all into one line.
>
> I'll check on the special characters tho...
> Any other suggestions greatly appreciated!!
> :-)
>
> CC
Change the command line to
awk -f phone1.awk white_pages.csv | cat -vte | more
The 'cat -vte' will tell you if there are any invisible characters and
especially if there are <CR><LF> pairs by displaying ^M for the <CR>
values and $ for the <LF>
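For illustration, here is roughly what that looks like on an invented two-line sample where only the first line has a DOS ending. The second command is an extra, not part of the suggestion above: it uses awk itself to count the affected lines, which may be handy on a large file.

```shell
# Invented sample: the first line ends in \r\n, the second in plain \n.
sample=$(printf 'a@b.com\tfred\r\nc@d.com\tbarney\n' | cat -vte)
printf '%s\n' "$sample"

# Counting the lines that still carry a trailing carriage return.
count=$(printf 'a@b.com\tfred\r\nc@d.com\tbarney\n' |
        awk '/\r$/ { n++ } END { print n + 0 }')
printf '%s\n' "$count"
```

In the cat -vte output, tabs show up as ^I, so the column separators can be checked at the same time.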
Bob Harris
Bob | 9/2/2004 8:40:34 PM
Bob Harris <harris@zk3.dec.com> wrote in message news:<harris-DF6E8A.16402002092004@cacnews.cac.cpqcorp.net>...
>
> Change the command line to
>
> awk -f phone1.awk white_pages.csv | cat -vte | more
>
> The 'cat -vte' will tell you if there are any invisible characters and
> especially if there are <CR><LF> pairs by displaying ^M for the <CR>
> values and $ for the <LF>
>
> Bob Harris
Bob and all the other great responders to my thread.
THANK YOU!!
It was indeed the ^M$ that was the problem. Have I mentioned lately
how much I HATE MS Windoze? Arrrgh. It is hard enough learning
something new without having to deal with the hidden crap MS puts into
what should be a plain text file...
Anyway, again thanks for the help on this, the syntax corrections, and
the new trick with cat -vte. I'd never done a man cat before...found
it does more than I'd thought.
Thanx,
CC
ps. Just curious, I've seen mentioned in responses here and other
forums where people get irritated about using cat 'too much'. Just
curious as to why?
chilecayenne | 9/3/2004 2:42:19 PM
In article <2deb3d1.0409030642.1a10d901@posting.google.com>,
cayenne <chilecayenne@yahoo.com> wrote:
....
>ps. Just curious, I've seen mentioned in responses here and other
>forums where people get irritated about using cat 'too much'. Just
>curious as to why?
For some reason, newbies often write:
cat somefile | someutil ...
and this is unnecessary, and, in theory at least, wasteful. I won't go
into the various details as to why it is unnecessary and wrong (STFW), but
I will take the opportunity to say that, many, many moons ago, I saw the
following in an MSDOS manual:
type file | more
and so, as with most things that are wrong in computing, it is all Bill
Gates's fault. Note that the above is particularly bad in DOS, which
doesn't have any sort of multitasking and thus has only fake pipes.
gazelle | 9/3/2004 2:52:19 PM
cayenne wrote:
<snip>
> ps. Just curious, I've seen mentioned in responses here and other
> forums where people get irritated about using cat 'too much'. Just
> curious as to why?
It's not so much about getting irritated, it's more helping people learn
when they don't need to use it. The common mistake newcomers make is to use:
cat file | some_command
when "some_command" can take a file argument and so the above could be
written as:
some_command file
or it could just read redirected input as:
some_command < file
and save an external command (cat) and a pipe.
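A quick sketch showing that the three forms give the same result, using awk as "some_command" and a throwaway file created with mktemp (both choices are just for illustration):

```shell
# Throwaway input file; the name is arbitrary.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

with_cat=$(cat "$tmp" | awk 'END { print NR }')   # extra process and pipe
with_arg=$(awk 'END { print NR }' "$tmp")         # awk opens the file itself
with_redir=$(awk 'END { print NR }' < "$tmp")     # the shell opens the file

printf '%s %s %s\n' "$with_cat" "$with_arg" "$with_redir"
rm -f "$tmp"
```

One practical difference: awk only knows the file's name (through FILENAME) in the argument form, since in the other two it just sees standard input.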
Let's say you have a kid who puts on their shoes, then takes them off
and puts on their socks then puts their shoes back on. Wouldn't you tell
them that they don't actually need to put on their shoes the first time?
After seeing several kids do this, wouldn't you get a tad irritated
and wonder who the heck is out there telling kids that that's the right
way to do things? It's kinda like that....
Ed.
Ed | 9/3/2004 2:56:10 PM
Kenny McCormack wrote:
>>ps. Just curious, I've seen mentioned in responses here and other
>>forums where people get irritated about using cat 'too much'. Just
>>curious as to why?
>
>
> For some reason, newbies often write:
>
> cat somefile | someutil ...
I often do (although I'm not a newbie), since I think in terms of
pipelines and this way it goes nicely from left to right.
> and this is unnecessary, and, in theory at least, wasteful. I won't go
> into the various details as to why it is unnecessary and wrong (STFW), but
On occasion, it can be better. Copying a large file from one disk to
another is often better achieved with:
cat file1 | cat > file2
(or better still dd), since you're reading and writing in parallel like
this.
-Ed
--
(You can't go wrong with psycho-rats.) (er258)(@)(eng.cam)(.ac.uk)
/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage
E | 9/3/2004 5:44:48 PM
E. Rosten wrote:
> Kenny McCormack wrote:
>
>>> ps. Just curious, I've seen mentioned in responses here and other
>>> forums where people get irritated about using cat 'too much'. Just
>>> curious as to why?
>>
>>
>>
>> For some reason, newbies often write:
>>
>> cat somefile | someutil ...
>
>
>
> I often do (although I'm not a newbie), since I think in terms of
> pipelines and this way it goes nicely from left to right.
Then presumably writing it as:
cat somefile | cat | someutil
is even better since it extends even further from left to right ;-).
It's fine to think in terms of pipelines, but I can't imagine why you'd
want to introduce commands gratuitously at the head or tail of a pipeline.
>> and this is unnecessary, and, in theory at least, wasteful. I won't go
>> into the various details as to why it is unnecessary and wrong (STFW),
>> but
>
>
> On occasion, it can be better. Copying a large file from one disk to
> another is often better achieved with:
>
> cat file1 | cat > file2
>
> (or better still dd), since you're reading and writing in parallel like
> this.
I've never come across that situation. When you say it's better - do you
mean faster, or more reliable, or something else? Can you go into any
more detail on why it's better as the benefits aren't intuitively obvious.
Ed.
Ed | 9/3/2004 5:52:59 PM
In article <q5idnS1ImonjMqXcRVn-rQ@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>E. Rosten wrote:
>> Kenny McCormack wrote:
>>
>>>> ps. Just curious, I've seen mentioned in responses here and other
>>>> forums where people get irritated about using cat 'too much'. Just
>>>> curious as to why?
>>>
>>>
>>>
>>> For some reason, newbies often write:
>>>
>>> cat somefile | someutil ...
>>
>>
>>
>> I often do (although I'm not a newbie), since I think in terms of
>> pipelines and this way it goes nicely from left to right.
>
>Then presumably writing it as:
>
> cat somefile | cat | someutil
>
>is even better since it extends even further from left to right ;-).
>It's fine to think in terms of pipelines, but I can't imagine why you'd
>want to introduce commands gratuitously at the head or tail of a pipeine.
Indeed. The standard Randal Schwartz answer to "but I like to see my data
go from left to right" is:
< file someutil
which works in any Bourne-ish shell.
>>> and this is unnecessary, and, in theory at least, wasteful. I won't go
>>> into the various details as to why it is unnecessary and wrong (STFW),
>>> but
>>
>>
>> On occasion, it can be better. Copying a large file from one disk to
>> another is often better achieved with:
>>
>> cat file1 | cat > file2
>>
>> (or better still dd), since you're reading and writing in parallel like
>> this.
>
>I've never come across that situation. When you say it's better - do you
>mean faster, or more reliable, or something else? Can you go into any
>more detail on why it's better as the benefits aren't intuitively obvious.
I don't think the claim holds water in any general sense. It *might* be true on
some particular piece of hardware under some particular set of conditions.
gazelle | 9/3/2004 6:56:48 PM
On Fri, 03 Sep 2004 12:52:59 -0500 in comp.lang.awk, Ed Morton
<morton@lsupcaemnt.com> wrote:
>
>
>E. Rosten wrote:
>> Kenny McCormack wrote:
>>
>>>> ps. Just curious, I've seen mentioned in responses here and other
>>>> forums where people get irritated about using cat 'too much'. Just
>>>> curious as to why?
>> On occasion, it can be better. Copying a large file from one disk to
>> another is often better achieved with:
>>
>> cat file1 | cat > file2
>>
>> (or better still dd), since you're reading and writing in parallel like
>> this.
>
>I've never come across that situation. When you say it's better - do you
>mean faster, or more reliable, or something else? Can you go into any
>more detail on why it's better as the benefits aren't intuitively obvious.
I can't see that ever being better than cp [-p] file1 file2, as
mmap() is often used to have the OS do most of the work, and cp
normally handles sparse files, whereas cat can't.
To make local and remote file and directory copies retaining dates and
permissions, tar idioms are often used:
cd fdir ; tar cf - file | ( cd tdir; tar xf - )
cd fdir ; tar cf - . | ( cd tdir; tar xf - )
rsh host '( cd fdir ; tar cf - . )' | ( cd tdir; tar xf - )
cd fdir ; tar cf - . | rsh host '( cd tdir; tar xf - )'
I wonder if newbies pick this up, can't remember the tar details and
substitute cat instead?
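For anyone who wants to try the local directory idiom safely, here is a sketch using scratch directories from mktemp in place of fdir and tdir. It only checks that the content arrives; whether modes and ownership are fully preserved depends on the tar implementation, the umask, and options such as -p, which this sketch does not explore.

```shell
# Scratch directories standing in for fdir and tdir.
fdir=$(mktemp -d)
tdir=$(mktemp -d)
printf 'hello\n' > "$fdir/file"

# Local directory copy; each subshell keeps its cd from affecting the caller.
( cd "$fdir" && tar cf - . ) | ( cd "$tdir" && tar xf - )

copied=$(cat "$tdir/file")
printf '%s\n' "$copied"
rm -rf "$fdir" "$tdir"
```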
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
Brian | 9/3/2004 10:07:13 PM
>> I often do (although I'm not a newbie), since I think in terms of
>> pipelines and this way it goes nicely from left to right.
>
>
> Then presumably writing it as:
>
> cat somefile | cat | someutil
>
> is even better since it extends even further from left to right ;-).
> It's fine to think in terms of pipelines, but I can't imagine why you'd
> want to introduce commands gratuitously at the head or tail of a pipeine.
I just find doing it that way more logical. If I go to the effort of
writing a shell script (ie doing the task more than once), I'll do it
properly, but when I'm doing a one-liner, my brain taps out cat file |
before I've had a chance to stop it :-)
It is slightly wasteful, but not enough so to make breaking the habit
worthwhile.
>> On occasion, it can be better. Copying a large file from one disk to
>> another is often better achieved with:
>>
>> cat file1 | cat > file2
>>
>> (or better still dd), since you're reading and writing in parallel
>> like this.
>
>
> I've never come across that situation. When you say it's better - do you
> mean faster, or more reliable, or something else?
Faster.
> Can you go into any
> more detail on why it's better as the benefits aren't intuitively obvious.
cp seems to work by copying a bunch of data from one disk then writing
it to the other disk. At least that's what my observations of the noises
made by the disks lead me to believe.
With cat file1 | cat > file2,
the second cat empties the data from the pipe (fast) and writes it to
disk. As it is writing, the first cat sees that the pipeline is empty,
so it can go and read some data from the disk in order to fill that
pipeline. In this way, the write to the disk by the second cat happens
simultaneously with the read made by the first cat.
This only speeds up things if the files are on different physical disks,
or even better, on different physical channels.
But it really does work; try creating a very large file and copying it.
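A sketch for trying this out. It only verifies that both methods produce identical copies; to compare speed, one would prefix the cp and cat lines with time, use a much larger file, and place source and destination on different physical disks. The 8 MiB size and the mktemp locations are arbitrary assumptions and far too small to show any timing difference.

```shell
# Scratch files; sizes and locations are placeholders.
src=$(mktemp)
dst_cp=$(mktemp)
dst_cat=$(mktemp)
dd if=/dev/zero of="$src" bs=1024 count=8192 2>/dev/null

cp "$src" "$dst_cp"
cat "$src" | cat > "$dst_cat"

# Both methods should yield byte-identical copies.
cmp -s "$src" "$dst_cp" && cmp -s "$src" "$dst_cat" && same=yes || same=no
printf '%s\n' "$same"
rm -f "$src" "$dst_cp" "$dst_cat"
```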
-Ed
--
(You can't go wrong with psycho-rats.) (er258)(@)(eng.cam)(.ac.uk)
/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage
E | 9/6/2004 12:37:50 PM