Is my sed script the wordiest?

I'd like to solicit comments on a script I made to process an 
.addressbook the way Pine saves it to a form that I wanted.  But I called 
up sed three different times, making three passes through it.  That 
didn't seem quite right because everyone else seems to do their sedding 
in one pass, but I couldn't figure out how to do it in a single pass 
without leaving some spaces at the beginning of some lines, and a 
parenthesis hanging out.

steel02$ cat abconv
cut -f3 .addressbook > tmpbk
sed '
/^$/d
/@/!d
' tmpbk > tmpbk2

sed '
/^/ {
N
s/,\n  *//
}' tmpbk2 > tmpbk3

sed '
s/^  *//
/".*,.*"/n
s/,/\
/g
s/^(//
s/)$//

' tmpbk3 > tmpbk4

0. Get just field 3, throw everything else away.

1. Delete blank lines.
2. Print lines that contain an @.

Originally a separate step because I used the -n option.  Then I rewrote 
it without that option, but couldn't combine it with the next script 
without leaving undeleted blank lines, and expanding comma separated 
terms that should have been deleted.

3. Combine two lines when one ends with a comma and the next starts
   with some spaces.

When I try to combine this with the next one, I get some lines still 
beginning with spaces, and one line still beginning with a (.

4. Remove spaces at the beginning of remaining lines.
5. Take no action on terms like "Hansen, Christopher J"
6. Put other comma separated terms on individual lines,
   e.g. a list of e-mail addresses for one person.
7. Remove ( at the beginning and ) at the end (indicates
   multiple addresses for one person).

I know my comma work could use some generality, but clashes haven't come 
up yet.
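
A note on step 3: because N in the second script consumes lines in pairs, a comma-terminated line that lands second in a pair is never joined with the continuation that follows it. One common way around this is a sliding window built from N, P, and D. The following is only a rough sketch on sample lines. The function name join_continuations is hypothetical, and unlike the original s/,\n  *// it keeps the comma so that a later comma-splitting step still sees it, which is an assumption about the intended result.

```shell
# Hypothetical helper: join a line ending in "," with a following line
# that starts with spaces, keeping the comma between the two parts.
join_continuations() {
    # :a    label for the loop
    # $!N   append the next input line unless this is the last one
    # s///  glue "comma, newline, spaces" back into a single comma
    # ta    if a join happened, loop to look for a further continuation
    # P;D   print and drop the first line, keep the rest for the next cycle
    sed -e ':a' -e '$!N' -e 's/,\n  */,/' -e 'ta' -e 'P' -e 'D'
}

printf '%s\n' '(Mark Templeton <templeton_mark@yahoo.com>,' \
    '   mark@templetonfamily.com,ufs12282@email.sjsu.edu)' \
    'chris@bitstream.net' | join_continuations
```

Because D restarts the script with whatever is left in the pattern space, every line gets a chance to be the first half of a join, not just the odd-numbered ones.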

-- 
"Things should be made as simple as possible -- but no simpler."
  -- Albert Einstein
Reply glhansen (396) 12/11/2003 9:07:06 PM

In article <bram9q$lss$1@hood.uits.indiana.edu>,
Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>I'd like to solicit comments on a script I made to process an 
>.addressbook the way Pine saves it to a form that I wanted.  But I called 
>up sed three different times, making three passes through it.  That 

This isn't the first time it's happened to me.  Sometimes when I'm on a 
problem my thinking evolves in a certain way, I start at some point and 
then decide to go to another, so I become almost blind to any other way to 
do things.

I've managed to simplify the script a lot,

# convert .addressbook

cut -f3 .addressbook > tmpbk

sed '
/@/!d
/".*,.*"/n
s/,\n*/\
/
s/^  *//
s/^(//
s/)$//
' tmpbk > tmpbk2

sed '/^$/d' tmpbk2

But I still need that last sed script to remove a blank line that somehow 
appears, and putting a /^$/d command in the first script just won't touch 
it, although it seems like it should.
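
One possible explanation, sketched on a made-up line: /^$/ only matches when the whole pattern space is empty. When s turns a trailing comma into a newline, the pattern space becomes the address plus an embedded newline. That prints as a blank line, but it is never an empty pattern space, so /^$/d has nothing to match. Removing a trailing comma before splitting avoids creating the blank line in the first place. The variable names here are illustrative only.

```shell
# Made-up input line with a trailing comma.
line='mark@templetonfamily.com,'

# The comma becomes an embedded newline; /^$/d sees "address<newline>",
# not an empty pattern space, so the blank line survives.
with_blank=$(printf '%s\n' "$line" | sed 's/,/\
/
/^$/d' | wc -l | tr -d ' ')

# Dropping a trailing comma first leaves nothing to turn into a blank line.
without_blank=$(printf '%s\n' "$line" | sed 's/,$//
s/,/\
/
/^$/d' | wc -l | tr -d ' ')

echo "lines with the original order: $with_blank"
echo "lines after removing the trailing comma first: $without_blank"
```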

-- 
"A nice adaptation of conditions will make almost any hypothesis agree
with the phenomena.  This will please the imagination but does not advance
our knowledge." -- J. Black, 1803.
Reply glhansen (396) 12/12/2003 2:51:55 AM


On Fri, 12 Dec 2003 at 02:51 GMT, Gregory L. Hansen wrote:
> In article <bram9q$lss$1@hood.uits.indiana.edu>,
> Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>>I'd like to solicit comments on a script I made to process an 
>>.addressbook the way Pine saves it to a form that I wanted.  But I called 
>>up sed three different times, making three passes through it.  That 
> 
> This isn't the first time it's happened to me.  Sometimes when I'm on a 
> problem my thinking evolves in a certain way, I start at some point and 
> then decide to go to another, so I become almost blind to any other way to 
> do things.
> 
> I've managed to simplify the script a lot,
> 
> # convert .addressbook
> 
> cut -f3 .addressbook > tmpbk
> 
> sed '
> /@/!d
> /".*,.*"/n
> s/,\n*/\
> /

  ^ This is where the blank lines are introduced.

> s/^  *//
> s/^(//
> s/)$//
> ' tmpbk > tmpbk2
> 
> sed '/^$/d' tmpbk2
> 
> But I still need that last sed script to remove a blank line that somehow 
> appears, and putting a /^$/d command in the first script just won't touch 
> it, although it seems like it should.
> 

   You don't need cut or the second sed:

sed 's/.*  .*      \(.*\)/\1/
/@/!d
/".*,.*"/n
s/,$//
s/,/\
/
s/^  *//
s/^(//
s/)$//
' ~/.addressbook


-- 
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2003, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License
Reply c.fa.johnson (292) 12/12/2003 4:44:39 PM

In article <brcr9m$20663$1@ID-210011.news.uni-berlin.de>,
Chris F.A. Johnson <c.fa.johnson@rogers.com> wrote:
>On Fri, 12 Dec 2003 at 02:51 GMT, Gregory L. Hansen wrote:
>> In article <bram9q$lss$1@hood.uits.indiana.edu>,
>> Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>>>I'd like to solicit comments on a script I made to process an 
>>>.addressbook the way Pine saves it to a form that I wanted.  But I called 
>>>up sed three different times, making three passes through it.  That 
>> 
>> This isn't the first time it's happened to me.  Sometimes when I'm on a 
>> problem my thinking evolves in a certain way, I start at some point and 
>> then decide to go to another, so I become almost blind to any other way to 
>> do things.
>> 
>> I've managed to simplify the script a lot,
>> 
>> # convert .addressbook
>> 
>> cut -f3 .addressbook > tmpbk
>> 
>> sed '
>> /@/!d
>> /".*,.*"/n
>> s/,\n*/\
>> /
>
>  ^ This is where the blank lines are introduced.
>
>> s/^  *//
>> s/^(//
>> s/)$//
>> ' tmpbk > tmpbk2
>> 
>> sed '/^$/d' tmpbk2
>> 
>> But I still need that last sed script to remove a blank line that somehow 
>> appears, and putting a /^$/d command in the first script just won't touch 
>> it, although it seems like it should.
>> 
>
>   You don't need cut or the second sed:
>
>sed 's/.*  .*      \(.*\)/\1/
>/@/!d
>/".*,.*"/n
>s/,$//
>s/,/\
>/
>s/^  *//
>s/^(//
>s/)$//
>' ~/.addressbook

I'm not sure what's going on in the first line, but I get the general gist 
of discarding the first two fields.  My comfort level goes to about

s/^.*	.*	//

which seems to do the same thing.

But sadly, the script doesn't work.  Accreting it line by line, the first 
two move in the right direction.  But the third line, /".*,.*"/n, adds 
information that I'd thought was discarded.  For instance,

"Gregory L. Hansen" <glhansen@iucf.indiana.edu>

becomes

greg	Hansen, Gregory L.	"Gregory L. Hansen" <glhansen@iucf.indiana.edu>

And it sort of hangs around for the rest of the process.  I sure don't 
know why.
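
A likely cause, for what it is worth: n prints the current pattern space and then replaces it with the next input line, and the script carries on from that point instead of starting over. The line that n reads therefore never passes through the first s command, so its first two fields survive. The sketch below reproduces this on two entries from the address book and compares it with b, which jumps to the end of the script and lets the next line start a fresh cycle. It assumes the fields are tab-separated, and the function names are hypothetical.

```shell
tab=$(printf '\t')   # assumes the real file separates fields with tabs

# Two sample entries, three tab-separated fields each.
sample() {
    printf '%s\t%s\t%s\n' \
        'chris2' '"Hansen, Chris"' '"Hansen, Chris" <CHansen@dartadvantage.com>' \
        'greg' 'Hansen, Gregory L.' '"Gregory L. Hansen" <glhansen@iucf.indiana.edu>'
}

# With n: the greg line is read mid-script and keeps its first two fields.
with_n() {
    sample | sed "s/^.*$tab.*$tab//
/\".*,.*\"/n"
}

# With b: the quoted-name line is left alone and the next cycle starts clean.
with_b() {
    sample | sed "s/^.*$tab.*$tab//
/\".*,.*\"/b"
}

echo "--- using n"
with_n
echo "--- using b"
with_b
```

In the n variant the second output line still begins with greg and the name field. In the b variant it is reduced to the quoted name and address.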

To burst the commas, I wanted to do something like

s/(.*,.*)/.*1\
.*2/

where .*1 is stuff between the ( and , and .*2 is stuff between the , and 
).  But as far as I know, sed just doesn't work that way.

Maybe it would help to post the data I'm working on.  I didn't do that at 
first because I thought it would be annoying, and hoped it wouldn't be 
needed.  But maybe it would help.  I'm trying to get just a single e-mail 
address on each line.

steel06$ cat .addressbook
                Truebane@aol.com
davidhenry              David_Henry@pch.gc.ca
Gus             xon@pacbell.net
julie           julietobako@visi.com
lithium         gs@ix.netcom.com
   He told me how to fix my computer when it stopped working.
mom             Nancy_M._Hansen@notes.mdor.state.mn.us
varney          mvarney@uswest.net
warren          WFritze@mrp.com
justin  (E-mail), Justin Low [lowjustin@uswest.net]
   "Justin Low [lowjustin@uswest.net] (E-mail)" <lowjustin@uswest.net>
AQ      Abdulquayuum K.T. Al-Saud       AQ@eldjinn.demon.co.uk
adam    Adam Szczepaniak        aszczepa@indiana.edu
lorraine        bagaan, lorraine        lorraine bagaan 
<lbagaan@yahoo.com>
karl    Berkner, Karl   Karl Berkner <kberkner4792@attbi.com>
blessinger      Blessinger, Christopher
   Christopher Blessinger <chblessi@indiana.edu>
boggs   Boggs, David A. "David A. Boggs" <daboggs@dodgenet.com>
brenda  Brenda Fink     ratfink@pacbell.net
chris   Chris Hansen    chris@bitstream.net
eric    Crystal, Eric & Eric & Crystal <c-d-e@comcast.net>
peter   Diehr, Peter    Peter Diehr <pdiehr@srv2.ic.net>
ken     Fischer, Ken    Ken Fischer <kfischer@iglou.com>
paul    g, apolinario paul      apolinario paul g 
<u1000684@email.sjsu.edu>
   Paul from San Jose
galen   Galen Cruze     jaraxle@winternet.com
chris2  "Hansen, Chris" "Hansen, Chris" <CHansen@dartadvantage.com>
greg    Hansen, Gregory L.      "Gregory L. Hansen" 
<glhansen@iucf.indiana.edu>
gail    Hanson, Gail    Gail Hanson <gail@needmore.physics.indiana.edu>
hawking Hawking, Stephen W      Stephen W Hawking 
<S.W.Hawking@damtp.cam.ac.uk>
tonya   Higdon, Tonya   Tonya Higdon <kobra@mail.kiva.net>
john    John Tobako     evl666@juno.com
bob     Jr., Robert Langtry     "Robert Langtry Jr." <twilight@visi.com>
troy    Paton, Troy R   Troy R Paton <trp81@juno.com>
holger  Reuchlin, Holger        Holger Reuchlin 
<H.Reuchlin@geopraktiker.com>
wolfgang        Rupprecht, Wolfgang S.
   "Wolfgang S. Rupprecht" <wolfgang@wsrcc.com>
cathy   schmid, cathy   cathy schmid <cathyschmid@hotmail.com>
snow    Snow, William Michael   William Michael Snow 
<snow@iucf.indiana.edu>
jo      Sulzen, Joanna  Joanna Sulzen <josulzen@earthlink.net>
tobako  Tobako, John & Julie    John & Julie Tobako <j_tobako@juno.com>
town    Town, Michael W Michael W Town <mtown@juno.com>
roland  Wiley, Roland L.        "Roland L. Wiley" 
<u1000189@email.sjsu.edu>
minnesota               (agg,chris,dave,mom)
dave    'dave'  ('dave'	<dave1@riptide.wavetech.net>,DAVID.WHITLOCK@usbank.com)
agg     AGG     (eric,galen,justin,tobako,troy,warren)
mark    Mark Templeton  (Mark Templeton <templeton_mark@yahoo.com>,
   mark@templetonfamily.com,ufs12282@email.sjsu.edu)
sanjose People from San Jose    (brenda,gus,lorraince,mark,paul,roland)

-- 
"What are the possibilities of small but movable machines?  They may or
may not be useful, but they surely would be fun to make."
    -- Richard P. Feynman, 1959
Reply glhansen (396) 12/12/2003 5:29:19 PM

In article <brcttf$irv$1@hood.uits.indiana.edu>,
Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>In article <brcr9m$20663$1@ID-210011.news.uni-berlin.de>,
>Chris F.A. Johnson <c.fa.johnson@rogers.com> wrote:

>>sed 's/.*  .*      \(.*\)/\1/

Okay, I reread a section in my sed book that I must not have been paying 
much attention to.  \1 is equal to whatever was matched between the 
escaped \( and \).  But I'm still not sure what the difference is between 
that and

>s/^.*	.*	//

But a s/^.*  .*      \(.*\).*/\1/ would cut field 3 from a line with any 
number of fields.
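
One caveat that may be worth testing: .* is greedy, so on a line with more than three tab-separated fields the leading .* patterns swallow as many fields as they can, and the expression ends up keeping the last field, not the third. A bracket expression that excludes the tab pins each field down. The sketch below uses a made-up four-field line and assumes tab separators.

```shell
tab=$(printf '\t')               # assumed field separator
line=$(printf 'f1\tf2\tf3\tf4')  # made-up line with four fields

# Greedy version: the first .* runs as far right as it can,
# so the captured group ends up being the last field.
greedy=$(printf '%s\n' "$line" | sed "s/^.*$tab.*$tab\(.*\).*/\1/")

# Field-by-field version: [^tab]* cannot cross a separator,
# so the group is always the third field.
third=$(printf '%s\n' "$line" |
    sed "s/^[^$tab]*$tab[^$tab]*$tab\([^$tab]*\).*/\1/")

echo "greedy: $greedy"   # prints "greedy: f4"
echo "third:  $third"    # prints "third:  f3"
```

On lines with exactly three fields the two forms give the same result, which is why the difference does not show up on this address book.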

>
>To burst the commas, I wanted to do something like
>
>s/(.*,.*)/.*1\
>.*2/
>
>where .*1 is stuff between the ( and , and .*2 is stuff between the , and 
>).  But as far as I know, sed just doesn't work that way.

I knew wrong.
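
For completeness, a rough sketch of that idea with back-references: each \(...\) group in the pattern is recalled as \1, \2 in the replacement. The input is a made-up two-address list and the function name burst_pair is hypothetical. Longer lists would need a loop or the g flag.

```shell
# Split "(first,second)" into two lines, dropping the parentheses.
# Unescaped ( and ) are literal in a basic regular expression;
# \( and \) mark the groups recalled as \1 and \2.
burst_pair() {
    sed 's/^(\(.*\),\(.*\))$/\1\
\2/'
}

printf '%s\n' '(dave1@riptide.wavetech.net,DAVID.WHITLOCK@usbank.com)' | burst_pair
```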
-- 
"'No user-serviceable parts inside.'  I'll be the judge of that!"
Reply glhansen (396) 12/13/2003 9:18:19 PM
