I'd like to solicit comments on a script I made to process an
.addressbook the way Pine saves it to a form that I wanted. But I called
up sed three different times, making three passes through it. That
didn't seem quite right because everyone else seems to do their sedding
in one pass, but I couldn't figure out how to do it in a single pass
without leaving some spaces at the beginning of some lines, and a
parenthesis hanging out.
steel02$ cat abconv
cut -f3 .addressbook > tmpbk
sed '
/^$/d
/@/!d
' tmpbk > tmpbk2
sed '
/^/ {
N
s/,\n *//
}' tmpbk2 > tmpbk3
sed '
s/^ *//
/".*,.*"/n
s/,/\
/g
s/^(//
s/)$//
' tmpbk3 > tmpbk4
0. Get just field 3, throw everything else away.
1. Delete blank lines.
2. Print lines that contain an @.
Originally a separate step because I used the -n option. Then I rewrote
it without that option, but couldn't combine it with the next script
without leaving undeleted blank lines and expanding comma-separated
terms that should have been deleted.
3. Combine two lines when one ends with a comma and the next starts
with some spaces.
When I try to combine this with the next one, I get some lines still
beginning with spaces, and one line still beginning with a (.
4. Remove spaces at the beginning of remaining lines.
5. Take no action on terms like "Hansen, Christopher J"
6. Put other comma separated terms on individual lines,
e.g. a list of e-mail addresses for one person.
7. Remove ( at the beginning and ) at the end (indicates
multiple addresses for one person).
I know my comma work could use some generality, but clashes haven't come
up yet.
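A possible single-pass take on step 3, offered as a sketch rather than a drop-in replacement. It assumes Pine marks a wrapped entry by ending the line with a comma and indenting the continuation line with spaces; the function name join_continued is hypothetical. Because /^/ matches every line, the original N pairs lines strictly two at a time, so a wrap that begins on the second line of a pair is missed. The loop below instead keeps appending while the pattern space ends in a comma. Unlike s/,\n *//, which also deletes the comma, this version deliberately keeps it so the addresses can still be split apart later.

```shell
# Hypothetical helper: join continuation lines in one sed pass.
# Assumption: a wrapped entry ends in "," and the next line starts with spaces.
join_continued() {
  sed -e ':a' \
      -e '/,$/{' \
      -e '$!N' \
      -e 's/\n  *//' \
      -e 'ta' \
      -e '}'
  # :a        loop label
  # /,$/{     only act while the pattern space ends in a comma
  # $!N       append the next input line (skipped on the last line)
  # s/\n  *// drop the newline and the indentation, keep the comma
  # ta        if that join happened, loop to check for another comma
}
```

For example, printf '(Mark <m@x.com>,\n   mark@y.com)\nfoo\n' | join_continued prints (Mark <m@x.com>,mark@y.com) followed by foo.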
--
"Things should be made as simple as possible -- but no simpler."
-- Albert Einstein
glhansen, 12/11/2003 9:07:06 PM
In article <bram9q$lss$1@hood.uits.indiana.edu>,
Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>I'd like to solicit comments on a script I made to process an
>.addressbook the way Pine saves it to a form that I wanted. But I called
>up sed three different times, making three passes through it. That
This isn't the first time it's happened to me. Sometimes when I'm on a
problem my thinking evolves in a certain way, I start at some point and
then decide to go to another, so I become almost blind to any other way to
do things.
I've managed to simplify the script a lot,
# convert .addressbook
cut -f3 .addressbook > tmpbk
sed '
/@/!d
/".*,.*"/n
s/,\n*/\
/
s/^ *//
s/^(//
s/)$//
' tmpbk > tmpbk2
sed '/^$/d' tmpbk2
But I still need that last sed script to remove a blank line that somehow
appears, and putting a /^$/d command in the first script just won't touch
it, although it seems like it should.
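A small demonstration of where the blank line probably comes from, assuming any POSIX sed; the function names are hypothetical. The address /^$/ only matches when the entire pattern space is empty. When the comma-to-newline substitution hits a comma at the very end of a line, the pattern space becomes the address followed by a newline. That is not empty, so /^$/d skips it no matter where it sits in the script, and the empty line only appears once sed prints the pattern space. Deleting a trailing comma first, as the s/,$// in the reply to this post does, avoids creating it.

```shell
# Naive split: a trailing comma turns into a trailing newline.
# /^$/d runs afterwards but sees "addr\n", which is not empty.
split_naive() {
  sed '
s/,/\
/
/^$/d'
}

# Same split, but a trailing comma is removed before splitting,
# so no empty line is ever produced.
split_fixed() {
  sed '
s/,$//
s/,/\
/'
}
```

With the input a@x, (one address plus a trailing comma), split_naive prints a@x and then an empty line, while split_fixed prints only a@x.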
--
"A nice adaptation of conditions will make almost any hypothesis agree
with the phenomena. This will please the imagination but does not advance
our knowledge." -- J. Black, 1803.
glhansen, 12/12/2003 2:51:55 AM
On Fri, 12 Dec 2003 at 02:51 GMT, Gregory L. Hansen wrote:
> In article <bram9q$lss$1@hood.uits.indiana.edu>,
> Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>>I'd like to solicit comments on a script I made to process an
>>.addressbook the way Pine saves it to a form that I wanted. But I called
>>up sed three different times, making three passes through it. That
>
> This isn't the first time it's happened to me. Sometimes when I'm on a
> problem my thinking evolves in a certain way, I start at some point and
> then decide to go to another, so I become almost blind to any other way to
> do things.
>
> I've managed to simplify the script a lot,
>
> # convert .addressbook
>
> cut -f3 .addressbook > tmpbk
>
> sed '
> /@/!d
> /".*,.*"/n
> s/,\n*/\
> /
^ This is where the blank lines are introduced.
> s/^ *//
> s/^(//
> s/)$//
> ' tmpbk > tmpbk2
>
> sed '/^$/d' tmpbk2
>
> But I still need that last sed script to remove a blank line that somehow
> appears, and putting a /^$/d command in the first script just won't touch
> it, although it seems like it should.
>
You don't need cut or the second sed:
sed 's/.* .* \(.*\)/\1/
/@/!d
/".*,.*"/n
s/,$//
s/,/\
/
s/^ *//
s/^(//
s/)$//
' ~/.addressbook
--
Chris F.A. Johnson http://cfaj.freeshell.org
===================================================================
My code (if any) in this post is copyright 2003, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License
c.fa.johnson, 12/12/2003 4:44:39 PM
In article <brcr9m$20663$1@ID-210011.news.uni-berlin.de>,
Chris F.A. Johnson <c.fa.johnson@rogers.com> wrote:
>On Fri, 12 Dec 2003 at 02:51 GMT, Gregory L. Hansen wrote:
>> In article <bram9q$lss$1@hood.uits.indiana.edu>,
>> Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>>>I'd like to solicit comments on a script I made to process an
>>>.addressbook the way Pine saves it to a form that I wanted. But I called
>>>up sed three different times, making three passes through it. That
>>
>> This isn't the first time it's happened to me. Sometimes when I'm on a
>> problem my thinking evolves in a certain way, I start at some point and
>> then decide to go to another, so I become almost blind to any other way to
>> do things.
>>
>> I've managed to simplify the script a lot,
>>
>> # convert .addressbook
>>
>> cut -f3 .addressbook > tmpbk
>>
>> sed '
>> /@/!d
>> /".*,.*"/n
>> s/,\n*/\
>> /
>
> ^ This is where the blank lines are introduced.
>
>> s/^ *//
>> s/^(//
>> s/)$//
>> ' tmpbk > tmpbk2
>>
>> sed '/^$/d' tmpbk2
>>
>> But I still need that last sed script to remove a blank line that somehow
>> appears, and putting a /^$/d command in the first script just won't touch
>> it, although it seems like it should.
>>
>
> You don't need cut or the second sed:
>
>sed 's/.* .* \(.*\)/\1/
>/@/!d
>/".*,.*"/n
>s/,$//
>s/,/\
>/
>s/^ *//
>s/^(//
>s/)$//
>' ~/.addressbook
I'm not sure what's going on in the first line, but I get the general gist
of discarding the first two fields. My comfort level goes to about
s/^.* .* //
which seems to do the same thing.
But sadly, the script doesn't work. Accreting it line by line, the first
two move in the right direction. But the third line, /".*,.*"/n, adds
information that I'd thought was discarded. For instance,
"Gregory L. Hansen" <glhansen@iucf.indiana.edu>
becomes
greg Hansen, Gregory L. "Gregory L. Hansen" <glhansen@iucf.indiana.edu>
And it sort of hangs around for the rest of the process. I sure don't
know why.
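A likely explanation, sketched for any POSIX sed with hypothetical function names. The n command prints the current pattern space, replaces it with the next input line, and then carries on with the commands that follow n. Anything earlier in the script (here the field-stripping substitution and /@/!d) is never applied to that new line, so the line right after a quoted "Last, First" entry comes through whole. A bare b (branch with no label) jumps to the end of the script instead, which finishes the current line and lets the next line start a fresh cycle from the top. To keep the example short, the one-word strip s/^[^ ]* // stands in for the real field extraction.

```shell
# With "n": the line read by n skips the s/^[^ ]* // above it.
with_n() {
  sed '
s/^[^ ]* //
/".*,.*"/n
s/,/\
/g'
}

# With "b": a quoted "Last, First" line just ends its cycle,
# and the next line gets every command, including the strip.
with_b() {
  sed '
s/^[^ ]* //
/".*,.*"/b
s/,/\
/g'
}
```

Fed the two lines chris2 "Hansen, Chris" and greg Hansen, Gregory, with_n prints greg Hansen as its second line (the nickname was never stripped), while with_b prints Hansen.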
To burst the commas, I wanted to do something like
s/(.*,.*)/.*1\
.*2/
where .*1 is stuff between the ( and , and .*2 is stuff between the , and
). But as far as I know, sed just doesn't work that way.
Maybe it would help to post the data I'm working on. I didn't do that at
first because I thought it would be annoying, and hoped it wouldn't be
needed. But maybe it would help. I'm trying to get just a single e-mail
address on each line.
steel06$ cat .addressbook
Truebane@aol.com
davidhenry David_Henry@pch.gc.ca
Gus xon@pacbell.net
julie julietobako@visi.com
lithium gs@ix.netcom.com
He told me how to fix my computer when it stopped working.
mom Nancy_M._Hansen@notes.mdor.state.mn.us
varney mvarney@uswest.net
warren WFritze@mrp.com
justin (E-mail), Justin Low [lowjustin@uswest.net]
"Justin Low [lowjustin@uswest.net] (E-mail)" <lowjustin@uswest.net>
AQ Abdulquayuum K.T. Al-Saud AQ@eldjinn.demon.co.uk
adam Adam Szczepaniak aszczepa@indiana.edu
lorraine bagaan, lorraine lorraine bagaan
<lbagaan@yahoo.com>
karl Berkner, Karl Karl Berkner <kberkner4792@attbi.com>
blessinger Blessinger, Christopher
Christopher Blessinger <chblessi@indiana.edu>
boggs Boggs, David A. "David A. Boggs" <daboggs@dodgenet.com>
brenda Brenda Fink ratfink@pacbell.net
chris Chris Hansen chris@bitstream.net
eric Crystal, Eric & Eric & Crystal <c-d-e@comcast.net>
peter Diehr, Peter Peter Diehr <pdiehr@srv2.ic.net>
ken Fischer, Ken Ken Fischer <kfischer@iglou.com>
paul g, apolinario paul apolinario paul g
<u1000684@email.sjsu.edu>
Paul from San Jose
galen Galen Cruze jaraxle@winternet.com
chris2 "Hansen, Chris" "Hansen, Chris" <CHansen@dartadvantage.com>
greg Hansen, Gregory L. "Gregory L. Hansen"
<glhansen@iucf.indiana.edu>
gail Hanson, Gail Gail Hanson <gail@needmore.physics.indiana.edu>
hawking Hawking, Stephen W Stephen W Hawking
<S.W.Hawking@damtp.cam.ac.uk>
tonya Higdon, Tonya Tonya Higdon <kobra@mail.kiva.net>
john John Tobako evl666@juno.com
bob Jr., Robert Langtry "Robert Langtry Jr." <twilight@visi.com>
troy Paton, Troy R Troy R Paton <trp81@juno.com>
holger Reuchlin, Holger Holger Reuchlin
<H.Reuchlin@geopraktiker.com>
wolfgang Rupprecht, Wolfgang S.
"Wolfgang S. Rupprecht" <wolfgang@wsrcc.com>
cathy schmid, cathy cathy schmid <cathyschmid@hotmail.com>
snow Snow, William Michael William Michael Snow
<snow@iucf.indiana.edu>
jo Sulzen, Joanna Joanna Sulzen <josulzen@earthlink.net>
tobako Tobako, John & Julie John & Julie Tobako <j_tobako@juno.com>
town Town, Michael W Michael W Town <mtown@juno.com>
roland Wiley, Roland L. "Roland L. Wiley"
<u1000189@email.sjsu.edu>
minnesota (agg,chris,dave,mom)
dave 'dave' ('dave' <dave1@riptide.wavetech.net>,DAVID.WHITLOCK@usbank.com)
agg AGG (eric,galen,justin,tobako,troy,warren)
mark Mark Templeton (Mark Templeton <templeton_mark@yahoo.com>,
mark@templetonfamily.com,ufs12282@email.sjsu.edu)
sanjose People from San Jose (brenda,gus,lorraince,mark,paul,roland)
--
"What are the possibilities of small but movable machines? They may or
may not be useful, but they surely would be fun to make."
-- Richard P. Feynman, 1959
glhansen, 12/12/2003 5:29:19 PM
In article <brcttf$irv$1@hood.uits.indiana.edu>,
Gregory L. Hansen <glhansen@steel.ucs.indiana.edu> wrote:
>In article <brcr9m$20663$1@ID-210011.news.uni-berlin.de>,
>Chris F.A. Johnson <c.fa.johnson@rogers.com> wrote:
>>sed 's/.* .* \(.*\)/\1/
Okay, I reread a section in my sed book that I must not have been paying
much attention to. \1 is equal to whatever was matched between the
escaped \( and \). But I'm still not sure what the difference is between
that and
>s/^.* .* //
But a s/^.* .* \(.*\).*/\1/ would cut field 3 from a line with any
number of fields.
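A sketch for any POSIX sed comparing the substitutions discussed here; the function names are hypothetical. POSIX regex matching is leftmost-longest, and each .* takes as much of the line as it can while still letting the rest match. So s/.* .* \(.*\)/\1/ and s/^.* .* // both keep only the text after the last space, and the variant with an extra trailing .* should do the same, since \(.*\) is itself greedy. To get field 3 even when names contain spaces, field3 assumes the fields are tab-separated (as the original cut -f3 implies) and uses [^<tab>]* so a match cannot run across a field boundary.

```shell
# Both forms keep only what follows the LAST space on the line.
chris_form()    { sed 's/.* .* \(.*\)/\1/'; }
anchored_form() { sed 's/^.* .* //'; }

# Assumption: fields are separated by tabs.
# [^<tab>]* stops at a field boundary, so \1 is exactly field 3.
# Lines with fewer than two tabs do not match and pass through unchanged.
field3() {
  tab=$(printf '\t')
  sed "s/^[^$tab]*$tab[^$tab]*$tab\([^$tab]*\).*/\1/"
}
```

For example, echo 'a b c d' | chris_form prints d, the same as anchored_form.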
>
>To burst the commas, I wanted to do something like
>
>s/(.*,.*)/.*1\
>.*2/
>
>where .*1 is stuff between the ( and , and .*2 is stuff between the , and
>). But as far as I know, sed just doesn't work that way.
I knew wrong.
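sed handles this with back-references. A sketch for any POSIX sed, with hypothetical function names: burst_two is a direct translation of the (.*1,.*2) idea, where \1 is the text between "(" and the first comma and \2 is the rest up to ")". burst_all covers a parenthesized list of any length by stripping the parentheses and turning every comma into a newline, much like steps 6 and 7 of the original script. Note that burst_two only splits at the first comma, so with three addresses \2 still holds the last two.

```shell
# "(.*1,.*2)" with back-references: \1 = up to the first comma, \2 = the rest.
burst_two() {
  sed 's/^(\([^,]*\),\(.*\))$/\1\
\2/'
}

# Any number of comma-separated entries inside one pair of parentheses.
# Lines that are not wrapped in parentheses are left alone.
burst_all() {
  sed '/^(.*)$/{
s/^(//
s/)$//
s/,/\
/g
}'
}
```

For example, echo '(a@x,b@y,c@z)' | burst_all prints a@x, b@y and c@z on separate lines.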
--
"'No user-serviceable parts inside.' I'll be the judge of that!"
glhansen, 12/13/2003 9:18:19 PM