grep/sed - sometimes the easy things seem hard



I've been learning sed and whilst the complicated things seem to work,
some of the simple ones don't.  For example:

I have a file with records of the form:
PMID- 14974909
OWN - NLM
STAT- completed
DA  - 20040225
DCOM- 20040325
IS  - 0300-0664
VI  - 59
IP  - 6
DP  - 2003 Dec
PG  - 690-8
FAU - Nugent, Ailish G
AU  - Nugent AG
FAU - Leung, Kin-Chuen
AU  - Leung KC
FAU - Sullivan, David
AU  - Sullivan D
FAU - Reutens, Anne T
AU  - Reutens AT
FAU - Ho, Ken K Y
AU  - Ho KK
LA  - eng
PT  - Clinical Trial
PT  - Journal Article
PT  - Randomized Controlled Trial
PL  - England
TA  - Clin Endocrinol (Oxf)
JID - 0346653
RN  - 0 (Carrier Proteins)
SO  - Clin Endocrinol (Oxf) 2003 Dec;59(6):690-8.

How can I get sed (or grep) to output only the lines tagged with,
say, AU and PT (in the same order as in the source file)?  I thought '()'
allowed me to group patterns, a la:

cat filename |grep (^AU,^PT)

but, apparently not.
I thought that maybe:

cat filename |sed /^[^PT,^AU]/\!d
might do it, but no.

Also how can I get sed to remove CR/LFs at the end of specific lines?
Reply china-rider (63) 7/2/2004 11:29:36 AM

try this:
cat filename | grep -e ^AU -e PT 
Reply news9932 (46) 7/2/2004 9:37:35 AM


On Fri, 02 Jul 2004 11:37:35 +0200, Ed wrote:

correction:
cat filename | grep -e ^AU -e ^PT
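
The anchor matters because an unanchored PT matches those two letters
anywhere on a line. A minimal sketch of the difference, assuming an
invented TI line that happens to contain "PT" (it is not in the original
record):

```shell
# Sample lines; the TI line is made up for illustration.
lines='AU  - Ho KK
TI  - PTH response in adults
PT  - Journal Article'

# Unanchored second pattern: "PT" anywhere matches, so the TI line slips through.
loose=$(printf '%s\n' "$lines" | grep -e '^AU' -e 'PT')

# Both patterns anchored: only lines that start with AU or PT are kept.
strict=$(printf '%s\n' "$lines" | grep -e '^AU' -e '^PT')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$loose" "$strict"
```

Quoting the patterns is optional here, but it is a safe habit once a
pattern contains characters the shell treats specially.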

Reply news9932 (46) 7/2/2004 9:38:21 AM

On Fri, 02 Jul 2004 13:29:36 +0200, John Stolz <china-rider@wanadoo.fr> wrote:
| I've been learning sed and whilst the complicated things seem to work,
| some of the simple ones don't.  For example:
|
| I have a file with records of the form:
| PMID- 14974909
| OWN - NLM
| STAT- completed
| DA  - 20040225
| DCOM- 20040325
| IS  - 0300-0664
| VI  - 59
| IP  - 6
| DP  - 2003 Dec
| PG  - 690-8
| FAU - Nugent, Ailish G
| AU  - Nugent AG
| FAU - Leung, Kin-Chuen
| AU  - Leung KC
| FAU - Sullivan, David
| AU  - Sullivan D
| FAU - Reutens, Anne T
| AU  - Reutens AT
| FAU - Ho, Ken K Y
| AU  - Ho KK
| LA  - eng
| PT  - Clinical Trial
| PT  - Journal Article
| PT  - Randomized Controlled Trial
| PL  - England
| TA  - Clin Endocrinol (Oxf)
| JID - 0346653
| RN  - 0 (Carrier Proteins)
| SO  - Clin Endocrinol (Oxf) 2003 Dec;59(6):690-8.
|
| How can I get sed (or grep), to output only the lines tagged with,
| say, AU and PT (in the same order as in the source file).  I thought '()'
| allowed me to group patterns a la:
|
| cat filename |grep (^AU,^PT)


Almost there. Try:

  grep -E '^AU|^PT' filename

Also, 'egrep' is shorthand for 'grep -E'.

The pipe '|' means 'or', and the parentheses do group the expression. You need
the quotes to stop the shell from interpreting the '|' and '()' characters itself.

You don't need the grouping for this simple expression, but you could do:

  grep -E '^(AU|PT)' filename
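
As a rough check that the grouped form and the multiple -e form select the
same lines, here is a small sketch. The temporary file and its four lines
are assumptions standing in for the real data file:

```shell
# Build a small sample file with a few tagged lines.
sample=$(mktemp)
printf '%s\n' 'FAU - Nugent, Ailish G' 'AU  - Nugent AG' \
              'LA  - eng' 'PT  - Journal Article' > "$sample"

# Extended regex: '|' is alternation and '(...)' groups, so one '^'
# applies to both alternatives. The quotes hide '|' and '(' from the shell.
ere_out=$(grep -E '^(AU|PT)' "$sample")

# The same selection with basic regexes: one pattern per -e option.
bre_out=$(grep -e '^AU' -e '^PT' "$sample")

printf '%s\n' "$ere_out"
rm -f "$sample"
```

The FAU line is not selected, because '^' ties the match to the start of
the line.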


| but, apparently not.
| I thought that maybe:
|
| cat filename |sed /^[^PT,^AU]/\!d
| might do it, but no.
|
| Also how can I get sed to remove CR/LFs at the end of specific lines?


Not sure how to do this in sed, sorry.
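
For what it's worth, the sed attempt fails because [^PT,^AU] is a bracket
expression: it matches a single character that is none of P, T, comma, ^,
A or U, so it does not express "starts with PT or AU". The following is one
possible sketch rather than a definitive answer. The sample files are
invented, and it assumes "CR/LFs" means either DOS line endings or the
newline after a given line:

```shell
# Two small samples; the second uses DOS line endings (CR LF).
unix_file=$(mktemp)
dos_file=$(mktemp)
printf '%s\n'   'FAU - Ho, Ken K Y' 'AU  - Ho KK' 'LA  - eng' \
                'PT  - Journal Article' > "$unix_file"
printf '%s\r\n' 'AU  - Ho KK' 'LA  - eng' > "$dos_file"

# 1. Keep only AU and PT lines: -n turns off automatic printing and
#    each p prints the lines its address matches.
selected=$(sed -n -e '/^AU/p' -e '/^PT/p' "$unix_file")

# 2. Strip a trailing CR from AU lines only. A literal CR is kept in a
#    variable because '\r' inside a sed pattern is not portable.
cr=$(printf '\r')
stripped=$(sed "/^AU/s/$cr\$//" "$dos_file")

# 3. Remove the newline after an AU line by joining it to the next line.
joined=$(sed '/^AU/{N;s/\n/ /;}' "$unix_file")

printf '%s\n' "$selected"
rm -f "$unix_file" "$dos_file"
```

Note that step 3 joins lines in pairs, so two consecutive AU lines would end
up on one line together; that may or may not be what is wanted.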


-- 
Reverend Paul Colquhoun, ULC.    http://andor.dropbear.id.au/~paulcol
     Asking for technical help in newsgroups?  Read this first:
        http://catb.org/~esr/faqs/smart-questions.html#intro
Reply postmaster5 (179) 7/2/2004 11:20:01 AM

> | How can I get sed (or grep), to output only the lines tagged with,
> | say, AU and PT (in the same order as in the source file).  I thought '()'
> | allowed me to group patterns a la:
> |
> | cat filename |grep (^AU,^PT)
>
>
> Almost there. Try:
>
>   grep -E '^AU|^PT' filename

I like the grep solution but
cat filename | grep -E '^AU  -|^PT  -'
is more appropriate unless you know more about the tags than I do.
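
To illustrate why matching the whole tag field can be safer, here is a small
sketch. The AUID line is made up for the example and is not in the posted
record:

```shell
# Sample lines; the AUID line is a hypothetical longer tag starting with "AU".
lines='AU  - Ho KK
AUID- example-id-0001
PT  - Journal Article'

# Prefix only: any tag that merely begins with AU or PT is selected.
by_prefix=$(printf '%s\n' "$lines" | grep -E '^AU|^PT')

# Full tag field (tag, two padding spaces, hyphen): only the exact tags match.
by_tag=$(printf '%s\n' "$lines" | grep -E '^AU  -|^PT  -')

printf 'prefix match:\n%s\n\nfull tag match:\n%s\n' "$by_prefix" "$by_tag"
```

With the prefix-only pattern the AUID line is printed as well; with the
full tag field it is not.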

Regards...Dan.


Reply JDanSkinner (96) 7/2/2004 4:33:29 PM

On 2004-07-02, Dan Skinner wrote:
>
> cat filename | grep -E '^AU  -|^PT  -'
> is more appropriate unless you know more about the tags than I do.

    There's no need for cat:

grep -E '^AU  -|^PT  -' filename

-- 
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License
Reply cfajohnson (1783) 7/2/2004 6:02:10 PM

On Fri, 02 Jul 2004 11:38:21 +0200, Ed hath writ:
> On Fri, 02 Jul 2004 11:37:35 +0200, Ed wrote:
>
> correction:
> cat filename | grep -e ^AU -e ^PT

  *Correct* correction:

  grep -e ^AU -e ^PT filename

Jonesy
Reply bit-bucket (345) 7/3/2004 9:42:52 PM

On Fri, 02 Jul 2004 18:02:10 +0000, Chris F.A. Johnson wrote:

> On 2004-07-02, Dan Skinner wrote:
>>
>> cat filename | grep -E '^AU  -|^PT  -'
>> is more appropriate unless you know more about the tags than I do.
> 
>     There's no need for cat:
> 
> grep -E '^AU  -|^PT  -' filename

Thanks everyone - this works a treat now.


Reply china-rider (63) 7/5/2004 9:57:35 AM
