im not a very good awk user, but i use it a lot to parse out data in
graphic filenames. usually the filenames have _ separators (i.e.,
ef_001_comp_01.0001.tif), so something like:
set VAR = `echo ef_001_comp_01.0001.tif | awk -F_ '{print $1}'`
gives me:
echo $VAR
ef
however now i have a bunch of filenames in this form:
ef0001.0001.tif
sb0001.0001.tif
rs0004.0001.tif
etc.
i can isolate the first section and get:
ef0001
sb0001
rs0004
which gives me the two pieces of information i want together ( first
two alphas are one thing, the next four numbers are another). how do
i parse them out? there's no consistent character to awk (in my
limited ability).
tia
christopher
deepstructure (3)
4/19/2007 10:35:33 PM

o.k., reading my awk/sed o'reilly book i realize that the -F option
just changes what the separator is, and since nothing separates these
things in the string i probably can't use that. at least the split
happens between alpha and numeric characters. i keep seeing things
like
/[A-Za-z]+/
in the book, but so far haven't been able to figure out how to
implement that. again, any help is appreciated!
cheers
christopher
deepstructure
4/19/2007 11:34:27 PM

deepstructure@gmail.com wrote:
> im not a very good awk user, but i use it a lot to parse out data in
> graphic filenames. usually the filenames have _ separators (i.e.,
> ef_001_comp_01.0001.tif), so something like:
>
> set VAR = `echo ef_001_comp_01.0001.tif | awk -F_ '{print $1}'`
(What syntax is that; C shell? Completely off-topic here.)
>
> gives me:
>
> echo $VAR
>
> ef
>
> however now i have a bunch of filenames in this form:
>
> ef0001.0001.tif
> sb0001.0001.tif
> rs0004.0001.tif
>
> etc.
>
> i can isolate the first section and get:
>
> ef0001
> sb0001
> rs0004
awk '{sub(/\..*/,""); print $0}'
This replaces everything in $0 from the first literal dot (\.) through
the end of the line (.*) with the empty string "", then prints what is
left.
Janis
>
> which gives me the two pieces of information i want together ( first
> two alphas are one thing, the next four numbers are another). how do
> i parse them out? there's no consistent character to awk (in my
> limited ability).
>
> tia
>
> christopher
>
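A quick sketch of that sub() call on the three sample filenames from the question, assuming a POSIX shell (sh/bash) rather than csh. The variable name `base` is just for illustration.

```shell
# Strip everything from the first dot onward, keeping only the leading part.
# The three filenames are the samples quoted in the question.
base=$(printf '%s\n' ef0001.0001.tif sb0001.0001.tif rs0004.0001.tif |
    awk '{ sub(/\..*/, ""); print }')

# One stripped name per line.
printf '%s\n' "$base"
```

This only isolates the first section; it does not yet split the letters from the digits.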
Janis
4/20/2007 12:31:00 AM

deepstructure@gmail.com wrote:
> o.k., reading my awk/sed o'reilly book i realize that the -F option
> just changes what the separator is, and since nothing separates these
> things in the string i probably can't use that. at least the split
> happens between alpha and numeric characters. i keep seeing things
> like
>
> /[A-Za-z]+/
>
> in the book, but so far haven't been able to figure out how to
> implement that. again, any help is appreciated!
What is your _concrete_ question in *this* posting?
You want to know the meaning of the above regexp?
It matches an arbitrarily long sequence (at least one character) of
alphabetic characters.
Janis
>
> cheers
> christopher
>
>
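For what it's worth, one common way to put such a regexp to work is as a pattern in front of an action: awk then runs the action only on lines that contain a match. A minimal sketch, assuming a POSIX shell; the input `0042` is a made-up name with no letters, added only to show a line being skipped.

```shell
# /[A-Za-z]+/ used as a pattern: only lines containing at least one letter
# reach the { print } action. "0042" is a hypothetical input with no letters.
matched=$(printf '%s\n' ef0015 0042 rs0004 | awk '/[A-Za-z]+/ { print }')

printf '%s\n' "$matched"
```

The same regexp can also be passed to functions such as sub() or match() instead of being used as a line filter.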
Janis
4/20/2007 12:34:05 AM

On Apr 19, 5:31 pm, Janis Papanagnou <Janis_Papanag...@hotmail.com>
wrote:
>
> (What syntax is that; C shell? Completely off-topic here.)
>
yep. yes, i know im not supposed to be using c shell but
unfortunately that's what i learned in and am currently stuck with.
my second post was just me realizing that using -F wasn't going to
help since there isn't a separator.
unfortunately i don't see this statement working:
awk '{sub(/\..*/,""); print $0}'
i think you misunderstood what i was trying to parse. im trying to
separate the "ef" and "0001" in "ef0001". i tried your code and got
this:
% set VAR = `echo ef0015 | awk '{sub(/\..*/,""); print $0}'`
% echo $VAR
ef0015
so ignore the first part of what i wrote above - that was just
context. basically i have a list of filenames:
ef0001
ef0002
rb0001
sr0004
etc., etc., that i need to separate the two-alpha character beginning
of and the numeric portion of. make sense?
thanks for your help!
cheers
christopher
deepstructure
4/20/2007 1:55:33 AM

* deepstructure@gmail.com [2007.04.20 01:55]:
> so ignore the first part of what i wrote above - that was just
> context. basically i have a list of filenames:
>
> ef0001
> ef0002
> rb0001
> sr0004
>
> etc., etc., that i need to separate the two-alpha character beginning
> of and the numeric portion of. make sense?
If this is fixed width, you can use substr():
awk '{ print substr($0,1,2), substr($0,3,4) }'
--
JR
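A small sketch of the substr() approach with each piece captured in its own variable, assuming a POSIX shell and names that are always two letters followed by four digits. The variable names are illustrative only.

```shell
name=ef0015   # sample name in the two-letters-plus-four-digits form

# substr(s, start, length): positions are 1-based.
alpha=$(echo "$name" | awk '{ print substr($0, 1, 2) }')   # characters 1-2
num=$(echo "$name" | awk '{ print substr($0, 3, 4) }')     # characters 3-6

echo "$alpha $num"
```

In csh the same captures would presumably be written with set and backticks, in the same way as the set VAR line in the original post.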
Jean
4/20/2007 2:48:41 AM

On Apr 20, 9:55 am, deepstruct...@gmail.com wrote:
> so ignore the first part of what i wrote above - that was just
> context. basically i have a list of filenames:
>
> ef0001
> ef0002
> rb0001
> sr0004
>
> etc., etc., that i need to separate the two-alpha character beginning
> of and the numeric portion of. make sense?
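awk '{
    num = gensub(/[a-z]+/, "", "g")     # remove the letters, leaving the digits
    print num
    alpha = gensub(/[0-9]+/, "", "g")   # remove the digits, leaving the letters
    print alpha
}' "file"
(gensub() is a GNU awk extension, so this needs gawk.)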
mik3l3374
4/20/2007 2:55:51 AM

On 20 Apr., 04:48, Jean-Rene David <jrda...@magma.ca.INVALID> wrote:
> * deepstruct...@gmail.com [2007.04.20 01:55]:
>
> > so ignore the first part of what i wrote above - that was just
> > context. basically i have a list of filenames:
>
> > ef0001
> > ef0002
> > rb0001
> > sr0004
>
> > etc., etc., that i need to separate the two-alpha character beginning
> > of and the numeric portion of. make sense?
>
> If this is fixed width, you can use substr():
>
> awk '{ print substr($0,1,2), substr($0,3,4) }'
And if it's not fixed width you can use match() first to obtain the
required indices.
Janis
>
> --
> JR
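A minimal sketch of the match() idea for names that are not fixed width, assuming a POSIX shell and that the first run of digits marks the split point. The inputs `comp12` and `a7` are made-up names used only to vary the widths.

```shell
# match() returns the position of the leftmost match (0 if none) and sets
# RSTART (1-based start) and RLENGTH (length of the match).
# "comp12" and "a7" are hypothetical names of other widths.
split=$(printf '%s\n' ef0001 comp12 a7 | awk '{
    if (match($0, /[0-9]+/))
        print substr($0, 1, RSTART - 1), substr($0, RSTART, RLENGTH)
}')

printf '%s\n' "$split"
```

Lines with no digits at all are silently skipped here; whether that is the right behavior depends on the filenames involved.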
Janis
4/20/2007 10:16:52 AM

mik3l3374@gmail.com wrote:
> On Apr 20, 9:55 am, deepstruct...@gmail.com wrote:
>>so ignore the first part of what i wrote above - that was just
>>context. basically i have a list of filenames:
>>
>>ef0001
>>ef0002
>>rb0001
>>sr0004
>>
>>etc., etc., that i need to separate the two-alpha character beginning
>>of and the numeric portion of. make sense?
>
>
> awk '{ num = gensub(/[a-z]+/,"","g"); print num
> alpha = gensub(/[0-9]+/,"","g") ; print alpha
> }
> ' "file"
>
To separate the 2 parts by a newline:
awk 'sub(/[a-z]+/,"&\n")'
To get just the alpha part:
awk 'sub(/[0-9]+/,"")'
To get just the numeric part:
awk 'sub(/[a-z]+/,"")'
but there's a much better way to do it in shell without using awk. If
you're interested, post to comp.unix.shell.
Ed.
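A quick run of the sub() one-liners on a sample name, assuming a POSIX shell (sh/bash) rather than csh. The last part is only a guess at the kind of awk-free approach hinted at, using standard parameter expansion; the post does not say which method was meant, and all variable names are illustrative.

```shell
name=ef0015   # sample name, same shape as the ones in the thread

# sub() returns 1 when it replaced something, so used as a bare condition
# it makes awk print the modified line.
alpha=$(echo "$name" | awk 'sub(/[0-9]+/, "")')   # digits removed
num=$(echo "$name" | awk 'sub(/[a-z]+/, "")')     # letters removed
echo "$alpha $num"

# Assumption: one possible awk-free alternative, POSIX parameter expansion.
alpha2=${name%%[0-9]*}    # drop the longest suffix that begins with a digit
num2=${name#"$alpha2"}    # drop that alpha prefix from the front
echo "$alpha2 $num2"
```

The parameter-expansion form does not exist in csh, so it would only help after switching to a Bourne-style shell.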
Ed
4/20/2007 1:10:14 PM

On Apr 20, 6:10 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
> To separate the 2 parts by a newline:
>
> awk 'sub(/[a-z]+/,"&\n")'
>
> To get just the alpha part:
>
> awk 'sub(/[0-9]+/,"")'
>
> To get just the numeric part:
>
> awk 'sub(/[a-z]+/,"")'
>
> but there's a much better way to do it in shell without using awk. If
> you're interested, post to comp.unix.shell.
>
> Ed.
hey ed, that works really well. thank you! i took your advice and
posted to the unix.shell group also, but your solution will do the
trick.
and thanks to everyone else who contributed - tho some of those
solutions were more technical than i could handle!
deepstructure
4/20/2007 5:44:28 PM