Greetings,
Given this input file:
tst.txt:
ABCDEFG%8921%9251%0003,201004,201004
ABCDEFG%9351%2951%0004,201004,201004
ABCDEFG%6951%3951%0005,201004,201004
ABCDEFG%4951%9941%0006,201004,201004
ABCDEFG%9954%8954%0007,201004,201004
ABCDEFG%4951%1951%0008,201004,201004
My desired output would be:
ABCDEFG%08921%09251%0003,201004,201004
ABCDEFG%09351%02951%0004,201004,201004
ABCDEFG%06951%03951%0005,201004,201004
ABCDEFG%04951%09941%0006,201004,201004
ABCDEFG%09954%08954%0007,201004,201004
ABCDEFG%04951%01951%0008,201004,201004
which would 0-pad any four-digit number that does not already start with a zero.
However, this command:
sed -e 's:%\([1-9][0-9][0-9][0-9]\)%:%0\1%:g' tst.txt
will correctly 0-pad ONLY the first set of non-zero numbers, resulting
in this output:
ABCDEFG%08921%9251%0003,201004,201004
ABCDEFG%09351%2951%0004,201004,201004
ABCDEFG%06951%3951%0005,201004,201004
ABCDEFG%04951%9941%0006,201004,201004
ABCDEFG%09954%8954%0007,201004,201004
ABCDEFG%04951%1951%0008,201004,201004
The question is, even with the "g" option for global, why does the
above command only change the first occurrence?
Moreover, what is a better way to get my desired result using sed?
Thanks in advance.
jaredsubman (13) | 2/24/2010 5:55:17 PM

* jaredsubman@yahoo.com [2010.02.24 17:55]:
> Given this input file:
> tst.txt:
> ABCDEFG%8921%9251%0003,201004,201004
[...]
> My desired output would be:
>
> ABCDEFG%08921%09251%0003,201004,201004
[...]
> However, this command:
> sed -e 's:%\([1-9][0-9][0-9][0-9]\)%:%0\1%:g' tst.txt
>
> will correctly 0-pad ONLY the first set of non-zero numbers, resulting
> in this output:
>
> ABCDEFG%08921%9251%0003,201004,201004
[...]
> The question is, even with the "g" option for global, why does the
> above command only change the first occurrence?
Because you included both '%' signs in your pattern. Once the
second one has been matched it is consumed, so the search for the
next occurrence of the pattern starts *after* it and that '%' can no
longer serve as the leading '%' of the following number. Removing
the second '%' from your pattern fixes that problem, but I don't
know if the resulting command (below) is robust enough to process
your data correctly:
sed -e 's:%\([1-9][0-9][0-9][0-9]\):%0\1:g' tst.txt
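To illustrate the robustness concern, here is a small sketch. The function name `pad_open` and the second input line are made up for the demonstration; they are not part of the original data. Without the trailing '%', the pattern also matches the first four digits of a longer number, which may or may not matter for the real file:

```shell
# pad_open: hypothetical wrapper around the command above, reading stdin.
pad_open() {
    sed -e 's:%\([1-9][0-9][0-9][0-9]\):%0\1:g'
}

# A line from tst.txt: both four-digit numbers are padded.
printf '%s\n' 'ABCDEFG%8921%9251%0003,201004,201004' | pad_open
# prints: ABCDEFG%08921%09251%0003,201004,201004

# Made-up line with a five-digit field: it gets a leading zero too.
printf '%s\n' 'ABCDEFG%12345%9251%0003' | pad_open
# prints: ABCDEFG%012345%09251%0003
```

If fields longer than four digits can occur, the pattern needs some way to check what follows the number without consuming the next '%'.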
--
JR
Jean | 2/24/2010 6:11:58 PM

jaredsubman@yahoo.com wrote:
> [...]
> However, this command:
> sed -e 's:%\([1-9][0-9][0-9][0-9]\)%:%0\1%:g' tst.txt
>
> will correctly 0-pad ONLY the first set of non-zero numbers
> [...]
Your command works for me. Using GNU sed version 3.02.80
Janis
Janis | 2/24/2010 6:16:31 PM

On Feb 24, 11:55 am, "jaredsub...@yahoo.com" <jaredsub...@yahoo.com>
wrote:
> [...]
> The question is, even with the "g" option for global, why does the
> above command only change the first occurrence?
> Moreover, what is a better way to get my desired result using sed?
You got the answers to your specific questions but I don't think
anyone's given you a robust sed solution yet so in the meantime you
might want to try this:
awk 'BEGIN{ FS=OFS="%"; fmt="%05s" }
{
    for (fldNr=1; fldNr<=NF; fldNr++) {
        fld = sep = ""
        numSubFlds = split($fldNr,fldArr,",")
        for (subNr=1; subNr<=numSubFlds; subNr++) {
            fld = fld sep sprintf(fmt,fldArr[subNr])
            sep = ","
        }
        $fldNr = fld
    }
    print
}' tst.txt
Ed
Ed | 2/24/2010 7:34:14 PM

On Feb 24, 1:34 pm, Ed Morton <mortons...@gmail.com> wrote:
> On Feb 24, 11:55 am, "jaredsub...@yahoo.com" <jaredsub...@yahoo.com>
> wrote:
> [...]
> You got the answers to your specific questions but I don't think
> anyone's given you a robust sed solution yet so in the meantime you
> might want to try this:
>
> awk 'BEGIN{ FS=OFS="%"; fmt="%05s" }
> [...]
Hang on, I just noticed that you DON'T want the 4-digit strings at the
end of your input padded with leading zeros. That makes things much
simpler:
awk 'BEGIN{ FS=OFS="%" } {
    for (i=1; i<=NF; i++)
        $i=sprintf("%05s",$i)
}1' tst.txt
or to make sure you only operate on strings of all-digits:
awk 'BEGIN{ FS=OFS="%" } {
    for (i=1; i<=NF; i++)
        if ($i ~ /^[0-9]+$/)
            $i = sprintf("%05s",$i)
}1' tst.txt
or to ONLY pad 4-digit numbers:
awk 'BEGIN{ FS=OFS="%" } {
    for (i=1; i<=NF; i++)
        if ($i ~ /^[0-9][0-9][0-9][0-9]$/)
            $i = "0"$i
}1' tst.txt
The "sprintf()" solutions will pad any chains of 4 _or less_ digits.
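A possible caveat with the "sprintf()" variants: the C and POSIX printf specifications leave the "0" flag undefined for the "%s" conversion, so whether "%05s" pads with zeros or with spaces may depend on the awk in use. The following is a sketch, not a tested replacement for the above, that pads all-digit fields by plain string concatenation instead. The function name `pad_fields` is hypothetical and the width of 5 is taken from the examples above:

```shell
# pad_fields: hypothetical helper, reads lines on stdin.
pad_fields() {
    awk 'BEGIN { FS = OFS = "%" } {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/)            # all-digit fields only
                while (length($i) < 5)      # prepend zeros up to width 5
                    $i = "0" $i
    } 1'
}

printf '%s\n' 'ABCDEFG%8921%9251%0003,201004,201004' | pad_fields
# prints: ABCDEFG%08921%09251%0003,201004,201004
```

The last field contains commas, so it is not all digits and is left alone, which is why the trailing 0003 stays unpadded.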
Regards,
Ed.
Ed | 2/24/2010 7:47:17 PM

On Feb 24, 10:55 pm, "jaredsub...@yahoo.com" <jaredsub...@yahoo.com>
wrote:
> Greetings,
>
> Given this input file:
> tst.txt:
> ABCDEFG%8921%9251%0003,201004,201004
> ABCDEFG%9351%2951%0004,201004,201004
> ABCDEFG%6951%3951%0005,201004,201004
> ABCDEFG%4951%9941%0006,201004,201004
> ABCDEFG%9954%8954%0007,201004,201004
> ABCDEFG%4951%1951%0008,201004,201004
>
> My desired output would be:
>
> ABCDEFG%08921%09251%0003,201004,201004
> ABCDEFG%09351%02951%0004,201004,201004
> ABCDEFG%06951%03951%0005,201004,201004
> ABCDEFG%04951%09941%0006,201004,201004
> ABCDEFG%09954%08954%0007,201004,201004
> ABCDEFG%04951%01951%0008,201004,201004
>
sed -e '
# requirements:
# 1. only 4-digit numbers are considered, so any 3/2/1-digit numbers must be excluded
# 2. the 4-digit number must not begin with a zero, i.e., must not already be padded
# 3. four cases: ^num$  ^num|  |num|  |num$, where num is a 4-digit number and | means a non-digit
:loop
## ^num$
/^[1-9][0-9][0-9][0-9]$/{
s/^/0/;b
}
## ^num|
/^[1-9][0-9][0-9][0-9][^0-9]/{
s/^/0/;bloop
}
## |num|
/\([^0-9]\)\([1-9][0-9][0-9][0-9]\)\([^0-9]\)/{
s//\10\2\3/;bloop
}
## |num$
/\([^0-9]\)\([1-9][0-9][0-9][0-9]\)$/{
s//\10\2/;bloop
}
' yourfile
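The four cases above can probably be folded into a single substitution if extended regular expressions are available. This is only a sketch under that assumption: it relies on the `-E` option, which GNU and BSD sed accept but older seds may not, and the function name `pad_bounded` is made up here:

```shell
# pad_bounded: hypothetical helper, reads lines on stdin.
# Group 1 is start-of-line or a non-digit, group 2 the four-digit number
# (first digit 1-9), group 3 a non-digit or end-of-line. The "t" loop
# repeats the substitution because adjacent matches share a delimiter.
pad_bounded() {
    sed -E -e :a -e 's/(^|[^0-9])([1-9][0-9]{3})([^0-9]|$)/\10\2\3/' -e ta
}

printf '%s\n' 'ABCDEFG%8921%9251%0003,201004,201004' '1234,5678' '12345' |
    pad_bounded
# prints:
# ABCDEFG%08921%09251%0003,201004,201004
# 01234,05678
# 12345
```

In the replacement, `\10` is group 1 followed by a literal 0, since sed back-references only go up to `\9`.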
Rakesh | 2/26/2010 8:54:19 AM

2010-02-24, 09:55(-08), jaredsubman@yahoo.com:
[...]
> Given this input file:
> tst.txt:
> ABCDEFG%8921%9251%0003,201004,201004
[...]
> sed -e 's:%\([1-9][0-9][0-9][0-9]\)%:%0\1%:g' tst.txt
>
> will correctly 0-pad ONLY the first set of non-zero numbers, resulting
> in this output:
>
> ABCDEFG%08921%9251%0003,201004,201004
[...]
That's what sed's "t" command is for:
sed -e :1 -e 's/%\([1-9][0-9]\{3\}%\)/%0\1/g;t1' tst.txt
This repeats the substitution until "s" no longer finds anything to replace.
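A quick way to try it on the sample data, wrapped in a function whose name (`pad_loop`) is made up for this sketch. The label is given in its own `-e`, presumably because some sed implementations treat everything after ":" up to the end of the line as the label name:

```shell
# pad_loop: hypothetical wrapper around the command above, reading stdin.
pad_loop() {
    sed -e :1 -e 's/%\([1-9][0-9]\{3\}%\)/%0\1/g;t1'
}

printf '%s\n' 'ABCDEFG%8921%9251%0003,201004,201004' \
              'ABCDEFG%4951%1951%0008,201004,201004' | pad_loop
# prints:
# ABCDEFG%08921%09251%0003,201004,201004
# ABCDEFG%04951%01951%0008,201004,201004
```

On each line the first pass pads the first number, the "t" branch jumps back to the label, the second pass pads the next number, and the third pass finds nothing and lets sed move on to the next line.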
--
Stéphane
Stephane | 2/27/2010 7:54:34 PM

Stephane CHAZELAS <stephane_chazelas@yahoo.fr> writes:
> 2010-02-24, 09:55(-08), jaredsubman@yahoo.com:
> [...]
>> Given this input file:
>> tst.txt:
>> ABCDEFG%8921%9251%0003,201004,201004
> [...]
>> sed -e 's:%\([1-9][0-9][0-9][0-9]\)%:%0\1%:g' tst.txt
>>
>> will correctly 0-pad ONLY the first set of non-zero numbers, resulting
>> in this output:
The 'g' causes all non-overlapping occurrences to be replaced, but
because the pattern includes both the initial % and the trailing %,
the effect is to pad only every other four-digit number that does
not start with a zero.
>> ABCDEFG%08921%9251%0003,201004,201004
> [...]
>
> That's what sed's "t" command is for:
>
> sed -e :1 -e 's/%\([1-9][0-9]\{3\}%\)/%0\1/g;t1' tst.txt
>
> repeat the operation until "s" no longer succeeds to replace.
It might be a little clearer to drop the 'g' and rely solely on the
looping, but it is harmless.
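Both points can be seen on a line with four adjacent numbers. The input line and the function names `pad_once` and `pad_loop_no_g` are invented for this sketch:

```shell
# pad_once: the original command, a single pass with "g".
pad_once() {
    sed -e 's:%\([1-9][0-9][0-9][0-9]\)%:%0\1%:g'
}

# pad_loop_no_g: the looping command with the "g" flag dropped.
pad_loop_no_g() {
    sed -e :1 -e 's/%\([1-9][0-9]\{3\}%\)/%0\1/' -e t1
}

printf '%s\n' 'X%1111%2222%3333%4444%' | pad_once
# prints: X%01111%2222%03333%4444%

printf '%s\n' 'X%1111%2222%3333%4444%' | pad_loop_no_g
# prints: X%01111%02222%03333%04444%
```

Without "g" the loop simply takes one pass per number instead of roughly one pass per pair, and the result is the same.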
--
Ben.
Ben | 2/27/2010 9:32:39 PM