It took me some time to understand why awk's arrays cannot easily be
sorted. Anyway, somewhere I got the hint to use the Unix shell's sort by
dumping the hash, sorting it in the Unix shell, and reading it in
again. So I wrote a little function to encapsulate that. Here it is...
#
# print an array in sorted order by value
# 20091022, Johannes Mainusch
# don't blame me if it doesn't work on you
# :-)
#
function print_sorted_by_value (prefix, array, scale, norm,
        significance,      i, sum, tmpfile, cmd) {
    printf ("\nprinting in sorted order by value\n");
    for (i in array) n++; # get length of the array

    tmpfile=sprintf("del.me.%d",1000000*rand());
    #print "filename = ",tmpfile;

    sum = 0;
    for (key in array) {
        value = "nan";
        sum += array[key];
        if (array[key] > significance) value = 100*array[key]/norm;
        printf ("%s%-30s %8.1f %f\n", prefix, key, scale*array[key], value) >>tmpfile;
    }
    close (tmpfile);
    delete array;

    cmd = sprintf ("sort -n -r -k3 %s", tmpfile);
    while (cmd | getline myline) {
        print myline;
        # split (myline, tmp);
        # array[tmp[2]]=tmp[1];
    }
    close (tmpfile);
    system ("rm "tmpfile);

    printf ("%s%-30s %8.1f\n", prefix, "sum:", sum);
    printf ("-----------------------------------------\n");
}
Posted by johannes, 10/23/2009 1:49:25 PM
johannes.mainusch wrote:
> It took me some time to understand why awk's arrays cannot easily be
> sorted. Anyway, somewhere I got the hint to use the Unix shell's sort by
> dumping the hash, sorting it in the Unix shell, and reading it in
> again. So I wrote a little function to encapsulate that. Here it is...
A couple of things pop out:

- You need to add "n" to the pseudo-argument list.
- The use of getline isn't a safe syntax (see
  http://awk.info/?tip/getline).
- You'd need to run it from a dir where you have write permission so,
  since you're assuming UNIX, put your tmp file in /usr/tmp or similar.
- Deleting a whole array is a gawk-ism, but gawk already has built-in
  array sorting (asort() and asorti()).
- There's no need to use sprintf() to create the "tmpfile" and "cmd"
  strings.
- You could use a co-process instead of a tmp file if you're assuming
  gawk.
- You could use length() instead of a loop to get the array size if
  you're assuming gawk.
- Instead of repeating the same format string in two printfs, you
  should define a format variable and use that.
- All the trailing semicolons are redundant.
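A minimal sketch of how a few of these points might look in portable awk, under some assumptions: it pipes the printf output straight into sort (a plain output pipe, not a gawk co-process), so no tmp file is needed, it drops the unused "n", it shares one format variable between the rows and the sum line, and it closes the command it opened. The shell wrapper name print_sorted_demo and the sample data are invented for illustration. Unlike the original's "sort -n -r -k3", this version sorts on the second column (the scaled value) and assumes that neither the prefix nor the keys contain whitespace.

```sh
# Hypothetical wrapper so the sketch can be run as-is; the sample data in
# BEGIN is invented for illustration.
print_sorted_demo() {
    awk '
    function print_sorted_by_value(prefix, array, scale, norm, significance,
                                   key, sum, value, fmt, cmd) {
        fmt = "%s%-30s %8.1f"            # shared by data rows and the sum row
        cmd = "sort -k2,2nr"             # numeric, descending, on the scaled value
        sum = 0
        for (key in array) {
            sum += array[key]
            value = "nan"
            if (array[key] > significance)
                value = sprintf("%.2f", 100 * array[key] / norm)
            printf(fmt " %s\n", prefix, key, scale * array[key], value) | cmd
        }
        close(cmd)                       # sort emits its output here
        printf(fmt "\n", prefix, "sum:", sum)
    }
    BEGIN {
        hits["/index.html"] = 40
        hits["/about.html"] = 10
        hits["/faq.html"]   = 50
        print_sorted_by_value("", hits, 1, 100, 5)
    }'
}
```

Calling print_sorted_demo prints the three rows in descending order of value, followed by the sum line. Nothing is printed before the pipe is closed, which sidesteps any question of how awk's own buffered output interleaves with sort's.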
Could you show some sample input, a small script that uses that function
plus the output it produces so we can see how to use it?
Regards,
Ed.
Posted by Ed, 10/23/2009 2:06:29 PM
|
In article <c33c3055-7f83-4f24-acf8-f5a14dee6f29@o36g2000vbl.googlegroups.com>,
johannes.mainusch <johannes.mainusch@gmx.de> wrote:
>It took me some time to understand why awk's arrays cannot easily be
>sorted. Anyway, somewhere I got the hint to use the Unix shell's sort by
>dumping the hash, sorting it in the Unix shell, and reading it in
>again. So I wrote a little function to encapsulate that. Here it is...
A couple of notes (objections):

1) The need for this is pretty much obsolete today, given the built-in
   sorting capabilities of GAWK and TAWK (and if you're not using one
   or the other of these, then you really should be).

2) IME, you rarely need to sort the *values*. My applications have
   always been the need for sorting the keys. TAWK does this
   automatically, of course, as does GAWK if you're a sufficiently
   whiny user (hint, hint).

3) Incidentally, I've never used GAWK's asort() or asorti() functions.
   They look somewhat interesting, but I've never seen the need...
Posted by gazelle, 10/23/2009 2:09:38 PM
|
Kenny McCormack wrote:
<snip>
> 3) Incidentally, I've never used GAWK's asort() or asorti() functions.
> They look somewhat interesting, but I've never seen the need...
I don't use them much, but once in a while they're useful. In fact, I
used one of them in a script just yesterday. I had multiple files of
measurements for various types of processor, e.g. this kind of format in
file "FILE1":

    type=foo id=3
    count1 = 7
    count2 = 5
    type=bar id=54
    count1 = 3
    count3 = 6
    type=foo id=12
    count4 = 5
    count2 = 9

and I had to produce tabular output that was sorted by processor type+id,
with a blank line between each type:

    FILE1:
    bar_54 3 0 6 0

    foo_03 7 5 0 0
    foo_12 0 9 0 5
    FILE2:
    ....

So it was convenient to initially store the data indexed by processor
type+id, then sort the list using asorti() before printing. I could've
piped an interim result per file to UNIX sort, but then I'd have had to
add yet another pipe to a second awk to introduce the blank lines
between processor types, and I'd have had to introduce a shell loop to
feed awk one file at a time (instead of just handling all the files on
the awk command line) or otherwise jump through hoops. Just using
asorti() in a single script was quite a bit simpler.
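A rough sketch of what such a single-script approach might look like. This is not the actual script from that job, which isn't shown here. It assumes gawk (for asorti() and whole-array delete), exactly four columns named count1 to count4, ids zero-padded to two digits so that string order matches numeric order, and type names that contain no underscore. The wrapper name processor_table is made up for this example.

```sh
# Hypothetical wrapper: processor_table FILE...  (requires gawk for asorti)
processor_table() {
    gawk '
    # Emit the table for the file just finished, then reset the per-file state.
    function print_table(    n, i, c, line, parts, prev, sorted) {
        print file ":"
        n = asorti(seen, sorted)          # sorted[1..n] = keys in string order
        for (i = 1; i <= n; i++) {
            split(sorted[i], parts, "_")  # parts[1] is the processor type
            if (i > 1 && parts[1] != prev)
                print ""                  # blank line between types
            prev = parts[1]
            line = sorted[i]
            for (c = 1; c <= 4; c++)      # assumes columns count1..count4
                line = line " " (cnt[sorted[i], "count" c] + 0)
            print line
        }
        delete seen
        delete cnt
    }
    FNR == 1  { if (NR > 1) print_table(); file = FILENAME }
    /^type=/  { split($1, t, "="); split($2, d, "=")
                key = sprintf("%s_%02d", t[2], d[2]); seen[key]; next }
    /^count/  { cnt[key, $1] = $3 }
    END       { if (NR > 0) print_table() }
    ' "$@"
}
```

Run as "processor_table FILE1 FILE2", it handles all files in one gawk invocation: each file's table is printed when the next file starts (FNR == 1) or at END, so no shell loop or second awk is involved.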
Ed.
Posted by Ed, 10/23/2009 2:24:21 PM
|
johannes.mainusch wrote:
> It took me some time to understand why awk's arrays cannot easily be
> sorted.
Why? AFAICT, an array sort function can be written in awk just as easily as
in any other language.
> Anyway, somewhere I got the hint to use the Unix shell's sort by
> dumping the hash, sorting it in the Unix shell, and reading it in
> again. So I wrote a little function to encapsulate that. Here it is...
If you're using GNU awk you don't need that because it has built-in
functions to sort arrays by value and by index (hash).
> # print an array in sorted order by value
> # 20091022, Johannes Mainusch
> # don't blame me if it doesn't work on you
> # :-)
> #
> function print_sorted_by_value (prefix, array, scale, norm,
> significance, i, sum, tmpfile, cmd) {
> printf ("\nprinting in sorted order by value\n");
> for (i in array) n++; # get length of the array
>
> tmpfile=sprintf("del.me.%d",1000000*rand());
> #print "filename = ",tmpfile;
>
> sum = 0;
> for (key in array) {
> value = "nan";
> sum += array[key];
> if (array[key] > significance) value =
> 100*array[key]/norm;
> printf ("%s%-30s %8.1f %f\n", prefix, key,
> scale*array[key], value) >>tmpfile;
> }
> close (tmpfile);
> delete array;
>
> cmd = sprintf ("sort -n -r -k3 %s", tmpfile);
> while (cmd | getline myline) {
> print myline;
> # split (myline, tmp);
> # array[tmp[2]]=tmp[1];
> }
> close (tmpfile);
> system ("rm "tmpfile);
You should check that getline returns a positive value, and probably you
should also close(cmd) "just in case".
>
> printf ("%s%-30s %8.1f\n", prefix, "sum:", sum);
> printf ("-----------------------------------------\n");
> }
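The two points about getline and close() can be sketched as follows. Note that the quoted function calls close(tmpfile) a second time after the read loop, so the sort command itself is never closed. The command string and sample numbers here are placeholders, and the wrapper name read_sorted_demo is made up for this example.

```sh
# Hypothetical wrapper so the sketch can be run as-is.
read_sorted_demo() {
    awk '
    BEGIN {
        # Placeholder command standing in for "sort ... tmpfile".
        cmd = "(echo 3; echo 1; echo 2) | sort -n"
        # getline returns 1 for a record, 0 at end of input, -1 on error.
        # A bare "while (cmd | getline line)" treats -1 as true and can
        # loop forever, so compare the result against 0 explicitly.
        while ((cmd | getline line) > 0)
            print "got " line
        close(cmd)    # close by the same string that opened the pipe
    }'
}
```

Closing the command matters most when the same command string is run again later: without close(cmd), a second "cmd | getline" would continue reading the already exhausted pipe instead of starting the command afresh.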
Posted by pk, 10/23/2009 2:48:08 PM
|
On Fri, 23 Oct 2009 14:09:38 +0000 (UTC), gazelle@shell.xmission.com (Kenny McCormack) wrote:
>In article <c33c3055-7f83-4f24-acf8-f5a14dee6f29@o36g2000vbl.googlegroups.com>,
>johannes.mainusch <johannes.mainusch@gmx.de> wrote:
>>It took me some time to understand why awk's arrays cannot easily be
>>sorted. Anyway, somewhere I got the hint to use the Unix shell's sort by
>>dumping the hash, sorting it in the Unix shell, and reading it in
>>again. So I wrote a little function to encapsulate that. Here it is...
>
>A couple of notes (objections):
>
>1) The need for this is pretty much obsolete today, given the built-in
> sorting capabilities of GAWK and TAWK (and if you're not using one
> or the other of these, then you really should be).
>
>2) IME, you rarely need to sort the *values*. My applications have
> always been the need for sorting the keys. TAWK does this
> automatically, of course, as does GAWK if you're a sufficiently
> whiny user (hint, hint).
>
>3) Incidentally, I've never used GAWK's asort() or asorti() functions.
> They look somewhat interesting, but I've never seen the need...
I use gawk's asort and asorti heaps:
grant@deltree:/usr/local/bin$ grep asort *|grep -v "Binary file"
cc2ip-logview: asort(tsdiff, tssort)
cc2ip-logview: asort(qslen, qssort)
cc2ip-logview: asort(rtime, rsort)
cc2ip-quota-lockout-view: n = asorti(query, sort)
get-web-blocks: numip = asorti(ip, ipnum_sort)
get-web-blocks: n = asorti(list_name, list_name_sorted)
ipblockmerge:# requires recent gawk with 'asorti' (tested with gawk-3.1.5)
ipblockmerge: n = asorti(list_input, list_sorted) # sort by start addr, blocksize
ipblockmerge: x = asorti(list_out, list_out_sorted)
junkview: pf = asort(xp)
junkview: j = asort(kk)
junkview: j = asort(kk)
junkview: sort_addr_port_len = asort(sort_addr_port)
junkview: sort_hits_port_len = asort(sort_hits_port)
junkview: addr_hits_port_len = asort(addr_hits_port)
junkview: hits_addr_port_len = asort(hits_addr_port)
junkview: hits_netw_addr_len = asort(hits_netw_addr)
junkview: m = asort(hl)
junkview: asort(nl)
junkview: sort_src_hit_dst_len = asort(sort_src_hit_dst)
logfilter: tcpsize = asorti(tcp, tcpsort) # sort by IP address
logfilter: nettcpsize = asorti(nettcp, nettcpsort) # sort by net address
pak-web-scan: n = asorti(list, sorted)
show-browsers: n = asorti(sort, sorted)
spam-net-finder: count = asort(output, sorted)
spam-net-finder-db: count = asort(output, sorted)
Grant.
--
http://bugsplatter.id.au
Posted by Grant, 10/23/2009 5:32:25 PM
|
On Oct 23, 10:06 am, Ed Morton <mortons...@gmail.com> wrote:
> johannes.mainusch wrote:
> > It took me some time to understand why awk's arrays cannot easily be
> > sorted. Anyway, somewhere I got the hint to use the Unix shell's sort by
> > dumping the hash, sorting it in the Unix shell, and reading it in
> > again. So I wrote a little function to encapsulate that. Here it is...
>
> <snip>
>
> A couple of things pop out: you need to add "n" to the pseudo-argument
> list, the use of getline isn't a safe syntax (see http://awk.info/?tip/getline),
>
>      Ed.
Ah, both thanks and drat as well for that link. I was using getline
in my first awk program, but I'm pretty sure that, by following the
above link, I can eliminate it. Good information.
Posted by Da_Gut, 10/30/2009 6:06:42 PM
|
On Oct 30, 7:06 pm, Da_Gut <googlegro...@gutcup.com> wrote:
> On Oct 23, 10:06 am, Ed Morton <mortons...@gmail.com> wrote:
>
> <snip>
>
> > A couple of things pop out: you need to add "n" to the pseudo-argument
> > list, the use of getline isn't a safe syntax (see http://awk.info/?tip/getline),
>
> > Ed.
>
> Ah, both thanks and drat as well for that link. I was using getline
> in my first awk program, but I'm pretty sure that, by following the
> above link, I can eliminate it. Good information.
Thanks for all the good discussion. I'll try to digest that link and
understand getline better, and then I'll clean up my code (in fact, I
have already done that). The reason I don't use gawk is simply that I
develop on a Mac and deploy on Debian. And I am just a part-time
developer; this is in fact a hobby besides line management. So
*awk is not really an option. As for the one remark about the
possibility of sorting hashes in awk: I did not understand it, and I do
not believe it's possible, as sorting always involves swapping elements,
and that requires some kind of reference to the elements, which I do not
have in an awk hash. Anyway, I might be mistaken, so please do prove me
wrong with a code sample :-)
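One possible answer to that request, as a sketch rather than a definitive implementation: the hash itself never has to be reordered. Copying its keys into a second array indexed 1..n gives the "references" needed for swapping, and that list can then be sorted by looking up each key's value in the hash. The example below uses a simple insertion sort in portable awk. The function name keys_by_value, the wrapper name sort_in_awk_demo, and the sample counts are all invented for illustration.

```sh
# Hypothetical wrapper so the sketch can be run as-is.
sort_in_awk_demo() {
    awk '
    # Fill keys[1..n] with the indices of arr, ordered by descending
    # arr[key], and return n. Only the key list is rearranged.
    function keys_by_value(arr, keys,    n, i, j, k) {
        n = 0
        for (k in arr)
            keys[++n] = k
        for (i = 2; i <= n; i++) {        # insertion sort on the key list
            k = keys[i]
            for (j = i - 1; j >= 1 && arr[keys[j]] < arr[k]; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = k
        }
        return n
    }
    BEGIN {
        count["GET"] = 120; count["POST"] = 45
        count["HEAD"] = 7;  count["PUT"] = 3
        n = keys_by_value(count, order)
        for (i = 1; i <= n; i++)
            printf "%-6s %d\n", order[i], count[order[i]]
    }'
}
```

Insertion sort is quadratic, so for large hashes piping to sort (or gawk's asort() and asorti()) is likely to be faster, but for a few hundred keys this stays within a single awk process and needs no tmp file.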
Btw, I use awk to analyze custom webserver logs, build histogram data,
and get sorted cross-references... It's fast and nasty, and yes, I know
about the existence of Perl, Ruby, or open source log analyzers. But as
someone recently put it: "awk is a nice chainsaw..."
Cheers
Johannes
Posted by johannes, 11/5/2009 9:35:03 PM