Awk arrays and specific character matching

I'm relatively new to awk. What I'm trying to do is take a list of
serial numbers and filter out the duplicates, but the problem is I only
want to remove duplicate entries if they match at certain characters
of the serial number (otherwise I could just use uniq).

It's a 17-digit serial number, but I only want to treat two entries as
duplicates if they match in digits 1-8 and 10-12, disregarding any
differences in digit 9 or digits 13-17.

Is this possible with an awk array, or do I need to use something else
to accomplish this?

Thanks
gregdodds 12/3/2004 7:31:56 AM

# Compare 17-digit serial numbers based on digits 1--8 and
# 10--12.  Discard duplicates.
{ fixed = fix($0)
  if ( !(fixed in a) )
  { print
    a[fixed]++
  }
}

function fix(s)
{ return substr(s,1,8) substr(s,10,3)
}


Does this do what you want?
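
To try the script above without saving it to a file, one option is to pass the same program to awk inline from a shell function. The function name dedupe and the sample serials are made up for illustration: the first two serials differ only in digit 9 and digits 13-17, so only the first of them should be printed, while the third differs in digit 10 and is kept.

```shell
#!/bin/sh
# w_a_x_man's program, passed inline to awk. The key is digits 1-8
# plus digits 10-12; digit 9 and digits 13-17 are ignored.
dedupe() {
    awk '
    { fixed = fix($0)
      if ( !(fixed in a) )        # first time this key is seen
      { print                     # print the whole serial
        a[fixed]++                # and remember the key
      }
    }
    function fix(s)
    { return substr(s,1,8) substr(s,10,3)
    }' "$@"
}

# Hypothetical sample serials, one per line.
printf '%s\n' 12345678012345678 12345678912300000 12345678022345678 11111111111111111 | dedupe
```

With a real file it would be run as `dedupe serials.txt` (serials.txt is a placeholder name).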
w_a_x_man 12/3/2004 10:52:52 AM

Something like this should do it if you want to keep the last occurrence 
of the serial number:

awk 'BEGIN{FS=""}
      {a[$1$2$3$4$5$6$7$8$10$11$12]=$0}
      END{for (i in a) print a[i]}'
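
Two caveats apply to both one-liners in this reply: FS="" (one field per character) is supported by gawk and some other awks but is left unspecified by POSIX, and `for (i in a)` in the END block visits keys in an unspecified order, so the output need not follow the input order. A minimal sketch of the keep-last variant that avoids both, assuming a POSIX awk and building the key with substr() as in the earlier reply; the function name dedupe_keep_last and the sample serials are hypothetical:

```shell
#!/bin/sh
# Keep the LAST serial seen for each key (digits 1-8 and 10-12),
# printing results in the order each key first appeared.
dedupe_keep_last() {
    awk '
    {
        key = substr($0, 1, 8) substr($0, 10, 3)   # digits 1-8 and 10-12
        if (!(key in last))
            order[++n] = key                       # remember first-seen order
        last[key] = $0                             # later lines overwrite earlier ones
    }
    END {
        for (j = 1; j <= n; j++)
            print last[order[j]]
    }' "$@"
}

# Hypothetical sample serials; the first two share a key.
printf '%s\n' 12345678012345678 12345678912300000 12345678022345678 11111111111111111 | dedupe_keep_last
```

The separate order[] array costs one extra entry per distinct key, in exchange for deterministic output.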

If you want to keep the first occurrence then it's:

awk 'BEGIN{FS=""}
      {i=$1$2$3$4$5$6$7$8$10$11$12}
      !(i in a){a[i]=$0}
      END{for (i in a) print a[i]}'
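
For keeping the first occurrence, the common awk idiom `!seen[key]++` does the same job in a single pattern: the expression is true only the first time a key is seen, and because matching lines are printed as they are read, input order is preserved. A sketch using the same substr() key; dedupe_keep_first and the sample data are hypothetical:

```shell
#!/bin/sh
# Keep the FIRST serial seen for each key (digits 1-8 and 10-12),
# printing lines in their original input order.
dedupe_keep_first() {
    # seen[key]++ is 0 (false) the first time, so !seen[key]++ is true
    # exactly once per key and the default action prints the line.
    awk '!seen[substr($0, 1, 8) substr($0, 10, 3)]++' "$@"
}

# Hypothetical sample serials; the first two share a key.
printf '%s\n' 12345678012345678 12345678912300000 12345678022345678 11111111111111111 | dedupe_keep_first
```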

Regards,

	Ed.
Ed 12/3/2004 12:57:39 PM

gregdodds@canada.com (GREG_D) wrote in message news:<f530cee6.0412022331.341444a3@posting.google.com>...

Hey, thanks to both of you for all your help.

That worked great, just what I needed.
gregdodds 12/3/2004 11:15:16 PM