Capture two-character country codes other than CN and KR with the match function

Hi all,

I use gawk's built-in match function to capture two-character country
codes other than CN and KR.  Currently, I use the following code:


 awk 'match($0, /...([A-Z]{2}).../, a) {
     if (a[1] != "CN" && a[1] != "KR") {
         do_something
     }
 }'

I am looking for a more direct way to do this without the ``if''.

Is there some other more concise regexp for this?

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/1/2016 6:44:26 AM
comp.lang.awk

On 12/1/2016 12:44 AM, Hongyi Zhao wrote:
> Hi all,
>
> I use gawk built-in match function to capture two char Country codes
> other than CN and KR.  Currently, I use the following code:
>
>
>  awk 'match($0, /...([A-Z]{2}).../, a ) {
>  if ( a[1] != "CN" &&  a[1] != "KR" ) {
>   do_something
> }
>
> I try to find the direct method without using the ``if'' to do this job.
>
> Is there some other more concise regexp for this?

[A-BD-JL-Z][A-Z]|C[A-MO-Z]|K[A-QS-Z]

Read a book and THINK, there's really no substitute. Also, unless each line of 
input is exactly 8 characters your regexp will produce false matches since it's 
not using word boundaries or anchors.

	Ed.
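
For illustration, a minimal sketch of the anchoring point above. It assumes every record is exactly 8 characters with the code in columns 4-5; the sample records and the function name `codes` are made up for the example. It uses only POSIX awk features, so gawk is not required.

```shell
# Print the two-letter code of each 8-character record,
# skipping CN and KR. Sample data below is hypothetical.
codes() {
  LC_ALL=C awk '
    # ^ and $ pin the pattern to the whole line, so a longer line
    # cannot match at some other offset.
    /^...([A-BD-JL-Z][A-Z]|C[A-MO-Z]|K[A-QS-Z])...$/ {
        print substr($0, 4, 2)
    }'
}

printf '%s\n' 'abcUSxyz' 'abcCNxyz' 'abcKRxyz' 'abcCAxyz' 'abcKPxyz' 'abcUSxyz9' | codes
```

With this input the sketch prints US, CA and KP; the 9-character last line is rejected by the anchors.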
Ed
12/1/2016 3:35:51 PM
On 2016-12-01, Ed Morton <mortonspam@gmail.com> wrote:
> Read a book and THINK, there's really no substitute.

Well, yes there kind of is, namely: read a *newsgroup* and THINK.

Based on how well that is working, I don't estimate great results for
read-a-book-and-think.
Kaz
12/1/2016 5:18:04 PM
Hongyi Zhao wrote:

>Hi all,
>
>I use gawk built-in match function to capture two char Country codes 
>other than CN and KR.  Currently, I use the following code:
>
>
> awk 'match($0, /...([A-Z]{2}).../, a ) {
> if ( a[1] != "CN" &&  a[1] != "KR" ) {
>  do_something
>}
>
>I try to find the direct method without using the ``if'' to do this job.


what about

> awk 'match($0, /...([A-Z]{2}).../, a) && a[1] != "CN" && a[1] != "KR" {
>   do_something with a[1]
> }'

or

> awk 'match($0, /...([A-Z]{2}).../, a ) &&  !/...(CN|KR).../ {
>   do_something with a[1]
> }'
-- 

Lorenz
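
A runnable sketch of the first variant (test moved into the pattern, no ``if''). As a swapped-in technique it uses two-argument match() with RSTART and substr() instead of gawk's array argument, and [A-Z][A-Z] instead of {2}, so it should also run in awks other than gawk. The sample records and the name `other_codes` are hypothetical.

```shell
# Print every two-letter code except CN and KR.
# Assumed layout: three characters, the code, three characters.
other_codes() {
  LC_ALL=C awk '
    # The assignment inside the pattern stores the two characters
    # that follow the three leading ones of the leftmost match.
    match($0, /...[A-Z][A-Z].../) &&
    (code = substr($0, RSTART + 3, 2)) != "CN" && code != "KR" {
        print code
    }'
}

printf '%s\n' 'abcUSxyz' 'abcCNxyz' 'abcKRxyz' 'abcDExyz' | other_codes
```

For the four sample records this prints US and DE.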
Lorenz
12/2/2016 8:27:52 AM
On Fri, 02 Dec 2016 08:27:52 +0000, Lorenz wrote:

> what about
> 
>> awk 'match($0, /...([A-Z]{2}).../, a ) &&  a[1] != "CN" &&  a[1] !=
>> "KR" ) {
>>   do_something with a[1]
>> }
> 
> or
> 
>> awk 'match($0, /...([A-Z]{2}).../, a ) &&  !/...(CN|KR).../ {
>>   do_something with a[1]
>> }
> --
> 
> Lorenz

Thanks.



-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/2/2016 11:34:52 AM
"Hongyi Zhao" <hongyi.zhao@gmail.com> wrote in message 
news:o1ogs9$k25$1@aspen.stu.neva.ru...
> Hi all,
>
> I use gawk built-in match function to capture two char Country codes
> other than CN and KR.  Currently, I use the following code:
>
>
> awk 'match($0, /...([A-Z]{2}).../, a ) {
> if ( a[1] != "CN" &&  a[1] != "KR" ) {
>  do_something
> }
>
> I try to find the direct method without using the ``if'' to do this job.
>
> Is there some other more concise regexp for this?

if ( match($0, /...([^CK].|C[^N]|K[^R]).../, a) ) {
   do_something
}

which will produce false matches if those two middle characters are not 
always both uppercase alphabetic.

- Anton Treuenfels
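
To make the false-match caveat concrete, here is a small sketch with made-up 8-character records; the name `loose_match` is hypothetical. It runs the negated-class pattern as a plain awk pattern, without the array argument, so any POSIX awk should do.

```shell
# Print each record accepted by the negated-class pattern.
loose_match() {
  LC_ALL=C awk '/...([^CK].|C[^N]|K[^R]).../ { print }'
}

printf '%s\n' 'abcUSxyz' 'abc12xyz' 'abcCNxyz' 'abcCnxyz' | loose_match
```

This prints abcUSxyz, abc12xyz and abcCnxyz: CN is rejected as intended, but the digits "12" and the mixed-case "Cn" slip through because [^CK], [^N] and "." accept any character, not just uppercase letters.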

Anton
12/11/2016 9:14:26 PM
On Sun, 11 Dec 2016 15:14:26 -0600, Anton Treuenfels wrote:

> if ( match($0, /...([^CK].|C[^N]|K[^R]).../, a) ) {
>    do_something
> }
> 
> which will produce false matches if those two middle characters are not
> always both uppercase alphabetic.

Thanks a lot.

> 
> - Anton Treuenfels





-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/13/2016 1:22:44 AM
On Sun, 11 Dec 2016 15:14:26 -0600, Anton Treuenfels wrote:

> if ( match($0, /...([^CK].|C[^N]|K[^R]).../, a) ) {
>    do_something
> }
> 
> which will produce false matches if those two middle characters are not
> always both uppercase alphabetic.

Thanks. On second thought, I have two additional notes based on your
solution:

[1] The `if' is not needed here, according to the match function's
description in the manual:

  Return the position  in  s  where  the  regular
  expression  r occurs, or 0 if r is not present...


[2] To work around the failure case, I revised your code into the
following form:

match($0, /...([^CK][A-Z]|C[^N]|K[^R]).../, a) {
    do_something
 }

Regards

> 
> - Anton Treuenfels





-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/13/2016 2:01:43 PM
On Tue, 13 Dec 2016 14:01:43 +0000, Hongyi Zhao wrote:

> match($0, /...([^CK][A-Z]|C[^N]|K[^R]).../, a) {
>     do_something
>  }

Should be:

match($0, /...([ABD-JL-Z][A-Z]|C[^N]|K[^R]).../, a) {
    do_something
 }

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/13/2016 2:05:29 PM
On Thu, 01 Dec 2016 09:35:51 -0600, Ed Morton wrote:

> [A-BD-JL-Z][A-Z]|C[A-MO-Z]|K[A-QS-Z]

I think there is no need for the `-' between A and B:

[ABD-JL-Z][A-Z]|C[A-MO-Z]|K[A-QS-Z]

Thanks again.
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/13/2016 2:13:17 PM
"Hongyi Zhao" <hongyi.zhao@gmail.com> wrote in message 
news:o2ov78$nsg$2@aspen.stu.neva.ru...
> On Tue, 13 Dec 2016 14:01:43 +0000, Hongyi Zhao wrote:
>
> match($0, /...([ABD-JL-Z][A-Z]|C[^N]|K[^R]).../, a) {
>    do_something
> }
>

This still doesn't quite get around the problem of non-alphabetic characters 
completely. For instance, the character following 'C' or 'K' could be 
numeric or lower case. Maybe a second pattern would help:

/...[A-Z]{2}.../ && match( $0, /...([^CK].|C[^N]|K[^R]).../, a ) {
    do_something
}

Of course if the only reason the array 'a' exists is to check for 'CN' or 
'KR', it can be discarded as no longer necessary:

/...[A-Z]{2}.../ && /...([^CK].|C[^N]|K[^R]).../ {
  do_something
}

- Anton Treuenfels
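
A runnable sketch of the two-pattern idea, under two assumptions that are not in the post above: records are exactly 8 characters, so both patterns are anchored with ^ and $ to make them test the same columns, and [A-Z][A-Z] stands in for [A-Z]{2} so that awks without interval expressions can run it. The sample records and the name `strict_codes` are hypothetical.

```shell
# Print the code only when it is two uppercase letters
# and is neither CN nor KR.
strict_codes() {
  LC_ALL=C awk '
    # First pattern: columns 4-5 are uppercase letters.
    # Second pattern: those same columns are not CN or KR.
    /^...[A-Z][A-Z]...$/ && /^...([^CK].|C[^N]|K[^R])...$/ {
        print substr($0, 4, 2)
    }'
}

printf '%s\n' 'abcUSxyz' 'abc12xyz' 'abcCNxyz' 'abcCnxyz' 'abcKRxyz' 'abcKPxyz' | strict_codes
```

Only US and KP are printed: the first pattern filters out "12" and "Cn", and the second filters out CN and KR.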

Anton
12/14/2016 4:05:26 PM