wildcard matching algorithm

  • Follow


Does anybody have an efficient, conventional rexx implementation of a
wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
well as "*" or "%"
0
Reply regli 9/8/2009 5:34:30 PM

I did this to start the discussion.  Likely to still have some bugs.

request = "?Y?abc??d%ef%mb"
request = translate(request)
str = 'XXXABCXXDEFGHIJKLMB'
str = translate(str)

TRUE = 1
FALSE = 0
exactMatch = ""
wildCard = FALSE
match = TRUE

len = length(request) + 1
posStr = 1

do i = 1 to len
   if i == len then
      maskChar = ""
   else
      maskChar = substr(request,i,1)
   select
     when maskChar == WILDCARDCHAR | maskChar == WILDPOSCHAR ,
          | maskChar == "" then do
        if wildCard then do
           wildCard = FALSE
           pos = pos(exactMatch,str,posStr)
           if pos = 0 then do
              match = FALSE
              leave
           end
           else do
              posStr = pos + length(exactMatch)
              exactMatch = ""
           end
        end
        else do
           if exactMatch <> "" & exactMatch == ,
                 substr(str,posStr,length(exactMatch)) then do
              posStr = posStr + length(exactMatch)
              exactMatch = ""
           end
        end
        if maskChar == WILDCARDCHAR then
           wildCard = TRUE
        else do
           if substr(str,posStr,1) == " " & maskChar <> "" then do
              match = FALSE
              leave
           end
           else
              posStr = posStr + 1
        end
     end
   otherwise
      exactMatch = exactMatch||maskChar
   end
end
if match then
   say '*** MATCHED - mask:"'request'" matched string:"'str'"'
else
   say "*** NO match ***"
exit
0
Reply regli 9/9/2009 12:14:45 AM


regli <raeegli@gmail.com> wrote:

> Does anybody have an efficient, conventional rexx implementation of a
> wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
> well as "*" or "%"

What OS/platform?

And what do you want search - lines in a file, or values in a stem array, or
what? 

-- 
Jeremy C B Nicoll - my opinions are my own.

Email sent to my from-address will be deleted. Instead, please reply
to newsreplynnn@wingsandbeaks.org.uk replacing "nnn" by "284".  
0
Reply Jeremy 9/11/2009 2:46:32 AM

On Sep 10, 7:46=A0pm, Jeremy Nicoll - news posts
<jn.nntp.scrap...@wingsandbeaks.org.uk> wrote:
> regli <raee...@gmail.com> wrote:
> > Does anybody have an efficient, conventional rexx implementation of a
> > wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
> > well as "*" or "%"
>
> What OS/platform?
>
> And what do you want search - lines in a file, or values in a stem array,=
 or
> what?
>
> --
> Jeremy C B Nicoll - my opinions are my own.
>
> Email sent to my from-address will be deleted. Instead, please reply
> to newsreply...@wingsandbeaks.org.uk replacing "nnn" by "284". =A0

It has to be platform independent, i.e. a pure Rexx implementation.  I
made some changes to the above and it works much better now.

However, it would be nice to see other implementations.

Here is my latest code:



 WildCardMatch: PROCEDURE

   WILDCARDCHAR =3D "%"
   WILDPOSCHAR =3D "?"
   pattern =3D arg(1)
   pattern =3D translate(pattern)
   str =3D arg(2)
   str =3D translate(str)

   TRUE =3D 1
   FALSE =3D 0
   exactMatch =3D ""
   wildCard =3D FALSE
   match =3D TRUE

   len =3D length(pattern) + 1
   posStr =3D 1

   do i =3D 1 to len
      maskChar =3D substr(pattern,i,1)
      select
        when maskChar =3D=3D WILDCARDCHAR | maskChar =3D=3D WILDPOSCHAR ,
             | i =3D=3D len then do
           if wildCard then do
              if len =3D i then         /* end of loop with open wild
card */
                 leave
              wildCard =3D FALSE
              if exactMatch <> "" then do
                 pos =3D pos(exactMatch,str,posStr)
                 if pos =3D 0 then do
                    match =3D FALSE
                    leave
                 end
                 else do
                    posStr =3D pos + length(exactMatch)
                    exactMatch =3D ""
                 end
              end
           end
           else do
              if exactMatch <> "" then do
                 if exactMatch =3D=3D ,
                      substr(str,posStr,length(exactMatch)) then do
                    posStr =3D posStr + length(exactMatch)
                    exactMatch =3D ""
                 end
                 else do
                    match =3D FALSE
                    leave
                 end
              end
           end
           if maskChar =3D=3D WILDCARDCHAR then
              wildCard =3D TRUE
           else do
              if len =3D i then do
                 if substr(str,posStr) <> "" then do
                    match =3D FALSE
                    leave
                 end
                 else
                    leave
              end
              if substr(str,posStr,1) =3D=3D " " then do
                 match =3D FALSE
                 leave
              end
              else
                 posStr =3D posStr + 1
           end
        end
      otherwise
         exactMatch =3D exactMatch||maskChar
      end
   end
   return match
0
Reply regli 9/11/2009 4:21:02 AM

On 8 Sep, 18:34, regli <raee...@gmail.com> wrote:
> Does anybody have an efficient, conventional rexx implementation of a
> wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
> well as "*" or "%"

I used to have a really concise one, but this is the one I use now:
/* REXX
*/
/*-Start of PATMATCH function---------------------------Version-01.01-
*/
/*:PATMATCH function - Performs ISPF-type pattern matching on a pair
*/
/* of strings: 1st string is the pattern, 2nd is the data.
*/
/* Example: rc=PATMATCH('*EFG*','ABCDEFGH')
*/
/* Returns: 0:    Pattern does not match the data.
*/
/*          1:    Pattern does match the data.
*/
/*          text: Error occured, text gives the details.
*/
/* Copyright (C) 1996, 1998 Washington Systems. All rights reserved.
*/
/*--------------------------------------------------------------------
*/
PATMATCH: PROCEDURE EXPOSE
gbl.
TRACE
N
IF ARG() <> 2 THEN RETURN 'PATMATCH ERROR: 2 parms required,
caller',
                          'passed 'ARG()'
parms'
haystack = ARG(2)                  /* Data to be searched w/ pattern
*/
pattern = ARG(1)                   /* Search pattern
*/
wild = 0                           /* No '*' seen (yet)
*/
  DO WHILE pattern <> ''           /* Process pattern from L to R
*/

  a = POS('*',pattern)             /* Look for '*' in pattern
*/
  IF a > 0 THEN                    /* If found, extract any
*/
    DO                             /*  preceeding search string
*/
    PARSE VAR pattern needle '*' +0
newpattern
    IF needle = ''
THEN
      DO                           /* If no preceeding search string,
*/
      wild = 1                     /*  mark next search as 'wild
card'*/
      PARSE VAR newpattern 2 pattern /* remove '*' from pattern
*/
      ITERATE                      /*    and continue testing
*/
 
END
 
END

  p = POS('%',pattern)             /* Look for '%' in pattern
*/
  IF p > 0 & (a = 0 | p < a) THEN  /* If found before any '*',
extract*/
    DO                             /*  any preceeding search string
*/
    PARSE VAR pattern needle '%' +0
newpattern
    IF needle = ''
THEN
      DO                           /* If no preceeding search string,
*/
      needle = LEFT(haystack,1)    /* take next text as search string
*/
      PARSE VAR newpattern 2
newpattern
      wild = 0                     /* Mark next search 'not wild
card'*/
 
END
 
END

  IF p = 0 & a = 0                 /* No special chars, use remainder
*/
  THEN PARSE VAR pattern needle
newpattern

  pattern = newpattern             /* Update pattern for next pass
*/
  pos = POS(needle,haystack)       /* Look for this pattern
*/
  IF pos = 0 THEN RETURN 0         /* Not found, outta here
*/
  IF pos > 1 THEN                  /* Found, but not at start-of-text
*/
    IF wild <> 1                   /* Wild card char in effect?
*/
    THEN RETURN 0                  /* No wild card, outta here
*/
  wild = 0                         /* Reset wild card character
*/
  len = LENGTH(haystack)-LENGTH(needle)-pos
+1
  haystack = RIGHT(haystack,len)   /* Remove data that we just
scanned*/
  IF haystack = ''
THEN
    DO                             /* No more data to scan...
*/
    IF pattern = ''                /* Out of pattern as well?
*/
    THEN RETURN 1                  /* Yes, that's a match.
*/
    IF pattern = '*'               /* Did pattern end with wild card?
*/
    THEN RETURN 1                  /* Yes, that's a match too.
*/
    RETURN 0                       /* Else, sorry, not a match.
*/
 
END
  END                              /* Ran out of pattern to scan
*/
IF wild = 1                        /* Did pattern end with wild card?
*/
THEN RETURN 1                      /* Yes, that's a match.
*/
ELSE RETURN 0                      /* Else, sorry, not a match.
*/
/*-End of PATMATCH function-------------------------------------------
*/
0
Reply Captain 9/14/2009 2:46:57 PM

Thanks a lot for your post!  I ran a test with both options and it
appears that parse makes it somewhat slower though only by a slight
margin.

19:07:19
elapsed:11.872000 - matched:300000
19:07:31
elapsed:8.199000 - matched:300000
19:07:39

I ran with 6 pattern matches for each each cycle and executed that
cycle 100,000 times which resulted in 300,000 matches. The first
execution is using your code and the second uses mine.

Again, thanks a lot!
0
Reply regli 9/16/2009 2:19:40 AM

A perfect example of 'platform independent' not being the 
same as 'platform efficient'!
Run the same test on a mainframe, in particular VM/CMS, and 
measure cpu cycles and Parse will probably win by a 
significant amount. That's because the original VM Rexx was 
implemented with the mainframe instruction set and Parse 
was optimized for efficiency, according to Mike.

Les               (Change Arabic to Roman to email me)



regli wrote:
> Thanks a lot for your post!  I ran a test with both options and it
> appears that parse makes it somewhat slower though only by a slight
> margin.
> 
> 19:07:19
> elapsed:11.872000 - matched:300000
> 19:07:31
> elapsed:8.199000 - matched:300000
> 19:07:39
> 
> I ran with 6 pattern matches for each each cycle and executed that
> cycle 100,000 times which resulted in 300,000 matches. The first
> execution is using your code and the second uses mine.
> 
> Again, thanks a lot!
0
Reply LesK 9/16/2009 10:31:24 PM

On Sep 16, 3:31=A0pm, LesK <5mr...@tampabay.rr.com> wrote:
> A perfect example of 'platform independent' not being the
> same as 'platform efficient'!
> Run the same test on a mainframe, in particular VM/CMS, and
> measure cpu cycles and Parse will probably win by a
> significant amount. That's because the original VM Rexx was
> implemented with the mainframe instruction set and Parse
> was optimized for efficiency, according to Mike.
>
> Les =A0 =A0 =A0 =A0 =A0 =A0 =A0 (Change Arabic to Roman to email me)
>
> regli wrote:
> > Thanks a lot for your post! =A0I ran a test with both options and it
> > appears that parse makes it somewhat slower though only by a slight
> > margin.
>
> > 19:07:19
> > elapsed:11.872000 - matched:300000
> > 19:07:31
> > elapsed:8.199000 - matched:300000
> > 19:07:39
>
> > I ran with 6 pattern matches for each each cycle and executed that
> > cycle 100,000 times which resulted in 300,000 matches. The first
> > execution is using your code and the second uses mine.
>
> > Again, thanks a lot!
>
>

You are right but I'd still like to see somebody running it on VM.
I'll ask for that favor in another forum and report back.
0
Reply regli 9/17/2009 3:59:20 AM

As promised here are the results from z/OS 1.9:

Solution 1:
TOTAL TCB CPU TIME= .89 TOTAL ELAPSED TIME= 2.0

09:39:20
elapsed:125.018178 - matched:30000
09:41:25

Solution 2:
TOTAL TCB CPU TIME=   1.16 TOTAL ELAPSED TIME=   1.8

09:42:05
elapsed:110.321300 - matched:30000
09:43:55

It shows that the parse solution gained a tremendous amount and won
the contest on the mainframe as I believe that in that scenario CPU
time is more telling that elapse time.
0
Reply regli 9/18/2009 5:09:51 PM

On 16 Sep, 23:31, LesK <5mr...@tampabay.rr.com> wrote:
> A perfect example of 'platform independent' not being the
> same as 'platform efficient'!
> Run the same test on a mainframe, in particular VM/CMS, and
> measure cpu cycles and Parse will probably win by a
> significant amount. That's because the original VM Rexx was
> implemented with the mainframe instruction set and Parse
> was optimized for efficiency, according to Mike.
>
> Les =A0 =A0 =A0 =A0 =A0 =A0 =A0 (Change Arabic to Roman to email me)

Ahh but if it was on VM I would be using the PIPATTERN extension for
CMS/TSO PIPELINES to match the string.
0
Reply Captain 9/18/2009 8:58:07 PM

Captain Paralytic wrote:
> Ahh but if it was on VM I would be using the PIPATTERN extension for
> CMS/TSO PIPELINES to match the string.

What?  Not one of the xxGREP stages?

�R
0
Reply Glenn 9/18/2009 10:16:24 PM

But then you couldn't compare apples to apples! The 
original requirement was for platform independence.

Les               (Change Arabic to Roman to email me)



Captain Paralytic wrote:
> On 16 Sep, 23:31, LesK <5mr...@tampabay.rr.com> wrote:
>> A perfect example of 'platform independent' not being the
>> same as 'platform efficient'!
>> Run the same test on a mainframe, in particular VM/CMS, and
>> measure cpu cycles and Parse will probably win by a
>> significant amount. That's because the original VM Rexx was
>> implemented with the mainframe instruction set and Parse
>> was optimized for efficiency, according to Mike.
>>
>> Les               (Change Arabic to Roman to email me)
> 
> Ahh but if it was on VM I would be using the PIPATTERN extension for
> CMS/TSO PIPELINES to match the string.
0
Reply LesK 9/19/2009 6:02:43 AM

It would be interesting to see the two Pipe solutions and 
measurements :-)

Les               (Change Arabic to Roman to email me)



Glenn Knickerbocker wrote:
> Captain Paralytic wrote:
>> Ahh but if it was on VM I would be using the PIPATTERN extension for
>> CMS/TSO PIPELINES to match the string.
> 
> What?  Not one of the xxGREP stages?
> 
> �R
0
Reply LesK 9/19/2009 6:05:49 AM

On 18 Sep, 23:16, Glenn Knickerbocker <N...@bestweb.net> wrote:
> Captain Paralytic wrote:
> > Ahh but if it was on VM I would be using the PIPATTERN extension for
> > CMS/TSO PIPELINES to match the string.
>
> What? =A0Not one of the xxGREP stages?
>
> =ACR

I don't have access to the latest VM PIPLEINEs anymore :-(

Nearest to grep that was available when I had the access was PIPATTERN.
0
Reply Captain 9/19/2009 12:33:21 PM

13 Replies
357 Views

(page loaded in 0.195 seconds)

Similiar Articles:













7/23/2012 6:12:09 PM


Reply: