Does anybody have an efficient, conventional rexx implementation of a
wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
well as "*" or "%"
|
|
0
|
|
|
|
Reply
|
regli
|
9/8/2009 5:34:30 PM |
|
I did this to start the discussion. Likely to still have some bugs.
request = "?Y?abc??d%ef%mb"
request = translate(request)
str = 'XXXABCXXDEFGHIJKLMB'
str = translate(str)
TRUE = 1
FALSE = 0
exactMatch = ""
wildCard = FALSE
match = TRUE
len = length(request) + 1
posStr = 1
do i = 1 to len
if i == len then
maskChar = ""
else
maskChar = substr(request,i,1)
select
when maskChar == WILDCARDCHAR | maskChar == WILDPOSCHAR ,
| maskChar == "" then do
if wildCard then do
wildCard = FALSE
pos = pos(exactMatch,str,posStr)
if pos = 0 then do
match = FALSE
leave
end
else do
posStr = pos + length(exactMatch)
exactMatch = ""
end
end
else do
if exactMatch <> "" & exactMatch == ,
substr(str,posStr,length(exactMatch)) then do
posStr = posStr + length(exactMatch)
exactMatch = ""
end
end
if maskChar == WILDCARDCHAR then
wildCard = TRUE
else do
if substr(str,posStr,1) == " " & maskChar <> "" then do
match = FALSE
leave
end
else
posStr = posStr + 1
end
end
otherwise
exactMatch = exactMatch||maskChar
end
end
if match then
say '*** MATCHED - mask:"'request'" matched string:"'str'"'
else
say "*** NO match ***"
exit
|
|
0
|
|
|
|
Reply
|
regli
|
9/9/2009 12:14:45 AM
|
|
regli <raeegli@gmail.com> wrote:
> Does anybody have an efficient, conventional rexx implementation of a
> wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
> well as "*" or "%"
What OS/platform?
And what do you want search - lines in a file, or values in a stem array, or
what?
--
Jeremy C B Nicoll - my opinions are my own.
Email sent to my from-address will be deleted. Instead, please reply
to newsreplynnn@wingsandbeaks.org.uk replacing "nnn" by "284".
|
|
0
|
|
|
|
Reply
|
Jeremy
|
9/11/2009 2:46:32 AM
|
|
On Sep 10, 7:46=A0pm, Jeremy Nicoll - news posts
<jn.nntp.scrap...@wingsandbeaks.org.uk> wrote:
> regli <raee...@gmail.com> wrote:
> > Does anybody have an efficient, conventional rexx implementation of a
> > wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
> > well as "*" or "%"
>
> What OS/platform?
>
> And what do you want search - lines in a file, or values in a stem array,=
or
> what?
>
> --
> Jeremy C B Nicoll - my opinions are my own.
>
> Email sent to my from-address will be deleted. Instead, please reply
> to newsreply...@wingsandbeaks.org.uk replacing "nnn" by "284". =A0
It has to be platform independent, i.e. a pure Rexx implementation. I
made some changes to the above and it works much better now.
However, it would be nice to see other implementations.
Here is my latest code:
WildCardMatch: PROCEDURE
WILDCARDCHAR =3D "%"
WILDPOSCHAR =3D "?"
pattern =3D arg(1)
pattern =3D translate(pattern)
str =3D arg(2)
str =3D translate(str)
TRUE =3D 1
FALSE =3D 0
exactMatch =3D ""
wildCard =3D FALSE
match =3D TRUE
len =3D length(pattern) + 1
posStr =3D 1
do i =3D 1 to len
maskChar =3D substr(pattern,i,1)
select
when maskChar =3D=3D WILDCARDCHAR | maskChar =3D=3D WILDPOSCHAR ,
| i =3D=3D len then do
if wildCard then do
if len =3D i then /* end of loop with open wild
card */
leave
wildCard =3D FALSE
if exactMatch <> "" then do
pos =3D pos(exactMatch,str,posStr)
if pos =3D 0 then do
match =3D FALSE
leave
end
else do
posStr =3D pos + length(exactMatch)
exactMatch =3D ""
end
end
end
else do
if exactMatch <> "" then do
if exactMatch =3D=3D ,
substr(str,posStr,length(exactMatch)) then do
posStr =3D posStr + length(exactMatch)
exactMatch =3D ""
end
else do
match =3D FALSE
leave
end
end
end
if maskChar =3D=3D WILDCARDCHAR then
wildCard =3D TRUE
else do
if len =3D i then do
if substr(str,posStr) <> "" then do
match =3D FALSE
leave
end
else
leave
end
if substr(str,posStr,1) =3D=3D " " then do
match =3D FALSE
leave
end
else
posStr =3D posStr + 1
end
end
otherwise
exactMatch =3D exactMatch||maskChar
end
end
return match
|
|
0
|
|
|
|
Reply
|
regli
|
9/11/2009 4:21:02 AM
|
|
On 8 Sep, 18:34, regli <raee...@gmail.com> wrote:
> Does anybody have an efficient, conventional rexx implementation of a
> wildcard matching algorithm? Hopefully with positionals, i.e. "?" as
> well as "*" or "%"
I used to have a really concise one, but this is the one I use now:
/* REXX
*/
/*-Start of PATMATCH function---------------------------Version-01.01-
*/
/*:PATMATCH function - Performs ISPF-type pattern matching on a pair
*/
/* of strings: 1st string is the pattern, 2nd is the data.
*/
/* Example: rc=PATMATCH('*EFG*','ABCDEFGH')
*/
/* Returns: 0: Pattern does not match the data.
*/
/* 1: Pattern does match the data.
*/
/* text: Error occured, text gives the details.
*/
/* Copyright (C) 1996, 1998 Washington Systems. All rights reserved.
*/
/*--------------------------------------------------------------------
*/
PATMATCH: PROCEDURE EXPOSE
gbl.
TRACE
N
IF ARG() <> 2 THEN RETURN 'PATMATCH ERROR: 2 parms required,
caller',
'passed 'ARG()'
parms'
haystack = ARG(2) /* Data to be searched w/ pattern
*/
pattern = ARG(1) /* Search pattern
*/
wild = 0 /* No '*' seen (yet)
*/
DO WHILE pattern <> '' /* Process pattern from L to R
*/
a = POS('*',pattern) /* Look for '*' in pattern
*/
IF a > 0 THEN /* If found, extract any
*/
DO /* preceeding search string
*/
PARSE VAR pattern needle '*' +0
newpattern
IF needle = ''
THEN
DO /* If no preceeding search string,
*/
wild = 1 /* mark next search as 'wild
card'*/
PARSE VAR newpattern 2 pattern /* remove '*' from pattern
*/
ITERATE /* and continue testing
*/
END
END
p = POS('%',pattern) /* Look for '%' in pattern
*/
IF p > 0 & (a = 0 | p < a) THEN /* If found before any '*',
extract*/
DO /* any preceeding search string
*/
PARSE VAR pattern needle '%' +0
newpattern
IF needle = ''
THEN
DO /* If no preceeding search string,
*/
needle = LEFT(haystack,1) /* take next text as search string
*/
PARSE VAR newpattern 2
newpattern
wild = 0 /* Mark next search 'not wild
card'*/
END
END
IF p = 0 & a = 0 /* No special chars, use remainder
*/
THEN PARSE VAR pattern needle
newpattern
pattern = newpattern /* Update pattern for next pass
*/
pos = POS(needle,haystack) /* Look for this pattern
*/
IF pos = 0 THEN RETURN 0 /* Not found, outta here
*/
IF pos > 1 THEN /* Found, but not at start-of-text
*/
IF wild <> 1 /* Wild card char in effect?
*/
THEN RETURN 0 /* No wild card, outta here
*/
wild = 0 /* Reset wild card character
*/
len = LENGTH(haystack)-LENGTH(needle)-pos
+1
haystack = RIGHT(haystack,len) /* Remove data that we just
scanned*/
IF haystack = ''
THEN
DO /* No more data to scan...
*/
IF pattern = '' /* Out of pattern as well?
*/
THEN RETURN 1 /* Yes, that's a match.
*/
IF pattern = '*' /* Did pattern end with wild card?
*/
THEN RETURN 1 /* Yes, that's a match too.
*/
RETURN 0 /* Else, sorry, not a match.
*/
END
END /* Ran out of pattern to scan
*/
IF wild = 1 /* Did pattern end with wild card?
*/
THEN RETURN 1 /* Yes, that's a match.
*/
ELSE RETURN 0 /* Else, sorry, not a match.
*/
/*-End of PATMATCH function-------------------------------------------
*/
|
|
0
|
|
|
|
Reply
|
Captain
|
9/14/2009 2:46:57 PM
|
|
Thanks a lot for your post! I ran a test with both options and it
appears that parse makes it somewhat slower though only by a slight
margin.
19:07:19
elapsed:11.872000 - matched:300000
19:07:31
elapsed:8.199000 - matched:300000
19:07:39
I ran with 6 pattern matches for each each cycle and executed that
cycle 100,000 times which resulted in 300,000 matches. The first
execution is using your code and the second uses mine.
Again, thanks a lot!
|
|
0
|
|
|
|
Reply
|
regli
|
9/16/2009 2:19:40 AM
|
|
A perfect example of 'platform independent' not being the
same as 'platform efficient'!
Run the same test on a mainframe, in particular VM/CMS, and
measure cpu cycles and Parse will probably win by a
significant amount. That's because the original VM Rexx was
implemented with the mainframe instruction set and Parse
was optimized for efficiency, according to Mike.
Les (Change Arabic to Roman to email me)
regli wrote:
> Thanks a lot for your post! I ran a test with both options and it
> appears that parse makes it somewhat slower though only by a slight
> margin.
>
> 19:07:19
> elapsed:11.872000 - matched:300000
> 19:07:31
> elapsed:8.199000 - matched:300000
> 19:07:39
>
> I ran with 6 pattern matches for each each cycle and executed that
> cycle 100,000 times which resulted in 300,000 matches. The first
> execution is using your code and the second uses mine.
>
> Again, thanks a lot!
|
|
0
|
|
|
|
Reply
|
LesK
|
9/16/2009 10:31:24 PM
|
|
On Sep 16, 3:31=A0pm, LesK <5mr...@tampabay.rr.com> wrote:
> A perfect example of 'platform independent' not being the
> same as 'platform efficient'!
> Run the same test on a mainframe, in particular VM/CMS, and
> measure cpu cycles and Parse will probably win by a
> significant amount. That's because the original VM Rexx was
> implemented with the mainframe instruction set and Parse
> was optimized for efficiency, according to Mike.
>
> Les =A0 =A0 =A0 =A0 =A0 =A0 =A0 (Change Arabic to Roman to email me)
>
> regli wrote:
> > Thanks a lot for your post! =A0I ran a test with both options and it
> > appears that parse makes it somewhat slower though only by a slight
> > margin.
>
> > 19:07:19
> > elapsed:11.872000 - matched:300000
> > 19:07:31
> > elapsed:8.199000 - matched:300000
> > 19:07:39
>
> > I ran with 6 pattern matches for each each cycle and executed that
> > cycle 100,000 times which resulted in 300,000 matches. The first
> > execution is using your code and the second uses mine.
>
> > Again, thanks a lot!
>
>
You are right but I'd still like to see somebody running it on VM.
I'll ask for that favor in another forum and report back.
|
|
0
|
|
|
|
Reply
|
regli
|
9/17/2009 3:59:20 AM
|
|
As promised here are the results from z/OS 1.9:
Solution 1:
TOTAL TCB CPU TIME= .89 TOTAL ELAPSED TIME= 2.0
09:39:20
elapsed:125.018178 - matched:30000
09:41:25
Solution 2:
TOTAL TCB CPU TIME= 1.16 TOTAL ELAPSED TIME= 1.8
09:42:05
elapsed:110.321300 - matched:30000
09:43:55
It shows that the parse solution gained a tremendous amount and won
the contest on the mainframe as I believe that in that scenario CPU
time is more telling that elapse time.
|
|
0
|
|
|
|
Reply
|
regli
|
9/18/2009 5:09:51 PM
|
|
On 16 Sep, 23:31, LesK <5mr...@tampabay.rr.com> wrote:
> A perfect example of 'platform independent' not being the
> same as 'platform efficient'!
> Run the same test on a mainframe, in particular VM/CMS, and
> measure cpu cycles and Parse will probably win by a
> significant amount. That's because the original VM Rexx was
> implemented with the mainframe instruction set and Parse
> was optimized for efficiency, according to Mike.
>
> Les =A0 =A0 =A0 =A0 =A0 =A0 =A0 (Change Arabic to Roman to email me)
Ahh but if it was on VM I would be using the PIPATTERN extension for
CMS/TSO PIPELINES to match the string.
|
|
0
|
|
|
|
Reply
|
Captain
|
9/18/2009 8:58:07 PM
|
|
Captain Paralytic wrote:
> Ahh but if it was on VM I would be using the PIPATTERN extension for
> CMS/TSO PIPELINES to match the string.
What? Not one of the xxGREP stages?
�R
|
|
0
|
|
|
|
Reply
|
Glenn
|
9/18/2009 10:16:24 PM
|
|
But then you couldn't compare apples to apples! The
original requirement was for platform independence.
Les (Change Arabic to Roman to email me)
Captain Paralytic wrote:
> On 16 Sep, 23:31, LesK <5mr...@tampabay.rr.com> wrote:
>> A perfect example of 'platform independent' not being the
>> same as 'platform efficient'!
>> Run the same test on a mainframe, in particular VM/CMS, and
>> measure cpu cycles and Parse will probably win by a
>> significant amount. That's because the original VM Rexx was
>> implemented with the mainframe instruction set and Parse
>> was optimized for efficiency, according to Mike.
>>
>> Les (Change Arabic to Roman to email me)
>
> Ahh but if it was on VM I would be using the PIPATTERN extension for
> CMS/TSO PIPELINES to match the string.
|
|
0
|
|
|
|
Reply
|
LesK
|
9/19/2009 6:02:43 AM
|
|
It would be interesting to see the two Pipe solutions and
measurements :-)
Les (Change Arabic to Roman to email me)
Glenn Knickerbocker wrote:
> Captain Paralytic wrote:
>> Ahh but if it was on VM I would be using the PIPATTERN extension for
>> CMS/TSO PIPELINES to match the string.
>
> What? Not one of the xxGREP stages?
>
> �R
|
|
0
|
|
|
|
Reply
|
LesK
|
9/19/2009 6:05:49 AM
|
|
On 18 Sep, 23:16, Glenn Knickerbocker <N...@bestweb.net> wrote:
> Captain Paralytic wrote:
> > Ahh but if it was on VM I would be using the PIPATTERN extension for
> > CMS/TSO PIPELINES to match the string.
>
> What? =A0Not one of the xxGREP stages?
>
> =ACR
I don't have access to the latest VM PIPLEINEs anymore :-(
Nearest to grep that was available when I had the access was PIPATTERN.
|
|
0
|
|
|
|
Reply
|
Captain
|
9/19/2009 12:33:21 PM
|
|
|
13 Replies
357 Views
(page loaded in 0.195 seconds)
|