A hash or array of regexp's?

I often find myself with a list of things that I'm searching for.  And
for each of the things I'm searching for, there's an action I want to
do.

Sometimes the "search for" pattern is just the first four characters in
the line, for example.  Here things are easy: I build a hash with the
key being the four-character pattern, and the value being the
subroutine to execute.  Works very nicely: get each line, use a
substr() to extract the first four characters, look them up in the
hash, and execute the correct subroutine.  Very quick, very fast, very
idiomatic.
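The fixed-prefix case described above can be sketched as a dispatch table. This is a minimal sketch that assumes hypothetical four-character codes (NAME, ADDR) and made-up handlers; a line whose prefix has no entry simply gets no handler.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record codes mapped to handler subroutines.
my %dispatch = (
    NAME => sub { return "name: $_[0]" },
    ADDR => sub { return "address: $_[0]" },
);

# Look up the first four characters of the line; undef if no handler.
sub handle_line {
    my ($line) = @_;
    my $handler = $dispatch{ substr($line, 0, 4) };
    return $handler ? $handler->($line) : undef;
}

for my $line ("NAME Tim Shoppa", "ADDR trailing-edge.com", "JUNK ignored") {
    my $result = handle_line($line);
    print defined $result ? "$result\n" : "no handler: $line\n";
}
```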

But other times the patterns are not so easily handled.  Often they are
true regexp's, matching variable repetitions and patterns.  This can of
course be handled with a chain of if-matches, each with a block that does
the action, but this screams out to me as something that I ought to be
able to handle with a data structure like a hash, using regexp's as
keys.

Pages 193/194 of the Camel book reveal how to loop over a bunch of
precompiled regexp's, using qr// to precompile the regexp's, and this
isn't bad.  But it's not quite the same as a hash lookup.  And it seems
to me that there ought to be an idiom, maybe a CPAN module, that makes
the whole operation look more like a hash lookup, because that's how I
think of it in my head, even though I know that regexp's aren't really
as quick or efficient as simple keys.
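A loop over precompiled qr// patterns can be wrapped in a small function so that calling it reads somewhat like a hash lookup. This is a sketch under assumptions, not the Camel book's code: the patterns, the actions and the name lookup() are illustrative, and rules are tried in array order with the first match winning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ordered list of [compiled pattern, action] pairs.  qr// compiles each
# pattern once, so the loop does not recompile it for every line.
# The patterns and actions are illustrative only.
my @rules = (
    [ qr/^ERROR\s+(\d+)/    => sub { "error code $1" } ],
    [ qr/^WARN(?:ING)?\b/   => sub { "warning" } ],
    [ qr/\d{4}-\d{2}-\d{2}/ => sub { "dated line" } ],
);

# Hash-like lookup: return the result of the first matching rule,
# or undef when nothing matches.  $1 is still visible inside the
# action because capture variables are dynamically scoped.
sub lookup {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $action) = @$rule;
        return $action->() if $line =~ $re;
    }
    return;
}

for my $line ("ERROR 42 disk full", "WARNING: low disk", "nothing here") {
    my $result = lookup($line);
    print defined $result ? "$line => $result\n" : "$line => no match\n";
}
```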

So, is there a common perl idiom for dealing with this situation?
Maybe a CPAN module?

Tim.

shoppa, 3/28/2005 2:31:03 PM

"Tim Shoppa" <shoppa@trailing-edge.com> wrote:
> [snip]
> And it seems to me that there ought to be an idiom, maybe a CPAN module,
> that makes the whole operation look more like a hash lookup, because
> that's how I think of it in my head, even though I know that regexp's
> aren't really as quick or efficient as simple keys.

Also, any given string can match many different regexes, while it can
match at most one hash key.  Trying to munge such a situation into a
hash-like idiom seems very misleading and just asking for trouble.

I'd just use an array of arrays, with each inner array being of length 2,
a regex/action pair.
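A sketch of the array-of-arrays approach, in which every matching action runs, since one string can match several patterns. The patterns, the actions and the run_all() name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Array of arrays: each inner array is a [regex, action] pair.
# Patterns and actions are illustrative only.
my @pairs = (
    [ qr/foo/   => sub { "saw foo" } ],
    [ qr/bar/   => sub { "saw bar" } ],
    [ qr/^\s*#/ => sub { "comment" } ],
);

# Unlike a hash lookup, one string may match several patterns,
# so collect the result of every action whose regex matches.
sub run_all {
    my ($line) = @_;
    my @results;
    for my $pair (@pairs) {
        my ($re, $action) = @$pair;
        push @results, $action->($line) if $line =~ $re;
    }
    return @results;
}

print join(", ", run_all("# foo and bar")), "\n";
```

Adding a `last` after the first successful match would turn this into first-match dispatch, with the array order acting as the priority.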

Xho

xhoster, 3/28/2005 3:46:58 PM


* Tim Shoppa wrote:

> So, is there a common perl idiom for dealing with this situation?

I would do this with a flat array in which every even-indexed element is
a regex and the element right after it is its callback, then iterate
over the array while skipping the callback elements.

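    #!/usr/bin/perl
    use strict;
    use warnings;

    # Flat list of regex => callback pairs: even indices hold the
    # patterns, odd indices hold the matching actions.
    my @array = (
        qr/(line\s(\d)\2)/ => sub { print "match: $1\n" },
        # ...
    );

    while ( <DATA> ) {
        for my $i ( 0 .. $#array ) {
            next if $i % 2;          # odd index: a callback, skip it
            my ( $re, $sub ) = @array[ $i, $i + 1 ];
            $sub->() if $_ =~ $re;   # run the callback on a match
        }
    }
    __DATA__
    line 10
    line 11
    line 12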

> 
> Maybe a CPAN module?

The module Tie::HashRef works around the problem of stringified hash
keys.  Perhaps it accepts a reference to a regex as a key; the docs
don't say, and I haven't checked it out yet.

regards,
fabian
Fabian, 3/29/2005 12:26:18 AM

Fabian Pikowski wrote:
> The module Tie::HashRef works around the problem

Thanks for the tip.  It's not only a tied hash but also a useful
object-oriented approach to looking for matches.  It takes "qr//" forms
directly as keys, with no need to stringify/destringify.  And to answer the
other reply, the approach taken ("first match") works fine for my
purposes.

I know it's not really a hash (with all the efficiencies that would be
implied if it were) but I like to think in terms of a hash, and
Tie::HashRef works wonderfully for this.
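The module's exact interface isn't shown in this thread, so the following is a hypothetical stand-in and not Tie::HashRef's API: a hand-rolled tie class (FirstMatchHash is a made-up name) that stores qr// keys in insertion order and returns the value of the first pattern matching the looked-up string. It relies only on Perl's standard tie interface (TIEHASH, STORE, FETCH, EXISTS).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tied-hash class: keys are qr// objects kept in insertion
# order, and a fetch returns the value stored under the first pattern
# that matches the requested string.
package FirstMatchHash;

sub TIEHASH { my ($class) = @_; return bless { rules => [] }, $class }

sub STORE {
    my ($self, $re, $value) = @_;
    push @{ $self->{rules} }, [ $re, $value ];
}

sub FETCH {
    my ($self, $string) = @_;
    for my $rule (@{ $self->{rules} }) {
        return $rule->[1] if $string =~ $rule->[0];
    }
    return;
}

sub EXISTS { my ($self, $string) = @_; return defined $self->FETCH($string) }

package main;

tie my %action, 'FirstMatchHash';
$action{ qr/^\d+$/ }    = sub { "number: $_[0]" };
$action{ qr/^[a-z]+$/ } = sub { "word: $_[0]" };

for my $input ("42", "hello", "???") {
    my $handler = $action{$input};
    my $result  = $handler ? $handler->($input) : "no match: $input";
    print "$result\n";
}
```

Only STORE, FETCH and EXISTS are implemented, so keys, each or delete on the tied hash would fail. Because the rules live in an array, "first match" always means the first pattern stored.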

Tim.

Tim, 3/29/2005 2:12:43 PM

Tim Shoppa wrote:
> And to answer the
> other reply, the approach taken ("first match") works fine for my
> purposes.

Be careful when iterating over all hash keys without sorting them.
Perl does not guarantee any order for hash keys, and the order can
vary from one run to the next.

If there are several possible matches, and they are not applied
in a specific order, then which one of them is "first" becomes
nondeterministic.
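To illustrate the point, here is a sketch with a plain hash whose keys are pattern strings (the patterns and values are made up). Sorting the keys makes the first match reproducible, although plain string order is an arbitrary priority; if a specific priority matters, an ordered array of rules is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain hash keyed by pattern strings.  Iterating with keys %rules
# alone gives no guaranteed order, so the "first" match could differ
# between runs when several patterns match the same input.
my %rules = (
    'ab'   => 'pattern ab',
    'a.'   => 'pattern a.',
    '^abc' => 'pattern ^abc',
);

# Sorting the keys fixes the order (plain string order here), so the
# same input always picks the same winner.
sub first_match_sorted {
    my ($string) = @_;
    for my $pat (sort keys %rules) {
        return $rules{$pat} if $string =~ /$pat/;
    }
    return;
}

print first_match_sorted("xab"), "\n";
```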

	-Joe
Joe, 3/31/2005 11:32:08 AM

Tim Shoppa <shoppa@trailing-edge.com> wrote in comp.lang.perl.misc:
> I know it's not really a hash (with all the efficiencies that would be
> implied if it was) but I like to think in terms of a hash, and
> Tie::HashRef works wonderfully for this.

As an alternative, you can embed the various actions inside a regex
using the (?{ CODE }) construct.  Example:

    my $re = qr/
        aaa (?{ print "first case\n" }) |
        bbb (?{ print "second case\n" })
    /x;

    /$re/ for qw( gaga-aaa-gugu nothing-here baba-bbb-bubu);

I'm not really recommending this unless special circumstances make it
attractive.  For one, the (?{ CODE }) construct is experimental and
has real scoping issues.  Also, it is an invitation to build one
big-ass regex for all alternatives.  Good coding practice is to
split large regexes, not to combine them.

Anno
anno4000, 3/31/2005 11:36:42 AM
