f



What is regular about regular expressions nowadays?

The language {ww | w is alpha} has been shown to be context-sensitive.
But the perl regular expression /(\w*)\1/ will recognize it. There are
constructs supported by some regex engines (.NET, for example) that
allow sections to be tagged and

Why do we insist on calling these "regular"? Anything with
back-references or tagging of any sort can't be expressed as DFAs or
NFAs because one is not supposed to be able to keep a memory of the
input. Still, all books that talk about regex engines talk about NFAs
and backtracking. Something like {ww} requires an equivalent LBA 
(linear bounded automaton), no?
Is the terminology all mixed up?

It seems to me that people are stuffing more and more features into
regular expression. Still, I suspect it would be objectionable to
modify the perl regex engine to accept an LR(k) grammar, for example,
so what kind of changes to a regex syntax are deemed acceptable?
Something that doesn't support recursion?

Related question: Why are "synchronized regular expressions" regular?

Befuddled minds want to know.

-Sriram
0
1/15/2004 5:11:47 PM
comp.theory 5139 articles. 1 followers. marty.musatov (1143) is leader. Post Follow

10 Replies
1154 Views

Similar Articles

[PageSpeed] 38

Sriram wrote:

> The language {ww | w is alpha} has been shown to be context-sensitive.
> But the perl regular expression /(\w*)\1/ will recognize it. There are
> constructs supported by some regex engines (.NET, for example) that
> allow sections to be tagged and
> 
> Why do we insist on calling these "regular"? Anything with
> back-references or tagging of any sort can't be expressed as DFAs or
> NFAs because one is not supposed to be able to keep a memory of the
> input. Still, all books that talk about regex engines talk about NFAs
> and backtracking. Something like {ww} requires an equivalent LBA 
> (linear bounded automaton), no?
> Is the terminology all mixed up?
> 
> It seems to me that people are stuffing more and more features into
> regular expression. Still, I suspect it would be objectionable to
> modify the perl regex engine to accept an LR(k) grammar, for example,
> so what kind of changes to a regex syntax are deemed acceptable?
> Something that doesn't support recursion?
> 
> Related question: Why are "synchronized regular expressions" regular?
> 
> Befuddled minds want to know.

Simple answer: they arn't regular.

More complex answer: it was trivial to add that stuff to a typical
backtracking RE engine, so it was done.  It was still called an RE
engine due to laziness.

BTW, take a look at the plans for the Pattern Matching engine for perl 6
( http://dev.perl.org/perl6/apocalypse/A06.html ).  But, ther're still
calling them "regular expressions" (-:

-- 
mark@biggar.org
mark.a.biggar@comcast.net

0
mark8231 (28)
1/16/2004 3:32:11 AM
Mark A. Biggar wrote:

> ( http://dev.perl.org/perl6/apocalypse/A06.html ).  

Oops, that should be http://dev.perl.org/perl6/apocalypse/A05.html
-- 
mark@biggar.org
mark.a.biggar@comcast.net

0
mark8231 (28)
1/16/2004 3:39:02 AM
"Sriram" <sriram@malhar.net> wrote:

> Befuddled minds want to know.

It's been a couple decades, but check the disclaimers
for those parsers.  I seem to recall the word "exponential"
sneaking in somewhere, might have been space, might have been
time, for worst case behavior, but as I said, it's been a
couple of decades.

We tend to use lots of tools whose worst case behavior would
wreck our lives, because that case is too rare to encounter in
one tool lifetime without deliberately constructing it.

xanthian.


-- 
Posted via Mailgate.ORG Server - http://www.Mailgate.ORG
0
xanthian (693)
1/16/2004 3:58:51 AM
"Kent Paul Dolan" <xanthian@well.com> wrote in message
news:acaa990a77438a36c2b80392f901353a.48257@mygate.mailgate.org...
> "Sriram" <sriram@malhar.net> wrote:
>
> > Befuddled minds want to know.
>
> It's been a couple decades, but check the disclaimers
> for those parsers.  I seem to recall the word "exponential"
> sneaking in somewhere, might have been space, might have been
> time, for worst case behavior, but as I said, it's been a
> couple of decades.
>

There is the concept of "regular expressions with exponentiation".  ie:  If
R is a regular expression, and k in N, then R^k = R concatenated with itself
k times.  I have only seen exponentiation used in the canonical
EXSPACE-complete language EQ_REX-E = {<Q,R> | Q and R are equivalent regular
expressions with exponentiation}.  So I would guess what you saw may not
have had anything to do with exponential time algorithms.

> We tend to use lots of tools whose worst case behavior would
> wreck our lives, because that case is too rare to encounter in
> one tool lifetime without deliberately constructing it.
>

I would guess that this is not true, since the algorithm writer has no idea
what possible regular expressions may be used.  ie: it is rare that
language/api designers would use an exponential time algorithm and not at
least mention this in the docs.  Especially for such widely used apis as
found in Java, Perl and .NET since people routinely use such languages for
mission-critical apps where a computation that takes days (or more) to
complete can mean the loss of millions of dollars.

And although I'm sure you know this - even if the alg is exponential only in
what may seem like contrived cases - the alg is still defined as
exponential.  Besides, what may seem like a contrived case to you or I, may
be an everyday occurence in certain applications.

What you mentioned, however, is common place in randomized algorithms (ie:
randomized quicksort), but in the "one-in-a-million" chance the alg hits its
worst case, its worst case is still polynomial (at least in the randomized
algs I've dealt with).

Same thing with probabalistic algs - but in this case instead of being
exponential in the worst case, the alg may actually give an incorrect answer
(ie: accepts or rejects incorrectly - see the complexity class BPP for a
simple example).



l8r, Mike N. Christoff



0
mchristoff (248)
1/16/2004 5:34:04 AM
"Michael N. Christoff" <mchristoff@sympatico.caREMOVETHIS> wrote:

> I would guess that this is not true, since the algorithm writer has no idea
> what possible regular expressions may be used.  ie: it is rare that
> language/api designers would use an exponential time algorithm and not at
> least mention this in the docs.

Michael, don't try to think, you'll injure yourself; put me in your
killfile and skip the pain.

From "man perlre":

               Consider how the
               pattern above detects no-match on
               "((()aaaaaaaaaaaaaaaaaa" in several seconds, but
               that each extra letter doubles this time.  This
               exponential performance will make it appear that
               your program has hung.

Looks like a "warning in the docs" about "exponential whosis" to me,
but rather than checking, you needed to pull an answer, let's be nice,
"out of the air".

No, I will not indulge in another interminable debate about how you
were right all along; I think for not having read that for a decade,
and being a total space-case, I did rather well at remembering it.

You would have done rather well remembering that I usually do, and
checking your facts before spouting off. You were early on your five
month "making a fool of yourself trying to put down Kent" cycle, by
almost five months to the day; was that yesterday, or the day before
when you last did this?

Clues, they're not just for smart people any more.

xanthian.


-- 
Posted via Mailgate.ORG Server - http://www.Mailgate.ORG
0
xanthian (693)
1/16/2004 7:22:54 AM
"Kent Paul Dolan" <xanthian@well.com> wrote in message
news:a724c710a76ee5c3d56a79bb92bfde0c.48257@mygate.mailgate.org...
> "Michael N. Christoff" <mchristoff@sympatico.caREMOVETHIS> wrote:
>
> > I would guess that this is not true, since the algorithm writer has no
idea
> > what possible regular expressions may be used.  ie: it is rare that
> > language/api designers would use an exponential time algorithm and not
at
> > least mention this in the docs.
>
> Michael, don't try to think, you'll injure yourself; put me in your
> killfile and skip the pain.
>
> From "man perlre":
>
>                Consider how the
>                pattern above detects no-match on
>                "((()aaaaaaaaaaaaaaaaaa" in several seconds, but
>                that each extra letter doubles this time.  This
>                exponential performance will make it appear that
>                your program has hung.
>
> Looks like a "warning in the docs"

The only way I would have been incorrect is if they did it and did not
mention it in the docs, which they did.  So what's the problem?

> about "exponential whosis" to me,
> but rather than checking,

Did you do any checking before your original reply?  Or should only people
other than yourself be able to make a _guess_ without checking?  I believe I
mentioned that I was making a guess at least twice.

> being a total space-case

Well at least you got that right.



l8r, Mike N. Christoff



0
mchristoff (248)
1/16/2004 7:46:26 AM
"Michael N. Christoff" <mchristoff@sympatico.caREMOVETHIS> wrote:

> The only way I would have been incorrect is if they did it and did not
> mention it in the docs, which they did.  So what's the problem?

You missed the part about not trying to think to avoid injury, Michael.
Other people can go back and read the full exchange, consider that you
have been trying this stunt for a few whole _years_ now, put two and two
right adjacent to one another to come up with the right answer, and,
there you are with egg on your face again.

Killfile, Michael, it's your only hope.

xanthian.


-- 
Posted via Mailgate.ORG Server - http://www.Mailgate.ORG
0
xanthian (693)
1/16/2004 8:12:09 AM
"Michael N. Christoff" <mchristoff@sympatico.caREMOVETHIS> wrote:


> The only way I would have been incorrect is if they did it and did not
> mention it in the docs, which they did.  So what's the problem?

Mike, that's an infant's answer.  Everyone
in front of whom you are saying this can
go back and read the entire conversation,
factor in that you've been pulling this
same stunt for whole _years_ now, read your
agenda as well as your words, put two right
next to two and find the obvious answer.

I said:

Don't try to think, you'll hurt yourself.

You just don't take advice well, you tried
to think, and your reputation has another
huge blemish.

I said;

Put me in your killfile, it's your only hope.

Are you completely incapable of being educated,
or is it just an act you do on Usenet and in
my email box?

Good advice is being given. I'm not your enemy,
you just waste my time with this game.

I'm not the world's best known most patient
tolerator of fools, so you get double what you
give simply for making the effort, yet don't
know how to admit defeat or avoid starting.

That can lead to a life of perpetual pain, like
mine.  Don't insist on going there, with me or
with anyone else, out of pride and a need to
"win"; other choices exist; pick one.

Take the advice, your life will be better for it.

xanthian.


-- 
Posted via Mailgate.ORG Server - http://www.Mailgate.ORG
0
xanthian (693)
1/16/2004 9:28:50 AM
"Sriram" <sriram@malhar.net> wrote in message
news:2c18ce3c.0401150911.686a508b@posting.google.com...
> The language {ww | w is alpha} has been shown to be context-sensitive.
> But the perl regular expression /(\w*)\1/ will recognize it. There are
> constructs supported by some regex engines (.NET, for example) that
> allow sections to be tagged and
>
> Why do we insist on calling these "regular"? Anything with
> back-references or tagging of any sort can't be expressed as DFAs or
> NFAs because one is not supposed to be able to keep a memory of the
> input. Still, all books that talk about regex engines talk about NFAs
> and backtracking. Something like {ww} requires an equivalent LBA
> (linear bounded automaton), no?

I haven't checked, but if your language is in A = {CSL}\{CFL} then yes - it
requires at least an LBA.

> Is the terminology all mixed up?
>
> It seems to me that people are stuffing more and more features into
> regular expression.

> Still, I suspect it would be objectionable to
> modify the perl regex engine to accept an LR(k) grammar, for example,
> so what kind of changes to a regex syntax are deemed acceptable?
> Something that doesn't support recursion?
>

Well if perl regex can require exponential time then it would seem that
almost anything goes (at least up to NSPACE(n) it seems [or maybe even
EXP(?)]).  Note that, at least according to Hopcroft and Ullman, finding a
'real-world' language that isn't context sensitive is extrememly difficult
which means CSLs are extremely powerful.  ie: they can recognize the
language {a^n | n is prime} which NPDAs are not even able to recognize.  I
also agree with Mark Biggar's comments.



l8r, Mike N. Christoff



0
mchristoff (248)
1/16/2004 1:36:38 PM
"Michael N. Christoff" <mchristoff@sympatico.caREMOVETHIS> wrote in message
news:ULRNb.5681$c1.898557@news20.bellglobal.com...
>
> "Sriram" <sriram@malhar.net> wrote in message
> news:2c18ce3c.0401150911.686a508b@posting.google.com...
> > The language {ww | w is alpha} has been shown to be context-sensitive.
> > But the perl regular expression /(\w*)\1/ will recognize it. There are
> > constructs supported by some regex engines (.NET, for example) that
> > allow sections to be tagged and
> >
> > Why do we insist on calling these "regular"? Anything with
> > back-references or tagging of any sort can't be expressed as DFAs or
> > NFAs because one is not supposed to be able to keep a memory of the
> > input. Still, all books that talk about regex engines talk about NFAs
> > and backtracking. Something like {ww} requires an equivalent LBA
> > (linear bounded automaton), no?
>
> I haven't checked, but if your language is in A = {CSL}\{CFL} then yes -
it
> requires at least an LBA.
>
> > Is the terminology all mixed up?
> >
> > It seems to me that people are stuffing more and more features into
> > regular expression.
>
> > Still, I suspect it would be objectionable to
> > modify the perl regex engine to accept an LR(k) grammar, for example,
> > so what kind of changes to a regex syntax are deemed acceptable?
> > Something that doesn't support recursion?
> >
>
> Well if perl regex can require exponential time then it would seem that
> almost anything goes (at least up to NSPACE(n) it seems [or maybe even
> EXP(?)]).  Note that, at least according to Hopcroft and Ullman, finding a
> 'real-world' language that isn't context sensitive is extrememly difficult
> which means CSLs are extremely powerful.  ie: they can recognize the
> language {a^n | n is prime} which NPDAs are not even able to recognize.  I
> also agree with Mark Biggar's comments.
>

By the way, the exponentiality of the example in the perlre docs on the
surface seems not to come from context sensitive string matching but the
fact the pattern matcher needs to check close to the power set of the set of
characters in the source text.



l8r, Mike N. Christoff




0
mchristoff (248)
1/16/2004 2:11:33 PM
Reply: