regex in javascript

Hi folks,

I'm trying to learn and understand regex in javascript for web form
validation. In trying to write a piece of script to validate an email
address I came up with the following:


   function isEmail(input)
   {
     var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/
     if (!re.exec(input))
     {
       return false
     }
   }

This is then called like this:

   if (!isEmail(document.form.mailfrom.value))
   {
     alert("This does not look like an email address")
   }

etc.

Unfortunately it bombs on absolutely everything. Can anyone suggest why?

As an aside, is there any truly decent help anywhere on regex? I've got
two manuals, one on perl, the other on javascript, both of which touch on
regex - the perl book is the better of the two for this - but neither are
very good.

Many thanks,

Dave

-- 
Dave Stratford    ZFCA
http://www.argonet.co.uk/users/daves
Hexagon Systems Limited - Experts in VME systems development

daves (84)
9/29/2004 10:38:19 PM
comp.sys.acorn.programmer

Dave Stratford <daves@argonet.co.uk>:
> As an aside, is there any truly decent help anywhere on regex? I've got
> two manuals, one on perl, the other on javascript, both of which touch on
> regex - the perl book is the better of the two for this - but neither are
> very good.

There's an O'Reilly book called _Mastering Regular Expressions_:
<URL:http://regex.info>

The regex manpage is comprehensive, but rather dense:
<URL:http://arglist.com/regex/regex7.html>

Of course, there's always:
<URL:http://dmoz.org/Computers/Programming/Languages/Regular_Expressions/>

HTH.

b.

-- 
Ben Shimmin (bas@bas.me.uk)                            <URL:http://bas.me.uk/>
                                finger gpg@bas.me.uk | tail -30 | gpg --import
bas7517 (90)
9/29/2004 11:48:09 PM
On Wed, 29 Sep 2004, Dave Stratford wrote:

> Hi folks,
>
> I'm trying to learn and understand regex in javascript for web form
> validation. In trying to write a piece of script to validate an email
> address I came up with the following:
>
>
>    function isEmail(input)
>    {
>      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/

This really isn't the right forum for this, but...

I'm going to describe each bit of the text you have here so that those
without general Regular Expression knowledge might follow what I'm
saying...

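^ means 'start of line' (in JavaScript, strictly 'start of string' unless the m flag is set)
$ means 'end of line' (likewise 'end of string' without the m flag)
\w is a word character (a-z, A-Z, 0-9 or _)
* means 'any number (including none) of the thing before'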
+ means 'one or more of the thing before'
\. means 'a . character'
\- means 'a - character'
() around something means 'group this lot together'
[] around something means 'any of the characters inside here can match'
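The meanings listed above can be checked one at a time. A minimal sketch, assuming a JavaScript engine such as Node; the patterns and test strings are illustrative and not from the original post:

```javascript
// Each pattern is anchored with ^...$, so it must match the whole string.
// Entries are [pattern, input, expected result of pattern.test(input)].
const checks = [
  [/^\w$/, "_", true],       // \w includes the underscore
  [/^\w$/, "-", false],      // ...but not the hyphen
  [/^a*$/, "", true],        // * allows zero repetitions
  [/^a+$/, "", false],       // + needs at least one
  [/^[\.\-]$/, "-", true],   // [] matches any one of the listed characters
  [/^(ab)+$/, "abab", true], // () groups "ab", so + repeats the pair
  [/^(ab)+$/, "aab", false],
];

for (const [re, input, expected] of checks) {
  const result = re.test(input);
  console.log(`${re} on ${JSON.stringify(input)}: ${result} (expected ${expected})`);
}
```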

So what you've written is...

1 Start of line (^)
2 A word character (\w)
3 Any number of... (* - notice that I'm jumping forward through the group
                  to describe this)
  3.1 . or - ([\.\-])
  3.2 A word character (\w)
4 An @ character (@)
5 A word character (\w)
6 One or more of... (+ - again I'm jumping forward to describe this)
  6.1 . or - ([\.\-])
  6.2 A word character (\w)
7 End of line ($)

So, if you want to check that your email address matches you run through
the expression. Let's take my email address 'gerph@gerph.org'.

1 the start of the line
2 'g' matches
3 (we repeatedly do 3.1 and 3.2 until we fail)
  3.1 'e' doesn't match - it's not a . or a -
  We've failed, so 3 is now complete - it says 'any number of', which
  includes no matches.
4 'e' doesn't match - it's not an @ character

We've failed, and there are no other retries we can make, so the string
doesn't match your regular expression.

Why have I explained it that way rather than just telling you what's wrong?
Because by following it through you can see where the problem lies. You
expected 2 and 3 to match 'everything on the left of the @', and it was in
3 that the expected matches didn't happen. You wanted 3 to match all of
'erph' from my email address.

So we look at 3.1, because that's where our failure happened. You asked for
a . or a -, which is fine in itself, but it isn't what you meant if clause 3
was to match 'erph'.
Clause 3 is actually asking for word characters alternating with a . or a
-. For example 'g.e-r-p.h' would match clauses 2 and 3.
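Running the original pattern against a few strings shows the alternation described above. A small sketch, assuming Node or a browser console; apart from the email address, the test strings are made up:

```javascript
// The pattern from the original post. Each ([\.\-]\w) group is a
// separator followed by exactly ONE word character.
const original = /^\w([\.\-]\w)*@\w([\.\-]\w)+$/;

console.log(original.test("gerph@gerph.org")); // false: "e" follows "g" with no separator
console.log(original.test("g.e-r-p.h@g.e"));   // true: letters alternate with . or -
console.log(original.test("a@b.c"));           // true: single letters fit the pattern
```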

What you wanted was 'any word character or . or -' [1]. Since you're
wanting a selection of characters, you want to put it in []'s. And the
characters you want are the '.', '-' and any word character, which are
represented by '\.', '\-', and '\w' respectively. In addition, you want
'any number of these', so you want to follow it by the '*'. So the entire
clause 3 becomes '[\.\-\w]*'.

With this in mind, you can also see that clause 6 is wrong: it needs to
be the same thing, but only 'one or more of...', so '[\.\-\w]+'.

And so we come to the final regular expression - we can just dump the bits
that we've changed in place of the clauses that were wrong.

   /^\w[\.\-\w]*@\w[\.\-\w]+$/

As you can see from this, you were very close, but just got the groupings
wrong.
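A quick check of the corrected pattern, as a sketch runnable in Node; the extra test strings are illustrative:

```javascript
// The corrected pattern: clauses 3 and 6 are now character classes,
// so word characters, dots and hyphens can come in any order.
const corrected = /^\w[\.\-\w]*@\w[\.\-\w]+$/;

console.log(corrected.test("gerph@gerph.org")); // true
console.log(corrected.test("g.e-r-p.h@g.e"));   // true: still accepted
console.log(corrected.test("ab@cd"));           // true: nothing requires a dot after the @
console.log(corrected.test("a@b"));             // false: [\.\-\w]+ needs a second character after @
```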


I heartily recommend O'Reilly's 'JavaScript, The Definitive Guide' for
general JS usage (and of course the ECMAScript specification itself if you
want a more specific 'definitive' guide :-) ); for regular expressions,
the POSIX specification is quite handy.



[1] Actually, I'm *assuming* that's what you want. My personal belief is
that you don't want that, because that's not the specification for an
email address - you want an 'addr-spec', as defined in RFC2822. This is
much more complex.

In fact, so complex that I really don't think I can quickly knock up a
regular expression to describe it. Consider that the email address might,
validly, be '"Justin Fletcher"@gerph.org', and indeed that's not the end
of the story, because '(Home address) "Justin Fletcher"@gerph.org (private
address)' is also a valid email address.
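A sketch showing the corrected pattern rejecting a few addr-specs that RFC 2822 permits. Apart from the address quoted in the post, the examples are invented:

```javascript
// The corrected pattern from the post above.
const simplePattern = /^\w[\.\-\w]*@\w[\.\-\w]+$/;

// Addresses that RFC 2822 allows but the simple pattern rejects.
// Only the first comes from the post; the others are made-up examples.
const rfcValid = [
  '"Justin Fletcher"@gerph.org', // quoted-string local part
  "o'brien@example.com",         // ' is allowed in an atom
  "user+tag@example.com",        // so is +
];

for (const addr of rfcValid) {
  console.log(addr, simplePattern.test(addr)); // prints false for each
}
```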

-- 
Gerph <http://gerph.org/>
.... All my life I've been waiting for you to bring a fairy tale my way.
gerph (209)
9/30/2004 1:56:47 AM
In message <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>
          Justin Fletcher <gerph@gerph.org> wrote:

> On Wed, 29 Sep 2004, Dave Stratford wrote:
> 
> > Hi folks,
> >
> > I'm trying to learn and understand regex in javascript for web form
> > validation. In trying to write a piece of script to validate an email
> > address I came up with the following:
> >
> >
> >    function isEmail(input)
> >    {
> >      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/
> 
> And so we come to the final regular expression - we can just dump the bits
> that we've changed in place of the clauses that were wrong.
> 
>    /^\w[\.\-\w]*@\w[\.\-\w]+$/
> 
> As you can see from this, you were very close, but just got the groupings
> wrong.

Even this is not quite enough if you want to ensure that the address has
a FQDN, as the above regex will pass "ab@cd". To ensure a FQDN, e.g.
"ab@cd.ef" or "ab@cd.ef.gh", you need:

  /^\w[\.\-\w]*@\w[\.\-\w]*\.\w+$/
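The difference the extra \.\w+ makes can be seen by testing the pattern directly. A minimal sketch with illustrative inputs:

```javascript
// Alan's pattern: the domain must end with "." followed by word characters.
const fqdnPattern = /^\w[\.\-\w]*@\w[\.\-\w]*\.\w+$/;

console.log(fqdnPattern.test("ab@cd"));       // false: no dot in the domain
console.log(fqdnPattern.test("ab@cd.ef"));    // true
console.log(fqdnPattern.test("ab@cd.ef.gh")); // true
console.log(fqdnPattern.test("a.@b.c"));      // true: a dot just before the @ still passes
```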

Alan

-- 
RISC OS - you know it makes cents
spamhater1 (1060)
9/30/2004 12:11:20 PM
In message <4cf63bd1b1daves@argonet.co.uk>, Dave Stratford
<daves@argonet.co.uk> wrote:

> Hi folks,
> 
> I'm trying to learn and understand regex in javascript for web form
> validation. In trying to write a piece of script to validate an email
> address I came up with the following:
> 
> 
>    function isEmail(input)
>    {
>      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/
>      if (!re.exec(input))
>      {
>        return false
>      }
>    }

No return statement is executed when the RE matches, so the function returns
undefined.
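This also explains why the script "bombs on absolutely everything": the caller tests !isEmail(...), and !undefined is true, so the alert fires even when the pattern matches. A sketch of the original function next to one possible fix, assuming a plain JavaScript environment (the document.form lookup from the original post is left out so it runs standalone):

```javascript
// The function as posted: when re.exec succeeds, control falls off the end
// of the function, which therefore returns undefined.
function originalIsEmail(input) {
  var re = /^\w([\.\-]\w)*@\w([\.\-]\w)+$/;
  if (!re.exec(input)) {
    return false;
  }
}

// One possible fix: return the boolean from RegExp.prototype.test directly.
// The pattern here is the FQDN variant suggested elsewhere in the thread;
// any of the patterns discussed could be substituted.
function isEmail(input) {
  var re = /^\w[\.\-\w]*@\w[\.\-\w]*\.\w+$/;
  return re.test(input);
}

console.log(originalIsEmail("a@b.c"));   // undefined: the pattern matched
console.log(!originalIsEmail("a@b.c"));  // true, so the caller's alert fires anyway
console.log(isEmail("gerph@gerph.org")); // true
console.log(isEmail("ab@cd"));           // false
```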

-- 
Member AFFS, WYLUG, SWP (UK), ANL, Leeds SA, Leeds Anti-war coalition
OpenPGP key fingerprint: D0A6 F403 9745 CED4 6B3B  94CC 8D74 8FC9 9F7F CFE4
No to software patents!    No to DRM/EUCD - hands off our computers!
9/30/2004 2:39:33 PM
In article <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>,
   Justin Fletcher <gerph@gerph.org> wrote:

>   6.2 A word character (\w)

The string after the '@' - the domain name - cannot contain an underscore,
so is not the word character inappropriate here (and for the subsequent
optional strings)?

I am aware of this as I had to drop Pic_Index's underscore for that domain
name.
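If the underscore is to be kept out of the domain part, one approach is to replace \w after the @ with an explicit class of letters, digits and hyphens, in line with the usual DNS hostname rules. A hypothetical sketch built on the FQDN pattern from earlier in the thread; the addresses are made up:

```javascript
// Hypothetical variant: \w (which includes "_") is kept for the part before
// the @, but the domain uses an explicit letters/digits/hyphen class.
const noUnderscoreDomain =
  /^\w[\.\-\w]*@[A-Za-z0-9][\.\-A-Za-z0-9]*\.[A-Za-z0-9]+$/;

console.log(noUnderscoreDomain.test("pic_index@example.com"));  // true
console.log(noUnderscoreDomain.test("someone@pic_index.info")); // false
```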

John

-- 
John Williams, Wirral, Merseyside, UK - no attachments to these addresses!
Non-RISC OS posters change user to johnrwilliams or put 'risc' in subject
for reliable contact! Who is John Williams? http://www.picindex.info/author/ 
UCEbin (2771)
9/30/2004 2:52:32 PM
Alan Wrigley wrote:

> Even this is not quite enough if you want to ensure that the address has
> a FQDN, as the above regex will pass "ab@cd". To ensure a FQDN, e.g.
> "ab@cd.ef" or "ab@cd.ef.gh", you need:

Of course, "Mastering Regular Expressions" has the definitive email 
address regex - all 60 lines of it!

-- 
Jason Tribbeck

newsmaster9@tribbeck.com - 20K download limit - anything larger won't
be received.
newsmaster9 (144)
9/30/2004 5:22:52 PM
On Thu, 30 Sep 2004, Alan Wrigley wrote:

> In message <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>
>           Justin Fletcher <gerph@gerph.org> wrote:
>
> > On Wed, 29 Sep 2004, Dave Stratford wrote:
> >
> > > Hi folks,
> > >
> > > I'm trying to learn and understand regex in javascript for web form
> > > validation. In trying to write a piece of script to validate an email
> > > address I came up with the following:
> > >
> > >
> > >    function isEmail(input)
> > >    {
> > >      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/
> >
> > And so we come to the final regular expression - we can just dump the bits
> > that we've changed in place of the clauses that were wrong.
> >
> >    /^\w[\.\-\w]*@\w[\.\-\w]+$/
> >
> > As you can see from this, you were very close, but just got the groupings
> > wrong.
>
> Even this is not quite enough if you want to ensure that the address has
> a FQDN, as the above regex will pass "ab@cd". To ensure a FQDN, e.g.
> "ab@cd.ef" or "ab@cd.ef.gh", you need:
>
>   /^\w[\.\-\w]*@\w[\.\-\w]*\.\w+$/

I meant 'close to what he'd intended', not close to what's required for an
email address. As I mentioned in a footnote, the full specification in
RFC2822 (a cute touch on the part of the RFC editor, I think, to give it the
same end digits as the previous standard, RFC822) is pretty hairy.

-- 
Gerph <http://gerph.org/>
.... Tearing me away from the quiet; the silence of my soul.
gerph (209)
9/30/2004 6:57:05 PM
In article <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>,
   Justin Fletcher <gerph@gerph.org> wrote:
> On Wed, 29 Sep 2004, Dave Stratford wrote:

> > Hi folks,
> >
> > I'm trying to learn and understand regex in javascript for web form
> > validation. In trying to write a piece of script to validate an email
> > address I came up with the following:
> >
> >
> >    function isEmail(input)
> >    {
> >      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/

> This really isn't the right forum for this, but...

[snip]

> And so we come to the final regular expression - we can just dump the
> bits that we've changed in place of the clauses that were wrong.

>    /^\w[\.\-\w]*@\w[\.\-\w]+$/

> As you can see from this, you were very close, but just got the groupings
> wrong.

Okay. I see what I did wrong now. Many many thanks.

FWIW I wasn't trying to confirm an absolutely perfectly valid email
address, just confirm that one entered into the web form looks
approximately correct.

Thanks,

Dave

-- 
Dave Stratford    ZFCA
http://www.argonet.co.uk/users/daves
Hexagon Systems Limited - Experts in VME systems development

daves (84)
9/30/2004 10:06:17 PM
In article <ad4086f64c.spamhater@keepyourfilthyspamtoyourself.co.uk>,
   Alan Wrigley <spamhater@keepyourfilthyspamtoyourself.co.uk> wrote:
> In message <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>
>           Justin Fletcher <gerph@gerph.org> wrote:

> > On Wed, 29 Sep 2004, Dave Stratford wrote:
> > 
> > > Hi folks,
> > >
> > > I'm trying to learn and understand regex in javascript for web form
> > > validation. In trying to write a piece of script to validate an email
> > > address I came up with the following:
> > >
> > >
> > >    function isEmail(input)
> > >    {
> > >      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/
> > 
> > And so we come to the final regular expression - we can just dump the
> > bits that we've changed in place of the clauses that were wrong.
> > 
> >    /^\w[\.\-\w]*@\w[\.\-\w]+$/
> > 
> > As you can see from this, you were very close, but just got the
> > groupings wrong.

> Even this is not quite enough if you want to ensure that the address has
> a FQDN, as the above regex will pass "ab@cd". To ensure a FQDN, e.g.
> "ab@cd.ef" or "ab@cd.ef.gh", you need:

>   /^\w[\.\-\w]*@\w[\.\-\w]*\.\w+$/

Okay, I see what you've added, but in that case would it not also be more
correct to add a \w+ immediately before the @ to prevent (eg) a.@b.c?
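Dave's suggestion can be tried directly. A sketch, with one side effect worth noting: a single \w before the @ (equivalent to \w+ in this position) means the part before the @ needs at least two characters, so a second, hypothetical variant makes the middle section optional:

```javascript
// Dave's suggestion: require a word character immediately before the @.
const wordBeforeAt = /^\w[\.\-\w]*\w@\w[\.\-\w]*\.\w+$/;

// Side effect: the local part now needs at least two characters.
// Making the middle optional keeps one-character local parts legal.
const optionalMiddle = /^\w(?:[\.\-\w]*\w)?@\w[\.\-\w]*\.\w+$/;

for (const addr of ["a.@b.c", "ab@cd.ef", "a@b.c"]) {
  console.log(addr, wordBeforeAt.test(addr), optionalMiddle.test(addr));
}
// a.@b.c false false
// ab@cd.ef true true
// a@b.c false true
```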

Dave

-- 
Dave Stratford    ZFCA
http://www.argonet.co.uk/users/daves
Hexagon Systems Limited - Experts in VME systems development

daves (84)
9/30/2004 10:09:59 PM
In message <4cf69502f5UCEbin@tiscali.co.uk>
     "John Williams (News)" <UCEbin@tiscali.co.uk> wrote:

>In article <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>,
>   Justin Fletcher <gerph@gerph.org> wrote:
>
>>   6.2 A word character (\w)
>
>The string after the '@' - the domain name - cannot contain an underscore,
>so is not the word character inappropriate here (and for the subsequent
>optional strings)?
>
>I am aware of this as I had to drop Pic_Index's underscore for that domain
>name.

Odd. An underscore is a valid character in a domain name according 
to the standard (rfc2822). A dot atom string (word.word.word) after 
the @ has the same specification as a dot atom string before the @.
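Colin's point can be illustrated with a simplified form of RFC 2822's dot-atom grammar applied on both sides of the @. This is a sketch under stated assumptions: it covers only dot-atoms (no quoted strings, comments or domain literals), and the atext character list is transcribed from section 3.2.4 of the RFC:

```javascript
// Simplified sketch of RFC 2822's dot-atom form on both sides of the @.
// atext is letters, digits and !#$%&'*+-/=?^_`{|}~ (RFC 2822, section 3.2.4).
// Quoted strings, comments and domain literals are deliberately NOT handled.
const atext = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]";
const dotAtom = `${atext}+(?:\\.${atext}+)*`;
const dotAtomAddress = new RegExp(`^${dotAtom}@${dotAtom}$`);

console.log(dotAtomAddress.test("ab@cd"));                    // true
console.log(dotAtomAddress.test("pic_index@pic_index.info")); // true: _ is atext
console.log(dotAtomAddress.test("a.@b.c"));                   // false: empty atom before @
```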
-- 
Colin
10/1/2004 7:39:15 AM
In message <ad4086f64c.spamhater@keepyourfilthyspamtoyourself.co.uk>
     Alan Wrigley <spamhater@keepyourfilthyspamtoyourself.co.uk> 
     wrote:

>In message <Pine.LNX.4.55.0409300225230.13641@buttercup.gerph.org>
>          Justin Fletcher <gerph@gerph.org> wrote:
>
>> On Wed, 29 Sep 2004, Dave Stratford wrote:
>> 
>> > Hi folks,
>> >
>> > I'm trying to learn and understand regex in javascript for web form
>> > validation. In trying to write a piece of script to validate an email
>> > address I came up with the following:
>> >
>> >
>> >    function isEmail(input)
>> >    {
>> >      var re =/^\w([\.\-]\w)*@\w([\.\-]\w)+$/
>> 
>> And so we come to the final regular expression - we can just dump the bits
>> that we've changed in place of the clauses that were wrong.
>> 
>>    /^\w[\.\-\w]*@\w[\.\-\w]+$/
>> 
>> As you can see from this, you were very close, but just got the groupings
>> wrong.
>
>Even this is not quite enough if you want to ensure that the address has
>a FQDN, as the above regex will pass "ab@cd". To ensure a FQDN, e.g.
>"ab@cd.ef" or "ab@cd.ef.gh", you need:
>
>  /^\w[\.\-\w]*@\w[\.\-\w]*\.\w+$/
>

According to RFC2822 ab@cd is a valid email address. Is FQDN 
something different?

-- 
Colin
10/1/2004 8:00:51 AM
In article <eb2df1f64c.colin@colin/granville.gmx.co.uk>,
   Colin Granville <colin.granville@gmx.co.uk> wrote:

> Odd. An underscore is a valid character in a domain name according 
> to the standard (rfc2822). A dot atom string (word.word.word) after 
> the @ has the same specification as a dot atom string before the @.

Perhaps the 'tester' routines on the sites I tried were faulty, but I note
that the underscore is also disallowed after the @ in this old (pre-'.info')
PHP expression from elsewhere:

"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"

(that should now be {2,4} to cope with '.info', AFAICS)
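For comparison, John's PHP expression transcribed to a JavaScript regex literal, with {2,4} as he suggests. A sketch; the i (case-insensitive) flag is an assumption, since the PHP call that used the expression isn't shown, and the addresses are made up:

```javascript
// John's PHP pattern as a JavaScript regex literal, TLD widened to {2,4}.
// The i flag is an assumption: without it, capital letters are rejected.
const phpStyle =
  /^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i;

console.log(phpStyle.test("pic_index@example.info")); // true: _ allowed before @
console.log(phpStyle.test("someone@pic_index.info")); // false: no _ after the @
console.log(phpStyle.test("ab@cd"));                  // false: needs a TLD
```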

John

-- 
John Williams, Wirral, Merseyside, UK - no attachments to these addresses!
Non-RISC OS posters change user to johnrwilliams or put 'risc' in subject
for reliable contact! Who is John Williams? http://www.picindex.info/author/ 
UCEbin (2771)
10/1/2004 9:03:40 AM
Colin Granville <colin.granville@gmx.co.uk>:

[...]

> According to RFC2822 ab@cd is a valid email address. Is FQDN 
> something different?

FQDN is the Fully Qualified Domain Name -- a hostname with its respective
domain name, up to the top-level domain name, included.

b.

-- 
Ben Shimmin (bas@bas.me.uk)                            <URL:http://bas.me.uk/>
                                finger gpg@bas.me.uk | tail -30 | gpg --import
bas7517 (90)
10/1/2004 3:40:49 PM