Using regular expressions to split a string


Hi,

I have a quick question. Using regular expressions, is it possible to
split a string on some special character (let's say &) but leave
the special character in if it's preceded by a \? So the string "this
is not split \& from this. & but this is" would produce two matches.
One would contain the \& (which I suppose I will later have to replace
using the replace method).

I've got something along the lines of 

var PlayOptionsExpr = new RegExp("(.*)&(?!\\\\)") 

working, which will split if the ampersand is not followed by a \. I was
wondering if it could be done with a preceding backslash. Is there any
regular expression construct that can match a character only if it is not
preceded by some other character?
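
A small sketch of what the posted pattern actually does, assuming any current JavaScript engine. The sample strings and every variable name other than PlayOptionsExpr are made up for illustration. Because the pattern is built from a string literal, each backslash is escaped once for the string and once more for the regex, so the four backslashes in the source end up as a single literal backslash in the pattern.

```javascript
// Pattern from the post, built from a string: four backslashes in the
// source text become one escaped backslash (\\) in the regex itself.
var PlayOptionsExpr = new RegExp("(.*)&(?!\\\\)");
// The same pattern as a regex literal needs only one level of escaping.
var sameAsLiteral = /(.*)&(?!\\)/;

// Illustrative inputs (not from the original post).
var followedByBackslash = "a&\\b"; // the text a&\b
var mixed = "a&\\b&c";             // the text a&\b&c

// The only & is followed by a backslash, so the negative lookahead rejects it.
var noMatch = PlayOptionsExpr.exec(followedByBackslash);
// Greedy (.*) backs off to the last & that is not followed by a backslash.
var lastAmp = PlayOptionsExpr.exec(mixed);

console.log(PlayOptionsExpr.source === sameAsLiteral.source); // true
console.log(noMatch);                                         // null
console.log(lastAmp && lastAmp[1]);                           // the text a&\b
```

So the lookahead version skips an & that is followed by a backslash; it says nothing about what comes before the &.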
Reply chaitanyag 8/25/2004 2:42:20 AM

In a land long ago, in a time far away
chaitanyag@hotmail.com (MENTAT) wrote:

> Hi,
[cut: finding \&]
> Is there any
> regular expression command that can match a charcter if it is not
> preceeded by a character?

Sure, [^a] for example will match any character that is not an 'a'.

So, [^\\]& should do it (unverified :)
If '&' can be at the beginning, use something like ^&|[^\\]&
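
To see what these two patterns pick out, here is a quick check that can be run in a JavaScript console. The sample string and variable names are invented for illustration. Note that each match includes the character in front of the &, because [^\\] has to consume one character.

```javascript
// Illustrative input: the text  &a \& b & c
var sample = "&a \\& b & c";

// [^\\]& needs some character before the &, so a leading & is missed.
var withoutAnchor = sample.match(/[^\\]&/g);
// Adding the ^& alternative also catches an & at the very start.
var withAnchor = sample.match(/^&|[^\\]&/g);

console.log(withoutAnchor); // [ ' &' ]
console.log(withAnchor);    // [ '&', ' &' ]
```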


Yours
P

Reply PatD 8/25/2004 8:06:28 AM


Try a lookbehind.  I'm not sure if they work in JavaScript.

http://www.tote-taste.de/X-Project/regex/printable.html

Shawn

Reply shawn 8/25/2004 12:48:02 PM

shawn.milo@gmail.com (Shawn Milo) writes:

> Try a lookbehind.  I'm not sure if they work in JavaScript.

They don't.
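
That was the situation when this thread was written. The ECMAScript 2018 edition later added lookbehind assertions, so on an engine that implements them (an assumption worth checking for your target browsers) the original question can be answered directly. A minimal sketch using the string from the first post; the variable names are illustrative:

```javascript
// Requires an engine with ES2018 lookbehind support (assumption).
// (?<!\\)& matches an & only when the character before it is not a
// backslash, and the lookbehind itself consumes nothing.
var text = "this is not split \\& from this. & but this is";
var parts = text.split(/(?<!\\)&/);

// Turn the escaped \& back into a plain & inside each part.
var unescaped = parts.map(function (part) {
  return part.replace(/\\&/g, "&");
});

console.log(parts.length); // 2
console.log(unescaped);    // [ 'this is not split & from this. ', ' but this is' ]
```

On an engine without lookbehind the regex literal is a syntax error, so the workarounds discussed in the rest of the thread still apply there.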

/L
-- 
Lasse Reichstein Nielsen  -  lrn@hotpop.com
 DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
  'Faith without judgement merely degrades the spirit divine.'
Reply Lasse 8/25/2004 4:32:36 PM

That will match both & and \&. I want to match the first but not the
second. What I need is a lookbehind assertion, which, as Lasse said,
JavaScript doesn't handle. So I don't think it can be done.


PatD <dostert@pt.lu> wrote in message news:<0thoi05omjhg3t9r4kr9l3kncjalouk6d2@4ax.com>...
> In a land long ago, in a time far away
> chaitanyag@hotmail.com (MENTAT) wrote:
> 
> > Hi,
>  [cut: finding \&]
> > Is there any
> > regular expression command that can match a charcter if it is not
> > preceeded by a character?
> 
> Sure, [^a] for example will match any character that is not an 'a'.
> 
> So, [^\\]& should do it (unverified :)
> If '&' can be at the beginning, use something like ^&|[^\\]&
> 
> 
> Yours
> P
Reply chaitanyag 8/26/2004 7:10:18 AM

In a land long ago, in a time far away
chaitanyag@hotmail.com (MENTAT) wrote:

>PatD <dostert@pt.lu> wrote in message news:<0thoi05omjhg3t9r4kr9l3kncjalouk6d2@4ax.com>...
>> In a land long ago, in a time far away
>> chaitanyag@hotmail.com (MENTAT) wrote:
>> 
>> > Hi,
>>  [cut: finding \&]
>> > Is there any
>> > regular expression command that can match a charcter if it is not
>> > preceeded by a character?
>> 
>> Sure, [^a] for example will match any character that is not an 'a'.
>> 
>> So, [^\\]& should do it (unverified :)
> 
>That will match both & and \&.

Will it?
Hm..., have you tried to see what [^\\]& actually finds?


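function tryit()
{
  // The text: part 1 & part 2 \& part 3 \& part 4 & part 5
  var s = "part 1 & part 2 \\& part 3 \\& part 4 & part 5";
  var r = /([^\\]&)/g;
  var a;
  // exec() with the g flag returns the next match on each call, then null
  while ((a = r.exec(s)) !== null)
    alert(a[0] + "\n" + a[1]);
}

The "alert"s will show both unescaped "&"s (each together with the space
in front of it), no "\&" however...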


Yours
P

Reply PatD 8/27/2004 8:41:49 AM

MENTAT wrote:

> I [...] Using regular expressions, is it possible to
> split a string based on some special character (lets say &) but leave
> the special character in if its preceeded by a \. so the string "this
> is not split \& from this. & but this is" would produce two matches.

A workaround:

var s = "one&two\\&three&four";
alert(s.replace(/\\&/g, "\\").split("&")); // ["one", "two\\three", "four"]

Afterwards you only have to replace the '\' within each component with
'&'. You may use any other character as the stand-in for the '\&'
sequence, as long as it cannot occur elsewhere in the string.
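
Spelled out a little further, the workaround could look like the sketch below. The function name splitWithPlaceholder and the choice of "\uffff" as the placeholder are assumptions for illustration; the only requirement is that the placeholder never occurs in the input.

```javascript
// Hypothetical helper: swap every \& for a placeholder, split on the
// remaining (unescaped) &, then put a plain & back into each component.
function splitWithPlaceholder(input, placeholder) {
  return input
    // A function replacement avoids any special meaning of "$" in the placeholder.
    .replace(/\\&/g, function () { return placeholder; })
    .split("&")
    .map(function (part) {
      return part.split(placeholder).join("&");
    });
}

var components = splitWithPlaceholder("one&two\\&three&four", "\uffff");
console.log(components); // [ 'one', 'two&three', 'four' ]
```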


HTH

PointedEars
-- 
It doesn't work. I don't know why
Reply Thomas 8/29/2004 4:35:52 PM

> Will it?
> Hm..., have you tried to see what [^\\]& actually finds?
> 
> 
> function tryit()
> {
>   s = "part 1 & part 2 \\& part 3 \\& part 4 & part 5";
>   r = /([^\\]&)/g;
>   while(a = r.exec(s))
>     alert(a[0] + "\n" + a[1]);
> }
> 
> The "alert"s will show both "&"s, no "\&" however...
> 
> 
> Yours
> P

You are right, Pat, I stand corrected. This sort of works:

var Data = "this is string 1 & this is string 1 \\& string 2 & this is string 3";

var StringElementsArray = new Array();
var EscapedString;

// Lazily capture everything up to an & that is not preceded by a backslash
var StringElementsExpr = new RegExp("(.*?)[^\\\\]&", "g");
var StringElements = StringElementsExpr.exec(Data);

while (StringElements)
{
  // Turn the escaped \& back into a plain & within the captured piece
  EscapedString = StringElements[1].replace(/(\\&)/g, "&");
  StringElementsArray.push(EscapedString);
  StringElements = StringElementsExpr.exec(Data);
}

The only problem is that it doesn't match the last string ("this is
string 3"), as it is not followed by an ampersand. I tried using
"(.*?)[^\\\\]&|(.*?)$" and some other combinations of it, to no avail.

Any ideas on how I can get the last match before the end of input, or
handle a string that contains no ampersands whatsoever? E.g. "this is
string 1" should return just one string; right now it gives no match.
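
One way around both problems is to match whole fields instead of delimiters, letting each field end at either an unescaped & or the end of the input. The sketch below is only one possible approach, not something proposed in the thread; the function name splitUnescaped and the pattern are assumptions for illustration.

```javascript
// Hypothetical helper: collect fields separated by & that is not
// directly preceded by a backslash, and unescape \& inside each field.
function splitUnescaped(data) {
  // Group 1: a run of "\&" pairs or any characters other than "&".
  // Group 2: the "&" that ends the field, or "" at the end of the input.
  var fieldExpr = /((?:\\&|[^&])*)(&|$)/g;
  var fields = [];
  var m;
  while ((m = fieldExpr.exec(data)) !== null) {
    fields.push(m[1].replace(/\\&/g, "&"));
    if (m[2] === "") {
      break; // end of input reached, stop before an empty match repeats
    }
  }
  return fields;
}

console.log(splitUnescaped("this is string 1 & this is string 1 \\& string 2 & this is string 3"));
// [ 'this is string 1 ', ' this is string 1 & string 2 ', ' this is string 3' ]
console.log(splitUnescaped("this is string 1"));
// [ 'this is string 1' ]
```

Unlike the exec() loop above, nothing in front of the & is consumed, so the fields keep their surrounding spaces; trim them afterwards if that is not wanted.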
Reply chaitanyag 8/29/2004 11:22:52 PM

In a land long ago, in a time far away
chaitanyag@hotmail.com (MENTAT) wrote:

>> Will it?
>> Hm..., have you tried to see what [^\\]& actually finds?
[cut]
>
>You are right pat, I stand corrected. This sort of works :
>Data = "this is string 1 & this is string 1 \\& string 2 & this is
>string 3"
>
[cut: sample code]
>The only problem is that it doesn't match the last string ("this is
>string 3") as it is not followed by an ampersand. I tried using
>"(.*?)[^\\\\]&|(.*?)$" and some other combinations of it to no avail.
>
>Any ideas on how i can get the last match before end of input, or if
>the string contains no ampersands what so ever. eg: "this is string 1"
>should return just one string. now it will give a no match.

Well, "split" supports regular expressions...

Example:
line = "Beginning & part 2 \\& still part 2 & the end";
result = line.split(/[^\\]&/);

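The "result" array will have 3 entries:
"Beginning",
" part 2 \\& still part 2" and
" the end"
(the space in front of each splitting "&" is consumed as part of the
match, while the space after it stays).

If there is no match, you'll get an array holding just the original
input string.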


Yours
P

Reply PatD 8/30/2004 7:48:43 AM



Sweet. That's exactly what I was looking for.

cheers dude...
Reply chaitanyag 8/31/2004 12:33:16 AM

> Well, "split" supports regular expressions...
> 
> Example:
> line = "Beginning & part 2 \\& still part 2 & the end";
> result = line.split(/[^\\]&/);
> 
> The "result" array will have 3 entries:
> "Beginning",
> "part 2 \\& still part 2" and
> "the end"
> 
> If there is no match, you'll get the original input string.
> 
> 
> Yours
> P

Just found a small bug. The character just before the & is lost, because
it's matched by the [^\\]. So "string 1&string 2&" gives just "string
" and "string ". I tried using Data.split(/(?:[^\\])&/), but
surprisingly it doesn't work: it still consumes the character before the
ampersand. Any ideas?
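
A note on why the non-capturing group makes no difference: (?:...) only stops the group from being remembered, while the character it matches is still part of the delimiter that split() throws away. A capturing group does the opposite of what one might fear: per the language specification, split() inserts captured text into the result array, so the lost character can be glued back on. A rough sketch follows; the variable names are illustrative, and since some older browsers reportedly did not follow the specification here, treat the behaviour as an assumption to test.

```javascript
var data = "string 1&string 2&";

// Both forms consume the character before the &, so it is dropped.
var plain = data.split(/[^\\]&/);            // [ 'string ', 'string ', '' ]
var nonCapturing = data.split(/(?:[^\\])&/); // [ 'string ', 'string ', '' ]

// With a capturing group, split() also returns the captured character.
var pieces = data.split(/([^\\])&/);         // [ 'string ', '1', 'string ', '2', '' ]

// Re-attach each captured character to the piece in front of it.
var fields = [];
for (var i = 0; i < pieces.length; i += 2) {
  fields.push(pieces[i] + (i + 1 < pieces.length ? pieces[i + 1] : ""));
}

console.log(fields); // [ 'string 1', 'string 2', '' ]
```

This still cannot split on an & at the very start of the string or on the second of two adjacent &s, because the pattern needs an unconsumed character in front of each &.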
Reply chaitanyag 8/31/2004 11:23:29 PM

MENTAT wrote:

> > Well, "split" supports regular expressions...
> >
> > Example:
> > line = "Beginning & part 2 \\& still part 2 & the end";
> > result = line.split(/[^\\]&/);
> >
> > The "result" array will have 3 entries:
> > "Beginning",
> > "part 2 \\& still part 2" and
> > "the end"
> >
> > If there is no match, you'll get the original input string.
> >
> >
> > Yours
> > P
>
> just found a small bug. Anything before the & is lost. thats because
> its matched with the [^\\]. so "string 1&string 2&" gives just "string
> " and "string ". I tried using Data.split(/(?:[^\\])&/), but it
> surprisingly doesn't work. it still captures the character before the
> ampersand. Any ideas?

<script type="text/javascript">
function splitString(el) {
 var s = el.value;
 var token = '[';
 var ts = String((new Date()).getTime());
 for (var i = 0; i < ts.length; i++) {
     token += String.fromCharCode(+ts.charAt(i) + 65);
 }
 token += ']';
 // or simply
 // var token = '\uffff';
 var a = s.replace(/([^\x5c])&/g, '$1' + token).split(token);
 alert(a);
}
</script>
<form>
<input type="text" name="myInput" value="a\&b&c\&d">
<input type="button" value="Test"
onclick="splitString(this.form.elements['myInput']);">
</form>

I build a complex token in the hope of generating a character sequence
that is unlikely to ever appear elsewhere in your string; there may be a
better way to do that. The other alternative is to use a token like
Unicode 65535 (token = '\uffff';), which is unlikely to ever appear
anywhere in your string.

As written, it works in IE6SP1, Gecko-based browsers, Netscape 4.78 and
Opera 7.54. Opera 6.05 drops the &'s in the alert(), but that might be
an artifact of Windows & handling; I didn't bother finding out. If you
choose to use a Unicode token of \uffff, it will stop working in
Netscape 4.78.

I used a form because to provide a working example using a literal string,
I would have had to escape the backslashes, and that might have confused
the issue as to whether it works or not.
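
Since "unlikely to appear" is not the same as "cannot appear", one cheap safeguard is to check that the token really is absent before relying on it. The sketch below assumes that failing loudly is acceptable; the function name splitOnUnescapedAmp is hypothetical, and the string literal needs the doubled backslashes mentioned above.

```javascript
// Hypothetical helper based on the replace-then-split idea above.
function splitOnUnescapedAmp(s) {
  var token = "\uffff"; // assumed never to occur in real input
  if (s.indexOf(token) !== -1) {
    // Refuse to guess rather than split in the wrong places.
    throw new Error("input already contains the placeholder token");
  }
  // Keep the character before the & and swap the & itself for the token.
  var marked = s.replace(/([^\\])&/g, function (match, before) {
    return before + token;
  });
  return marked.split(token);
}

// Same text as the form's default value: a\&b&c\&d
var fromForm = splitOnUnescapedAmp("a\\&b&c\\&d");
console.log(fromForm.length); // 2, the parts a\&b and c\&d
```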

--
Grant Wagner <gwagner@agricoreunited.com>
comp.lang.javascript FAQ - http://jibbering.com/faq

Reply Grant 9/1/2004 2:48:45 PM

In a land long ago, in a time far away
Grant Wagner <gwagner@agricoreunited.com> wrote:

>MENTAT wrote:
>
>> > line = "Beginning & part 2 \\& still part 2 & the end";
>> > result = line.split(/[^\\]&/);
>>
>> just found a small bug. Anything before the & is lost. thats because
>> its matched with the [^\\].
>
><script type="text/javascript">
>function splitString(el) {
> var s = el.value;
> var token = '[';
> var ts = String((new Date()).getTime());
> for (var i = 0; i < ts.length; i++) {
>     token += String.fromCharCode(+ts.charAt(i) + 65);
> }
> token += ']';
> // or simply
> // var token = '\uffff';
> var a = s.replace(/([^\x5c])&/g, '$1' + token).split(token);
> alert(a);
>}
></script>
><form>
><input type="text" name="myInput" value="a\&b&c\&d">
><input type="button" value="Test"
>onclick="splitString(this.form.elements['myInput']);">
></form>
>
>I build a complex token in the hopes of generating a character sequence
>which is unlikely to ever appear elsewhere in your string, there may be a
>better way to do that. The other alternative is to use a token like
>unicode 65535 (token = '\uffff';) which is unlikely to ever appear
>anywhere in your string.


Interesting.

However, for the given problem:

line = "Beginning A&A part 2 B\\&B still part 2 C&C the end";
result = line.replace(/([^\\])&/g,"$1$1&").split(/[^\\]&/);


Seems ok to me... YMMV
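
Broken into its two steps, the one-liner above works like this (same string and patterns as in the post; the intermediate variable doubled is added only for illustration):

```javascript
// The text: Beginning A&A part 2 B\&B still part 2 C&C the end
var line = "Beginning A&A part 2 B\\&B still part 2 C&C the end";

// Step 1: duplicate the character in front of every unescaped &.
var doubled = line.replace(/([^\\])&/g, "$1$1&");
// doubled is now: Beginning AA&A part 2 B\&B still part 2 CC&C the end

// Step 2: the split eats the duplicate together with the &,
// so the original character survives at the end of each piece.
var result = doubled.split(/[^\\]&/);

console.log(result.length); // 3
console.log(result[0]);     // Beginning A
console.log(result[2]);     // C the end
```

As with the other [^\\]& variants, an & at the very start of the string has no character in front of it and is therefore not treated as a separator.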


Yours
P

Reply PatD 9/3/2004 12:38:29 PM
