Stupid regex problem, s/// catching extra letter

I know better than to work late at night, but sometimes it just can't be helped :-)

I'm doing a simple s///, converting "www." to "http://www." when "www." occurs without a preceding "http://". Here's what I'm doing:

$text = "www.example.com";
$text =~ s#[^(http://)]www\.#http://www\.#gi;
print $text;

If $text is this, though:

$text = "<div>www.example.com</div>";

the regex is catching the > in <div>, printing:

<divhttp://www.example.com</div>

Where am I screwing up?
Reply jwcarlton (271) 7/18/2012 4:01:58 AM

On 18.07.2012 06:01, Jason C wrote:
> I know better than to work late at night, but sometimes it just can't be helped :-)
>
> I'm doing a simple s///, converting "www." to "http://www." when "www." occurs without a preceding "http://". Here's what I'm doing:
>
> $text = "www.example.com";
> $text =~ s#[^(http://)]www\.#http://www\.#gi;
> print $text;
>
> If $text is this, though:
>
> $text = "<div>www.example.com</div>";
>
> the regex is catching the > in <div>, printing:
>
> <divhttp://www.example.com</div>
>
> Where am I screwing up?

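You don't want to use a character class (square brackets).
[^(http://)] tells perl to match any single character that is not
listed inside the square brackets after the negation (^), so this
might as well read [^)(/:hpt]. That one character (the > in your
example) becomes part of the match and is replaced along with "www.".
It also means a "www." at the very start of the string, with no
character in front of it, is never matched at all.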

What you're trying to do is a zero width negative look-behind
assertion.
s#(?<!http://)www\.#http://www.#gi should do the trick.
The "(?<!...)" tells the regex engine to only match the following
pattern if it is not preceded by the pattern in the look-behind,
without capturing anything.
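
To make the difference concrete, here is a small sketch that runs both substitutions over the inputs from the question plus one already-prefixed URL. The helper names with_char_class and with_lookbehind are made up for this illustration; the two patterns are the ones quoted in this thread.

```perl
use strict;
use warnings;

# Original attempt: the negated class must consume one character
# in front of "www.", and that character is replaced too.
sub with_char_class {
    my ($text) = @_;
    $text =~ s#[^(http://)]www\.#http://www.#gi;
    return $text;
}

# Suggested fix: the negative look-behind only checks what precedes
# "www." and consumes nothing.
sub with_lookbehind {
    my ($text) = @_;
    $text =~ s#(?<!http://)www\.#http://www.#gi;
    return $text;
}

for my $input ('www.example.com',
               '<div>www.example.com</div>',
               'http://www.example.com') {
    print "input:       $input\n";
    print "char class:  ", with_char_class($input), "\n";
    print "look-behind: ", with_lookbehind($input), "\n\n";
}
```

With the character class, the bare "www.example.com" comes back unchanged and the <div> input loses its ">"; with the look-behind, both get the prefix and the already-prefixed URL is left alone.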

"perldoc perlre" has good explanations for character classes
and look-around assertions.

-Chris
Reply thepoet_nospam (43) 7/18/2012 4:57:00 AM


On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
> What you're trying to do is a zero width negative look-behind
> assertion.
> s#(?<!http://)www\.#http://www.#gi should do the trick.
> The "(?<!...)" tells the regex engine to only match the following
> pattern if it is not preceded by the pattern in the look-behind,
> without capturing anything.
> 
> "perldoc perlre" has good explanations for character classes
> and look-around assertions.
> 
> -Chris

Thanks for the help, Chris. Character classes aren't exactly intuitive when a symbol changes definition completely based on context, so I'm still struggling with that a little.

The modification you suggested was perfect, though! Thanks again :-)
Reply jwcarlton (271) 7/18/2012 5:05:20 AM

Jason C <jwcarlton@gmail.com> writes:
> On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
>> What you're trying to do is a zero width negative look-behind
>> assertion.
>> s#(?<!http://)www\.#http://www.#gi should do the trick.
>> The "(?<!...)" tells the regex engine to only match the following
>> pattern if it is not preceded by the pattern in the look-behind,
>> without capturing anything.
>> 
>> "perldoc perlre" has good explanations for character classes
>> and look-around assertions.
>> 
>> -Chris
>
> Thanks for the help, Chris. Character classes aren't exactly
> intuitive when a symbol changes definition completely based on
> context, so I'm still struggling with that a little.

A character class denotes an unordered set of characters, meaning

[^http://]
[^htp:/]
[^:pppppth/]
[^:/hpt]
[^h:t/p]

all represent identical sets, and each one matches a single character
(here, any one character that is not in the set). But you wanted to
match the string http://, and a regex matching a string is just the
string itself, in other words, this exact sequence of characters.
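
If you want to convince yourself, a quick sketch like the following checks the five classes against every printable ASCII character. The helper name matched_chars is invented for this example, and the test range 32 .. 126 is an arbitrary choice.

```perl
use strict;
use warnings;

my @classes = ('[^http://]', '[^htp:/]', '[^:pppppth/]', '[^:/hpt]', '[^h:t/p]');

# Return, as one string, every printable ASCII character that the
# given class matches when it has to match the whole input.
sub matched_chars {
    my ($class) = @_;
    my $re = qr/\A$class\z/;
    return join '', grep { $_ =~ $re } map { chr($_) } 32 .. 126;
}

# Group the classes by the set of characters they match.
my %seen;
$seen{ matched_chars($_) }++ for @classes;

print scalar(keys %seen) == 1
    ? "all five classes match the same characters\n"
    : "the classes differ\n";

# The literal string, by contrast, is matched by writing it out.
print "literal match\n" if 'http://www.example.com' =~ m#\Ahttp://#;
```

Order and repetition inside the brackets make no difference; only the sequence written outside the brackets is matched as a sequence.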
Reply rweikusat (2680) 7/18/2012 12:30:56 PM
