Stupid regex problem, s/// catching extra letter
I know better than to work late at night, but sometimes it just can't be helped :-)
I'm doing a simple s///, converting "www." to "http://www." when "www." occurs without a preceding "http://". Here's what I'm doing:
$text = "www.example.com";
$text =~ s#[^(http://)]www\.#http://www\.#gi;
print $text;
If $text is this, though:
$text = "<div>www.example.com</div>";
the regex is catching the > in <div>, printing:
<divhttp://www.example.com</div>
Where am I screwing up?
jwcarlton, 7/18/2012 4:01:58 AM
On 18.07.2012 06:01, Jason C wrote:
> I know better than to work late at night, but sometimes it just can't be helped :-)
>
> I'm doing a simple s///, converting "www." to "http://www."
> when "www." occurs without a preceding "http://". Here's what I'm doing:
>
> $text = "www.example.com";
> $text =~ s#[^(http://)]www\.#http://www\.#gi;
> print $text;
>
> If $text is this, though:
>
> $text = "<div>www.example.com</div>";
>
> the regex is catching the > in <div>, printing:
>
> <divhttp://www.example.com</div>
>
> Where am I screwing up?
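You don't want to use a character class (square brackets).
[^(http://)] tells perl to match exactly one character that is
not listed inside the square brackets after the negation (^), so
this might as well read [^)(/:hpt]. That one character becomes
part of the match and is replaced along with "www.", which is
where the > in <div> went.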
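To see exactly what the original pattern matches, here is a small sketch that returns the matched text instead of substituting (show_match is a made-up helper name for illustration). It also shows a side effect: with nothing in front of "www.", the class has no character to match, so the original s/// leaves a plain "www.example.com" untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: return what the original pattern matches in $text,
# or undef if it does not match at all.
sub show_match {
    my ($text) = @_;
    # [^(http://)] is a negated character class: exactly ONE character
    # that is not ( ) h t p : or /. It is part of the match, so s///
    # would replace it together with "www.".
    if ($text =~ m{([^(http://)]www\.)}) {
        return $1;
    }
    return undef;
}

for my $in ("<div>www.example.com</div>", "www.example.com") {
    my $m = show_match($in);
    print "$in => ", (defined $m ? "matched '$m'" : "no match"), "\n";
}
```

For the <div> input it reports the match as '>www.', and for the bare "www.example.com" it reports no match.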
What you're trying to do is a zero-width negative look-behind
assertion.
s#(?<!http://)www\.#http://www.#gi should do the trick.
The "(?<!...)" tells the regex engine to match the following
pattern only if it is not preceded by the pattern in the
look-behind, without consuming or capturing any characters.
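A minimal sketch of the look-behind substitution applied to both inputs from the question, plus one that already has the scheme (add_scheme is a hypothetical helper name; the s/// inside it is the one suggested above):

```perl
use strict;
use warnings;

# Hypothetical helper: prefix bare "www." with "http://".
# (?<!http://) only checks what precedes "www."; it is zero-width,
# so the character before "www." (e.g. the > in <div>) is not
# consumed and therefore not replaced.
sub add_scheme {
    my ($text) = @_;
    $text =~ s#(?<!http://)www\.#http://www.#gi;
    return $text;
}

for my $in ("www.example.com",
            "<div>www.example.com</div>",
            "http://www.example.com") {
    print add_scheme($in), "\n";
}
```

One caveat: "https://www.example.com" is not preceded by "http://", so this pattern would still rewrite it. Stacking a second fixed-length look-behind, (?<!http://)(?<!https://), is one way to exclude that case as well.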
"perldoc perlre" has good explanations for character classes
and look-around assertions.
-Chris
thepoet_nospam, 7/18/2012 4:57:00 AM
On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
> What you're trying to do is a zero width negative look-behind
> assertion.
> s#(?<!http://)www\.#http://www.#gi should do the trick.
> The "(?<!...)" tells the regex engine to only match the following
> pattern if it is not preceded by the pattern in the look-behind,
> without capturing anything.
>
> "perldoc perlre" has good explanations for character classes
> and look-around assertions.
>
> -Chris
Thanks for the help, Chris. Character classes aren't exactly intuitive when a symbol changes definition completely based on context, so I'm still struggling with that a little.
The modification you suggested was perfect, though! Thanks again :-)
jwcarlton, 7/18/2012 5:05:20 AM
Jason C <jwcarlton@gmail.com> writes:
> On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
>> What you're trying to do is a zero width negative look-behind
>> assertion.
>> s#(?<!http://)www\.#http://www.#gi should do the trick.
>> The "(?<!...)" tells the regex engine to only match the following
>> pattern if it is not preceded by the pattern in the look-behind,
>> without capturing anything.
>>
>> "perldoc perlre" has good explanations for character classes
>> and look-around assertions.
>>
>> -Chris
>
> Thanks for the help, Chris. Character classes aren't exactly
> intuitive when a symbol changes definition completely based on
> context, so I'm still struggling with that a little.
A character class denotes an unordered set of characters, meaning
[^http://]
[^htp:/]
[^:pppppth/]
[^:/hpt]
[^h:t/p]
all denote the same set (order and repetition don't matter), and
each matches exactly one character.
But you wanted to match the string http://, and a regex matching a
string is just the string itself, IOW, THIS sequence of characters.
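To check the point about unordered sets, here is a small sketch that applies each of the five classes above to every character of a sample string and keeps the ones it matches (matched_chars and the sample URL are made up for illustration). All five keep the same characters, while the literal pattern http:// matches the seven-character sequence as a whole.

```perl
use strict;
use warnings;

# The five negated classes from the post. Inside [...] order and
# repetition don't matter, so they all exclude the same five characters.
my @classes = ('[^http://]', '[^htp:/]', '[^:pppppth/]', '[^:/hpt]', '[^h:t/p]');
my $sample  = 'http://www.example.com';

# Hypothetical helper: test each character of $str on its own and
# keep the ones the class matches (a class always matches one character).
sub matched_chars {
    my ($class, $str) = @_;
    my $re = qr/\A$class\z/;
    return join '', grep { /$re/ } split //, $str;
}

for my $class (@classes) {
    printf "%-14s keeps: %s\n", $class, matched_chars($class, $sample);
}

# By contrast, a literal pattern matches the whole sequence "http://".
my ($literal) = $sample =~ m{(http://)};
print "literal match: $literal\n";
```

Every class line prints "www.examle.com": the h, t, p, : and / characters are dropped, including the p in "example".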
rweikusat, 7/18/2012 12:30:56 PM