Swapping '<' and '>' when text is not HTML

Hi, can anyone help me devise some code which will swap  '<' and '>'
characters for &lt; and &gt; when the string is not HTML

So if I had the following string.

var lineoftext="Hello, my name is <strong>Graham</strong> and I am >
30 years old"

I need some code which will turn it into

var lineoftext="Hello, my name is <strong>Graham</strong> and I am
&gt; 30 years old"

I'm having trouble finding a complete solution. Does anyone have one
they have written already?

Thanks.

Graham
Laser
10/1/2010 10:10:28 AM
comp.lang.javascript


On 01/10/10 12:10, Laser Lips wrote:
> var lineoftext="Hello, my name is<strong>Graham</strong>  and I am>
> 30 years old"

No problem with Firefox and:

javascript:
var lineoftext="Hello, my name is <strong>Graham</strong> and I am > 30 years old";
alert(lineoftext);

hop !  Hello, my name is <strong>Graham</strong> and I am > 30 years old

javascript:
var lineoftext="Hello, my name is <strong>Graham</strong> and I am > 30 years old";
document.write(lineoftext);
document.close();

hop!   Hello, my name is *Graham* and I am > 30 years old


What's the trouble, and where do you find it?


javascript:
var lineoftext="Hello, my name is <strong>Graham</strong> and I am > 30 years old";
document.write('<pre>'+lineoftext+'</pre>');
document.close();

hop!  Hello, my name is *Graham* and I am > 30 years old


javascript:
var lineoftext="Hello, my name is &lt;strong>Graham&lt;/strong> and I am > 30 years old";
document.write('<pre>'+lineoftext+'</pre>');
document.close();

hop!  Hello, my name is <strong>Graham</strong> and I am > 30 years old


-- 
Stéphane Moriaux avec/with iMac-intel
SAM
10/1/2010 11:45:51 AM
On 01/10/10 12:10, Laser Lips wrote:
> Hi, can anyone help me devise some code which will swap  '<' and '>'
> characters for &lt; and &gt; when the string is not HTML
> 
> So if I had the following string.
> 
> var lineoftext="Hello, my name is <strong>Graham</strong> and I am >
> 30 years old"
> 
> I need some code which will turn it into
> 
> var lineoftext="Hello, my name is <strong>Graham</strong> and I am
> &gt; 30 years old"
> 
> I'm having trouble finding a complete solution. Does anyone have one
> they have written already?

I doubt it, that's not exactly something people do all the time.

   Standard disclaimer: don't try to parse HTML with regular
   expressions, unless you know exactly what the input string can
   contain. It is usually better to use a proper HTML parser.

In a reduced case like your example, it shouldn't be too hard, as long
as you can define which < and > characters aren't part of the markup.
This would be a lot easier if your input text had them properly escaped
as entities ("I am &lt; 30 years old"). My suggestion is to run your
text through a tool like Tidy, and then replace all the &gt; entities
with &lt; and vice versa:

  var str = "My name is <b>Mud</b>, and I'm &gt; 30 years old.";
  str = str.replace(/&gt;/g, "TOKEN")
           .replace(/&lt;/g, "&gt;")
           .replace(/TOKEN/g, "&lt;");

Where TOKEN is a string which does not appear in the input text.
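
As a rough sketch (not part of the original post), the same swap can be done in a single pass with a replacer function, which avoids having to pick a placeholder that never occurs in the text. The name swapEntities is made up for this example.

```javascript
// Hypothetical helper: swap &lt; and &gt; in one pass, so no
// placeholder string is needed.
function swapEntities(str) {
  return str.replace(/&(lt|gt);/g, function (match, name) {
    // name is "lt" or "gt"; return the opposite entity
    return name === "lt" ? "&gt;" : "&lt;";
  });
}

var str = "My name is <b>Mud</b>, and I'm &gt; 30 years old.";
console.log(swapEntities(str));
// My name is <b>Mud</b>, and I'm &lt; 30 years old.
```

Because every entity is matched exactly once, a freshly written &gt; is never picked up again by a later replacement.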

If you're unable or unwilling to clean up the input text first, you'll
need to define for yourself what is text and what is HTML markup. This
might work in your specific case, but you're not really handling all of
HTML here:

  var str = "My name is <b>Mud</b>, and I'm &gt; 30 years old.";
  str = str.replace(/<(?![a-z\/!])/gi, "&lt;")
           .replace(/([^a-z\/"'-])>/gi, "$1&gt;")
           .replace(/&gt;/g, "TOKEN")
           .replace(/&lt;/g, "&gt;")
           .replace(/TOKEN/g, "&lt;");

The first two replace() calls try to escape < and > characters which
look like they are not at the beginning or the end of an HTML tag or
comment. It's a poor man's substitute for what the HTML author (or Tidy)
should be doing. This may or may not work for you. I'm also assuming
that your input does not contain any other < or > characters which
should be left intact, for example in <script> or <style> elements, or
in comments.
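
To make the heuristic concrete, here is a small sketch that wraps only the two escaping replace() calls in a function and runs them on a few strings. The function name escapeStrayChevrons and the test strings are made up for illustration.

```javascript
// The two escaping replace() calls, wrapped in a hypothetical helper.
function escapeStrayChevrons(str) {
  return str
    // "<" that is not followed by a letter, "/" or "!"
    .replace(/<(?![a-z\/!])/gi, "&lt;")
    // ">" that is not preceded by a letter, "/", a quote or "-"
    .replace(/([^a-z\/"'-])>/gi, "$1&gt;");
}

console.log(escapeStrayChevrons("I am <strong>Graham</strong> and 5 < 6 > 4"));
// I am <strong>Graham</strong> and 5 &lt; 6 &gt; 4

// Limits of the heuristic: a ">" directly after a letter is left alone,
// and a tag that ends in a digit gets its ">" escaped.
console.log(escapeStrayChevrons("a>b"));            // a>b
console.log(escapeStrayChevrons("<img width=5>"));  // <img width=5&gt;
```

The last two calls show why this only suits input whose shape is known in advance.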

If you want to replace these characters in the same HTML document where
your script is running, you'll want to walk through the DOM tree and
selectively replace the contents of text nodes. This has the advantage
of using a real HTML parser, but it requires some understanding of the
DOM and its methods.
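
A minimal sketch of such a tree walk, assuming the goal is to swap literal < and > characters in text nodes only. It relies on nothing beyond the standard nodeType, nodeName, childNodes and nodeValue properties; swapInTextNodes is a made-up name, not a DOM method.

```javascript
// Sketch only: walks anything shaped like a DOM node and swaps
// "<" and ">" in its text nodes.
function swapInTextNodes(node) {
  if (node.nodeType === 3) {               // 3 = text node
    node.nodeValue = node.nodeValue.replace(/[<>]/g, function (c) {
      return c === "<" ? ">" : "<";
    });
    return;
  }
  var name = (node.nodeName || "").toLowerCase();
  if (name === "script" || name === "style") {
    return;                                // leave code and CSS untouched
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    swapInTextNodes(children[i]);
  }
}

// In a browser, after the document has loaded:
// swapInTextNodes(document.body);
```

Since the browser has already parsed the markup, tags never show up in nodeValue, so no guessing about what is a tag is needed.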


-- 
stefan
Stefan
10/1/2010 1:36:30 PM
Thanks for the replies, guys. I wrote it myself in the end:

Graham

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE> New Document </TITLE>
  <META NAME="Generator" CONTENT="EditPlus">
  <META NAME="Author" CONTENT="">
  <META NAME="Keywords" CONTENT="">
  <META NAME="Description" CONTENT="">
 </HEAD>

 <BODY>
  <script>

// Element names that count as real HTML tags
var htmlArray=[
"a","abbr","acronym","address","applet","area","b","base","basefont","bdo",
"big","blockquote","body","br","button","caption","center","cite","code",
"col","colgroup","dd","del","dfn","dir","div","dl","dt","em","fieldset",
"font","form","frame","frameset","head","h1","h2","h3","h4","h5","h6","hr",
"html","i","iframe","img","input","ins","kbd","label","legend","li","link",
"map","menu","meta","noframes","noscript","object","ol","optgroup","option",
"p","param","pre","q","s","samp","script","select","small","span","strike",
"strong","style","sub","sup","table","tbody","td","textarea","tfoot","th",
"thead","tr","tt","u","ul"];
function swapOutChevrons(words)
{
	var run=true;
	var indL=0;
	var indR=0;
	var temp=words;
	var preTemp;
	var tag="";
	var ret="";

	while(run)
	{
		//alert("temp=["+temp+"]");
		indL=temp.indexOf("<");
		indR=temp.indexOf(">");

		if(indL>-1 && indR==-1)
		{
			//just contains a '<'
			ret+=temp.substring(0,indL)+"&lt;";
			temp=temp.substring(indL+1);
			continue;
		}
		preTemp=temp.substring(0,indL);
		tag=temp.substring(indL+1,indR).toLowerCase();
		tag=cleanTag(tag);
		//only treat "<...>" as a tag when a '<' really comes before the '>';
		//otherwise text such as "b > c" would be mistaken for a <b> tag
		if(indL>-1 && indL<indR && checkHTMLTag(tag)==true)
		{
			//known tag: copy it through unchanged
			ret+=preTemp+temp.substring(indL,indR+1);
			temp=temp.substring(indR+1);
		}else{

			if(indL==-1)indL=Infinity;
			if(indR==-1)indR=Infinity;

			//escape whichever chevron comes first
			if(indR<indL)
			{
				ret+=temp.substring(0,indR)+"&gt;";
				temp=temp.substring(indR+1);
			}else if(indL<indR)
			{
				ret+=temp.substring(0,indL)+"&lt;";
				temp=temp.substring(indL+1);
			}else{
				//
			}
		}

		//check for stoppage
		indL=temp.indexOf("<");
		indR=temp.indexOf(">");
		if(indL==-1 && indR==-1)
		{
			ret+=temp.substring(0);
			run=false;
		}
	}
	return ret;
}

//Reduce the text between '<' and '>' to a bare element name
function cleanTag(tag)
{
	if(tag.charAt(0)=="/")
	{
		tag=tag.substring(1);
	}
	var indSpc=tag.indexOf(" ");
	var indEnd=tag.indexOf(">");
	var indFSL=tag.indexOf("/");
	var max=Infinity;
	if(indSpc >-1)
	{
		max=indSpc;
	}
	if(indEnd >-1 && indEnd < max)
	{
		max=indEnd;
	}
	if(indFSL >-1 && indFSL < max)
	{
		max=indFSL;
	}
	if(max==Infinity)max=tag.length;

	tag=tag.substring(0,max);
	return tag;
}
function checkHTMLTag(tag)
{
	for(var x=0;x<htmlArray.length;x++)
	{
		if(htmlArray[x]==tag)return true;
	}
	return false;
}


var lineoftext=[];
//lineoftext.push("Date of surgery  < a>");
lineoftext.push("hello  < br /> ss");
lineoftext.push("hello  <<<< br ss");
lineoftext.push("Hello, my name is <strong style='color:red' this='that'>Graham</strong> and I am >30 >>> years old. <br/>  <br /> OK?");
lineoftext.push("Hello, my name is Graham and I am 30 years old. OK?");
lineoftext.push("Hello, my name is <strong style='color:red' this='that'>Graham</strong> and I am < 30 << years old. OK?");
lineoftext.push("Hello, my name is <strong style='color:red' this='that'>Graham</strong> and I < am > 30 years old. <br/>  <br /> OK?");
lineoftext.push("Hello, my name is <strong style='color:red' this='that'>Graham</strong> and I > am < 30 years old. <br/>  <br /> OK?");

for(var x=0;x<lineoftext.length;x++)
{
	alert(lineoftext[x]+"\n"+swapOutChevrons(lineoftext[x]));
}

  </script>
 </BODY>
</HTML>

Laser
10/1/2010 4:30:44 PM
Stefan Weiss wrote:

> Laser Lips wrote:
>> Hi, can anyone help me devise some code which will swap  '<' and '>'
>> characters for &lt; and &gt; when the string is not HTML
>> 
>> So if I had the following string.
>> 
>> var lineoftext="Hello, my name is <strong>Graham</strong> and I am >
>> 30 years old"
>> 
>> I need some code which will turn it into
>> 
>> var lineoftext="Hello, my name is <strong>Graham</strong> and I am
>> &gt; 30 years old"
> 
> I doubt it, that's not exactly something people do all the time.
> 
>    Standard disclaimer: don't try to parse HTML with regular
>    expressions, unless you know exactly what the input string can
>    contain. It is usually better to use a proper HTML parser.

However, as a matter of fact a markup parser can be written using regular 
expressions (BTDT), just not one regular expression alone.
 
> In a reduced case like your example, it shouldn't be too hard, as long
> as you can define which < and > characters aren't part of the markup.
> This would be a lot easier if your input text had them properly escaped
> as entities ("I am &lt; 30 years old"). My suggestion is to run your
> text through a tool like Tidy, and then replace all the &gt; entities
> with &lt; and vice versa:
> 
>   var str = "My name is <b>Mud</b>, and I'm &gt; 30 years old.";
>   str = str.replace(/&gt;/g, "TOKEN")
>            .replace(/&lt;/g, "&gt;")
>            .replace(/TOKEN/g, "&lt;");
> 
> Where TOKEN is a string which does not appear in the input text.
> 
> If you're unable or unwilling to clean up the input text first, you'll
> need to define for yourself what is text and what is HTML markup. This
> might work in your specific case, but you're not really handling all of
> HTML here:
> 
>   var str = "My name is <b>Mud</b>, and I'm &gt; 30 years old.";
>   str = str.replace(/<(?![a-z\/!])/gi, "&lt;")
>            .replace(/([^a-z\/"'-])>/gi, "$1&gt;")
>            .replace(/&gt;/g, "TOKEN")
>            .replace(/&lt;/g, "&gt;")
>            .replace(/TOKEN/g, "&lt;");

Or

  str = str.replace(/(<\/?\w+[^>]*>)|[<>]/g, function(m, p1) {
    if (p1)
    {
      return p1;
    }

    return ({
      "<": "&lt;",
      ">": "&gt;"
    })[m];
  });

This is only an example, of course.
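
To see what that expression does and does not catch, here is a sketch that wraps the same replace() call in a function and tries it on three strings. The name escapeNonTagChevrons and the second and third test strings are made up for illustration.

```javascript
// The replace() call from above, wrapped in a hypothetical helper.
function escapeNonTagChevrons(str) {
  return str.replace(/(<\/?\w+[^>]*>)|[<>]/g, function (m, p1) {
    if (p1) {
      return p1;            // looks like a tag: keep it as is
    }
    return ({
      "<": "&lt;",
      ">": "&gt;"
    })[m];
  });
}

console.log(escapeNonTagChevrons(
  "Hello, my name is <strong>Graham</strong> and I am > 30 years old"));
// Hello, my name is <strong>Graham</strong> and I am &gt; 30 years old

console.log(escapeNonTagChevrons("I < am > 30"));
// I &lt; am &gt; 30

// Anything shaped like "<word ... >" is taken for a tag and left alone:
console.log(escapeNonTagChevrons("x <b and c > d"));
// x <b and c > d
```

The third call shows the limit: the pattern checks only the shape of a tag, not whether the name is a real element.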


PointedEars
-- 
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
  -- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)
Thomas
10/1/2010 5:53:08 PM