&amp; and &

How can one stop a browser from converting 

&amp;

to 

& ?

We have a textarea in our system where a user can type in some HTML code 
and have it saved to the database. When the data is retrieved and 
redisplayed, the &amp; is displayed as simply &.

HTML snippet:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
fred
&
&amp;
</TEXTAREA>

When displayed, the user predictably sees 

fred
&
&

What workarounds are there for this - I am sure it's a problem for 
others - is there a way of "escaping" the value before display?


-- 

jeremy

Jeremy
3/3/2006 12:43:12 PM
comp.authoring.html

Jeremy wrote:

> We have a textarea in our system where a user can type in some html code
> and have it saved to the database.

Users can't type "HTML code" into a <textarea>. What they type _is_
plain text, which they might _intend_ to have interpreted later as if
it were HTML.  To help them do this you must first convert their plain
text as entered into HTML - part of this process would be to encode
their plaintext "&" into the HTML "&amp;", probably just before storing
it.

Good variable naming in your server code will help too - try prefixing
variable names with "str" or "html" as appropriate (for example
"strUserJunk" or "htmlUserJunk"). Whenever you see code that assigns
between variables with mismatched prefixes, be suspicious that there's
an encoding / decoding step missing.
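
A minimal sketch of that naming convention, written in Python purely for
illustration (the thread does not say which language Andy had in mind, and
the variable names below are hypothetical adaptations of his "strUserJunk" /
"htmlUserJunk" examples):

```python
import html

# "str" prefix: raw text exactly as the user typed it.
str_user_junk = "fish & chips <cheap>"

# "html" prefix: the same value encoded for inclusion in an HTML page.
html_user_junk = html.escape(str_user_junk)

# Suspicious: a str_* value assigned to an html_* name with no encoding step.
# html_comment = str_user_junk   # <- missing html.escape()

print(html_user_junk)
```

With this convention, an assignment between a str_* and an html_* name that
has no escape or unescape call between them stands out on review.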

This stuff isn't hard to do, but it does require clarity of thought and
attention to detail. It's also very important to get right (there are
some interesting attacks you can make on blogs etc. if you let users
post arbitrary chunks of HTML).

Andy
3/3/2006 12:56:00 PM
Jeremy <jeremy0505@gmail.com> wrote:

>How can one stop a browser from converting 
>
>&amp; 
>
>to 
>
>& ?

You can't if the document is served as text/html.

If you want a browser to display &amp; literally in a document served as
text/html, use &amp;amp;.

Depending on what you need it may be possible to serve it as text/plain
in which case you can use the literal. This can also be embedded into a
document served as text/html.
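
To illustrate the one-level decoding described above, here is a small
sketch that uses Python's html.unescape() as a stand-in for what a browser
does with character references in text/html. This is an approximation for
illustration only, not a browser implementation:

```python
import html

# Each source string is decoded exactly one level, as a browser would do
# when rendering character references in a text/html document.
for source in ["&", "&amp;", "&amp;amp;"]:
    print(repr(source), "->", repr(html.unescape(source)))
```

Only the doubly encoded form survives decoding as the literal text &amp;.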

>We have a textarea in our system where a user can type in some html code 
>and have it saved to the database. When the data is retrieved and 
>redisplayed it is displayed as simply &.
>
>HTML snippet:
>
><TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
>fred
>&
>&amp;
></TEXTAREA>
>
>When displayed, the user predictably sees 
>
>fred
>&
>&

You haven't made it clear what this is used for, but maybe your
server-side data processing needs to convert character references (not
just &amp;) that the user enters to &amp;char_ref.

-- 
Spartanicus
Spartanicus
3/3/2006 12:56:23 PM
In article <1141390560.714557.79150@p10g2000cwp.googlegroups.com>, Andy 
Dingley says...
> Users can't type "HTML code" into a <textarea>. What they type _is_
> plain text, which they might _intend_ to have interpreted later as if
> it were HTML.  To help them do this you must first convert their plain
> text as entered into HTML - part of this process would be to encode
> their plaintext "&" into the HTML "&amp;", probably just before storing
> it.
> 
> Good variable naming in your server code will help too - try prefixing
> variable with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
> you see code that assigns variables with mis-matched names, be
> suspicious that there's an encoding / decoding process missing.
> 
> This stuff isn't hard to do, but it does require clarity of thought and
> attention to detail. It's also very important to get right (there are
> some interesting attacks you can make on blogs etc. if you let users
> post arbitrary chunks of HTML).
> 

Yep I understand all of that. The user types &amp; into a field and 
submits the form. The &amp; is stored in the database as typed by the 
user. When the data is redisplayed for editing, the browser changes the 
&amp; to simply &.

So it really has nothing to do with variable naming and so on - the 
question is how can we present back to the user the data that they 
entered into the field?

-- 

jeremy

Jeremy
3/3/2006 1:04:13 PM
On Fri, 3 Mar 2006, Andy Dingley wrote:

> Jeremy wrote:
> 
> > We have a textarea in our system where a user can type in some 
> > html code and have it saved to the database.
> 
> Users can't type "HTML code" into a <textarea> 

I don't see for a moment why not.  In fact I've been doing it for 
ages.  (Of course I would term it "markup", not "code").

And see http://www.htmlhelp.com/tools/validator/direct.html.en
for a practical use of such a thing.

> What they type _is_ plain text,

What they type is text.  Whether it's plain or otherwise is determined 
by what the server-side process is going to use it for.  There's no 
way to control this: whatever they type-in, be it plain text, HTML 
markup, C++ code, raw PostScript, or Linear B, gets submitted to the 
server-side in accordance with the rules for forms submission.  HTML 
markup plays no special role in this part of the action - but it's not 
for a moment excluded.

It's all about what you *do* with it when it reaches the server side.
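
As a rough sketch of that point, the following Python round trip uses
urllib.parse to mimic URL-encoded form submission. The field name p_html
comes from the original snippet; everything else is an assumption, and a
real browser would also normalize line breaks to CRLF, which is not
modelled here:

```python
from urllib.parse import parse_qs, urlencode

# Text exactly as typed into the textarea, markup-like content included.
typed = "fred\n&\n&amp;"

# What goes over the wire for an application/x-www-form-urlencoded form.
body = urlencode({"p_html": typed})

# What the server-side code gets back after decoding the request body.
received = parse_qs(body)["p_html"][0]

print(body)
print(received == typed)
```

The submission step carries the text through unchanged; any conversion of
&amp; happens later, when the stored value is written back into a page.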

> which they might _intend_ to have interpreted later as if
> it were HTML.  To help them do this you must first convert their plain
> text as entered into HTML - part of this process would be to encode
> their plaintext "&" into the HTML "&amp;", probably just before storing
> it.

*That* would certainly not be helpful if they were supplying HTML 
markup.

> This stuff isn't hard to do, but it does require clarity of thought 

That's very true.

> It's also very important to get right (there are some interesting 
> attacks you can make on blogs etc. if you let users post arbitrary 
> chunks of HTML).

Indeed; so block the raw-HTML options to untrusted contributors. But 
that doesn't mean there's anything wrong in principle with the 
existence of a raw-HTML option.
Alan
3/3/2006 1:14:25 PM
Jeremy wrote:
> How can one stop a browser from converting 
> 
> &amp; 
> 
> to 
> 
> & ?
> 
> We have a textarea in our system where a user can type in some html code 
> and have it saved to the database. When the data is retrieved and 
> redisplayed it is displayed as simply &.
> 
> HTML snippet:
> 
> <TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
> fred
> &
> &amp;
> </TEXTAREA>
> 
> When displayed, the user predictably sees 
> 
> fred
> &
> &

Easy--convert all the & to &amp; before displaying them. "&" will become 
&amp; and will display as "&", and "&amp;" will become "&amp;amp;" and 
will display as "&amp;".
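
A minimal sketch of that replacement, in Python for illustration only. The
function names escape_ampersands and render_textarea are hypothetical, and
the textarea attributes are simplified from the original snippet:

```python
def escape_ampersands(text: str) -> str:
    """Turn every literal & into the character reference &amp;."""
    return text.replace("&", "&amp;")


def render_textarea(stored: str) -> str:
    """Build the textarea markup with the stored value escaped for display."""
    return (
        '<textarea name="p_html" rows="6" cols="70">\n'
        + escape_ampersands(stored)
        + "</textarea>"
    )


# The value as typed by the user and stored in the database.
stored = "fred\n&\n&amp;"
print(render_textarea(stored))
```

The escaping is applied only when writing the value into the page; the
stored value stays exactly as the user typed it.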
Harlan
3/3/2006 1:17:48 PM
In article <46qu0kFcc73kU1@individual.net>, Harlan Messinger says...
>
> 
> Easy--convert all the & to &amp; before displaying them. "&" will become 
> &amp; and will display as "&", and "&amp;" will become "&amp;amp;" and 
> will display as "&amp;".
> 

Brilliant - obvious but brilliant - thanks that is all I needed.

-- 

jeremy
Jeremy
3/3/2006 2:10:22 PM
Jeremy <jeremy0505@gmail.com> wrote:

>In article <46qu0kFcc73kU1@individual.net>, Harlan Messinger says...
>>
>> 
>> Easy--convert all the & to &amp; before displaying them. "&" will become 
>> &amp; and will display as "&", and "&amp;" will become "&amp;amp;" and 
>> will display as "&amp;".
>> 
>
>Brilliant - obvious but brilliant - thanks that is all I needed.

If you're working in PHP, see htmlentities(): it'll get them all in one
swell foop, and it has the speed advantage of being a builtin.

-- 
http://www.ren-prod-inc.com/hug_soft/store.php?action=contact
hug
3/3/2006 2:40:26 PM
In article <s8lg02hmt6p9dgdpgbemol4jisgo2jhd6b@4ax.com>, hug says...
> >Brilliant - obvious but brilliant - thanks that is all I needed.
> 
> If you're working in PHP see htmlentities() it'll get them all in one
> swell foop and it has the speed advantage of being a builtin.
> 

Thanks - actually working in Oracle PL/SQL - 'tis a simple replace() 
call.

-- 

jeremy
Jeremy
3/3/2006 3:06:08 PM
Jeremy wrote:
> In article <s8lg02hmt6p9dgdpgbemol4jisgo2jhd6b@4ax.com>, hug says...
> 
>>>Brilliant - obvious but brilliant - thanks that is all I needed.
>>
>>If you're working in PHP see htmlentities() it'll get them all in one
>>swell foop and it has the speed advantage of being a builtin.
>>
> 
> 
> Thanks - actually working in Oracle pl/sql - 'tis a simple replace() 
> call.

Hug makes a good general point--you might need to make other conversions 
too, like < and > and possibly quotation marks. Server-side applications 
generally have access in some way to an HTMLEncode function that handles 
all of that.
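
As one example of such a function, Python's standard library offers
html.escape(). This is shown only as an illustration of the general idea,
since the poster is working in PL/SQL, and the sample string is made up:

```python
import html

# Sample user text containing all of the characters mentioned above.
user_text = 'Tom & "Jerry" <b>&amp;</b>'

# quote=True (the default) also encodes double and single quotation marks.
escaped = html.escape(user_text, quote=True)

print(escaped)
```

Decoding the result one level, as a browser does, gives back the original
text, so the user sees exactly what was typed.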
Harlan
3/3/2006 5:56:50 PM
Jeremy wrote:
> In article <1141390560.714557.79150@p10g2000cwp.googlegroups.com>, Andy 
> Dingley says...
>> Good variable naming in your server code will help too - try prefixing
>> variable with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
>> you see code that assigns variables with mis-matched names, be
>> suspicious that there's an encoding / decoding process missing.
> 
> So it really has nothing to do with variable naming and so on

See this article from Joel on Software to get a better idea of what Andy 
was talking about.
http://www.joelonsoftware.com/articles/Wrong.html

In short, you're receiving unsafe content from the user, expecting it to 
be plain text, failing to make it safe by encoding it in HTML syntax, 
and then outputting it directly.  I suspect this will probably be 
another bug if the user happens to enter this:

   Hello World!</textarea>
   <script>//do something evil</script>

When you include that fragment within your document and have not 
processed it, the markup received by the browser would look something 
like this:

<TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
   Hello World!</textarea>
   <script>//do something evil</script>
</TEXTAREA>
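
For comparison, a hedged sketch of the same fragment passed through an
HTML-encoding step before output. Python's html.escape() is used here only
as an example of such a step, and the textarea() helper is hypothetical:

```python
import html

# The hostile input from the example above.
malicious = "Hello World!</textarea>\n<script>//do something evil</script>"


def textarea(value: str) -> str:
    """Wrap an HTML-encoded value in textarea markup."""
    return '<textarea name="p_html">\n' + html.escape(value) + "</textarea>"


page = textarea(malicious)
print(page)
```

After encoding, the only closing textarea tag in the output is the one the
page author wrote, and the script element arrives as inert text.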

Also, I'm not sure what that align="virtual" attribute in your markup is 
supposed to do; I've never heard of it before.  Neither align nor wrap 
is a valid attribute of the textarea element in HTML 4.01.

-- 
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/     Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Lachlan
3/7/2006 12:58:39 PM
In article <3sfPf.556$dy4.374@news-server.bigpond.net.au>, Lachlan Hunt 
says...
> Jeremy wrote:
> > In article <1141390560.714557.79150@p10g2000cwp.googlegroups.com>, Andy 
> > Dingley says...
> >> Good variable naming in your server code will help too - try prefixing
> >> variable with "strUserJunk", or "htmlUserJunk" as appropriate. Whenever
> >> you see code that assigns variables with mis-matched names, be
> >> suspicious that there's an encoding / decoding process missing.
> > 
> > So it really has nothing to do with variable naming and so on
> 
> See this article from Joel on Software to get a better idea of what Andy 
> was talking about.
> http://www.joelonsoftware.com/articles/Wrong.html
> 
> In short, you're receiving unsafe content from the user, expecting it to 
> be plain text, failing to process it to make it safe by encoding it 
> in HTML syntax and then outputting it directly.  I suspect this will 
> probably be another bug if the user happens to enter this:
> 
>    Hello World!</textarea>
>    <script>//do something evil</script>
> 
> When you include that fragment within your document and have not 
> processed it, the markup received by the browser would look something 
> like this:
> 
> <TEXTAREA NAME="p_html" ROWS=6 COLS=70 ALIGN="VIRTUAL" WRAP="YES">
>    Hello World!</textarea>
>    <script>//do something evil</script>
> </TEXTAREA>
> 
> Also, I'm not sure what that align="virtual" attribute in your markup is 
> supposed to do, I've never heard of it before.  Neither align nor wrap 
> are valid attributes of the textarea element.
> 

Thanks for all your feedback on this. Andy was addressing another issue 
- I guess something related to what I was asking about. I see and 
understand the point about potentially unsafe content. This is part of 
an administrative toolset used by experienced and responsible site 
administrators.

-- 

jeremy
Jeremy
3/8/2006 11:20:53 AM