f



XML DOM: XML/XHTML inside a text node

In my program, I get input from the user and insert it into an XHTML
document.  Sometimes, this input will contain XHTML, but since I'm
inserting it as a text node, xml.dom.minidom escapes the angle brackets
('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
override this behavior cleanly.  I know I could pipe the input through
a SAX parser and create nodes to insert into the tree, but that seems
kind of messy.  Is there a better way?

Thanks.

0
noahlt (3)
11/3/2005 5:05:55 AM
comp.lang.python 77058 articles. 6 followers. Post Follow

5 Replies
1913 Views

Similar Articles

[PageSpeed] 2

On Thu, 2 Nov 2005 noahlt@gmail.com wrote:

> In my program, I get input from the user and insert it into an XHTML
> document.  Sometimes, this input will contain XHTML, but since I'm
> inserting it as a text node, xml.dom.minidom escapes the angle brackets
> ('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
> override this behavior cleanly.  I know I could pipe the input through
> a SAX parser and create nodes to insert into the tree, but that seems
> kind of messy.  Is there a better way?

What about parsing the input into XML first? Is there any point in including
unescaped code into XML document unless it is XML itself?


> Thanks.
>
>

Sincerely yours, Roman Suzi
-- 
rnd@onego.ru =\= My AI powered by GNU/Linux RedHat 7.3
0
rnd7053 (93)
11/3/2005 6:07:02 AM
Roman Suzi wrote:
>
> On Thu, 2 Nov 2005 noa...@gmail.com wrote:
> > Is there a better way?
>
> What about parsing the input into XML first? Is there any point in including
> unescaped code into XML document unless it is XML itself?

Indeed. My approach would be to parse the user's input using the
parseString function, to get the "root element" of this newly parsed
document, and then to import that element into the existing document
using importNode. The imported document information could then be
inserted into the existing document at a suitable place. For example:

user_doc = xml.dom.minidom.parseString(user_input)
# NOTE: Check for element or use XPath here...
user_doc_root = user_doc.childNodes[0]
imported_root = existing_doc.importNode(user_doc_root, 1)
existing_doc_location.appendChild(imported_root)

Paul

0
paul338 (1044)
11/3/2005 1:16:14 PM
noahlt@gmail.com wrote:
> In my program, I get input from the user and insert it into an XHTML
> document.  Sometimes, this input will contain XHTML, but since I'm
> inserting it as a text node, xml.dom.minidom escapes the angle brackets
> ('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
> override this behavior cleanly.  I know I could pipe the input through
> a SAX parser and create nodes to insert into the tree, but that seems
> kind of messy.  Is there a better way?

You could try version 2.13 of XIST (http://www.livinglogic.de/Python/xist)

Code looks like this:

from ll.xist.ns import html, specials

text = "Number 1 ... the <b>larch</b>"

e = html.div(
    html.h1("And now for something completely different"),
    html.p(specials.literal(text))
)
print e.asBytes()


This prints:
<div><h1>And now for something completely different</h1><p>Number 1 ... 
the <b>larch</b></p></div>

I hope this is what you need.

Bye,
    Walter D�rwald
0
walter1911 (71)
11/4/2005 1:51:10 PM
"""
In my program, I get input from the user and insert it into an XHTML
document.  Sometimes, this input will contain XHTML, but since I'm
inserting it as a text node, xml.dom.minidom escapes the angle brackets
('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
override this behavior cleanly.  I know I could pipe the input through
a SAX parser and create nodes to insert into the tree, but that seems
kind of messy.  Is there a better way?
"""

Amara 1.1.6 supports inserting an XML fragment into a document or
element object.  Many short examples here:

http://copia.ogbuji.net/blog/2005-09-21/Dare_s_XLI

excerpt:

Adding a <phone> element as a child of the <contact> element'

    contacts.xml_append_fragment('<phone>%s</phone>'%'206-555-0168'

http://uche.ogbuji.net/tech/4suite/amara

--
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

0
11/4/2005 4:37:03 PM
[noahlt@gmail.com]
> In my program, I get input from the user and insert it into an XHTML
> document.  Sometimes, this input will contain XHTML, but since I'm
> inserting it as a text node, xml.dom.minidom escapes the angle brackets
> ('<' becomes '&lt;', '>' becomes '&gt;').  I want to be able to
> override this behavior cleanly.

Why?

You need to make a decision on how the contained xhtml is treated after 
it has been inserted into the document.

1. If it is simply textual payload, then it should be perfectly 
acceptable to escape those characters. Or you could include it as a 
CDATA section.

2. If it needs to become a structural part of the xml document, i.e. the 
elements are structurally incorporated into the document, then you need 
to transform it into nodes somehow, e.g. by parsing it with sax, etc. 
Although it would probably be easier to parse it into a separate DOM and 
import the generated root node into your document.

Is this xhtml coming from a trusted source? Or are you accepting it from 
strangers, over the internet? If the latter, there are security concerns 
relating to XSS attacks that you need to be aware of.

See the following archive post for how to clean up untrusted (x)html.

http://groups.google.com/group/comp.lang.python/browse_thread/thread/fbdc7ae20353a36d/91b6510990a25f9a

HTH,

-- 
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan
0
alanmk (319)
11/4/2005 7:45:07 PM
Reply: