XSLT Extract Text from Nodes

Hello,

I am new to the concept of XSL and am looking for some assistance.

Take the following XML document:

<binder>
  <author>Greg</author>
  <notes>
    <time>11:45</time>
    <content>
      This would be some content... every once in a while you may run into
      <heading>A Heading!</heading>
      Which could be followed by more content... and possible
      <heading>More Headings.</heading>
      and even more content!
    </content>
  </notes>
</binder>

What I would like to do is to be able to extract the value of the
<content> node, and have special formatting for the headings.

When I do something like:

<xsl:value-of select="content" />

I receive the data within <content>, including the values of the
nested <heading> nodes. What I really want is to have XSLT read the
text of the <content> node until a <heading> node is reached, format
and display the value of that heading, and then continue with the text
of the <content> node after the <heading> until another <heading> is
reached, and so on.

Could someone give me some pointers as to how this can be accomplished?

10/10/2006 5:31:33 PM
comp.text.xml

gregmcmullinjr@gmail.com wrote:


>     <content>
>       This would be some content... every once in a while you may run into
>       <heading>A Heading!</heading>
>       Which could be followed by more content... and possible
>       <heading>More Headings.</heading>
>       and even more content!
>     </content>


> What I would like to do is to be able to extract the value of the
> <content> node, and have special formatting for the headings.

Use templates and xsl:apply-templates e.g.

   <xsl:template match="content">
     <div>
       <xsl:apply-templates/>
     </div>
   </xsl:template>

   <xsl:template match="heading">
     <h1>
       <xsl:apply-templates/>
     </h1>
   </xsl:template>

There is a built-in template for text nodes
   <http://www.w3.org/TR/xslt#built-in-rule>
so you don't have to do anything for them, they end up in the result 
tree anyway with the above approach.
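
For reference, here is a minimal sketch of how the two templates above
might sit in a complete XSLT 1.0 stylesheet. The root template, the
binder/notes/content path (taken from the sample document) and the HTML
output method are assumptions added for illustration, not part of the
original reply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the result is meant to be an HTML page. -->
  <xsl:output method="html" indent="no"/>

  <!-- Assumed entry point: process only the <content> element, so the
       text of <author> and <time> is not pulled into the result. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="binder/notes/content"/>
      </body>
    </html>
  </xsl:template>

  <!-- Wrap the whole content in a <div> and process its children
       (text nodes and <heading> elements) in document order. -->
  <xsl:template match="content">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Each <heading> becomes an <h1>; its text is emitted by the
       built-in rule for text nodes. -->
  <xsl:template match="heading">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

</xsl:stylesheet>
```

Without a root template like the assumed one, the built-in rules would
walk the whole document and also emit the text of <author> and <time>.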


-- 

	Martin Honnen
	http://JavaScript.FAQTs.com/
mahotrash (2052)
10/10/2006 5:37:46 PM
Thanks for your quick reply Martin,

This has brought me closer to what I would like to accomplish, however
I now have the following issue.

I was using the xsl:value-of element with disable-output-escaping="yes"
to produce HTML-formatted text in the browser.  The <content> node may
contain HTML that should be rendered as such.  Your method produces all
of the text in the correct order, formatted according to tag name, but
the HTML tags appear as literal text instead of being interpreted.

i.e.

<content>
  There may be some <i>italicized</i> text...
  <heading>Maybe even <u>formatting in a heading</u></heading>
  ...
</content>

Is there some way to overcome this?

10/10/2006 6:05:50 PM
I should say that the HTML tags within my XML document are stored as
entities (at least the < character is)  i.e.

<content>
  This is some &lt;i>italicized&lt;/i> text...
  ...
</content>

Thanks.


10/10/2006 6:10:12 PM
I have found a solution.  The following is the built-in template for
text nodes:

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

It can be overridden simply by creating a new custom template, which I
did as the following:

<xsl:template match="text()|@*">
  <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>

The result is that the HTML in the text nodes outputs as desired.
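
A narrower variant of this override, sketched here as an assumption
rather than something posted in the thread, limits the change to text
nodes inside <content>. Text elsewhere in the document and attribute
values then keep the default escaping.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches text nodes at any depth below <content>, including the
       text inside <heading>. All other text nodes and all attributes
       still go through the built-in rule and are escaped as usual. -->
  <xsl:template match="content//text()">
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>
```

This is meant to be combined with the content and heading templates
suggested earlier in the thread.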

10/10/2006 6:49:23 PM
Please don't top-post.

gregmcmullinjr@gmail.com wrote:
> Martin Honnen wrote:
> > gregmcmullinjr@gmail.com wrote:
> > >     <content>
> > >       This would be some content... every once in a
> > >       while you may run into
> > >       <heading>A Heading!</heading>
> > >       Which could be followed by more content... and
> > >       possible
> > >       <heading>More Headings.</heading>
> > >       and even more content!
> > >     </content>
> >
> > Use templates and xsl:apply-templates e.g.
> >
> >    <xsl:template match="content">
> >      <div>
> >        <xsl:apply-templates/>
> >      </div>
> >    </xsl:template>
> >
> >    <xsl:template match="heading">
> >      <h1>
> >        <xsl:apply-templates/>
> >      </h1>
> >    </xsl:template>
>
> This has brought me closer to what I would like to
> accomplish, however I now have the following issue.
>
> I was using the xsl:value-of element with
> disable-output-escaping="yes" to produce HTML formatted
> text in the browser screen.  You see within the <content>
> node there may be HTML that should be displayed as such.
> Your method produces all of the text in the correct order
> and formatted according to tag name, but produces HTML
> tags which should be hidden.
>
> ie.
>
> <content>
>   There may be some <i>italicized</i> text...
>   <heading>Maybe even <u>formatting in a
>   heading</u></heading>
>   ...
> </content>
>
> Is there some way to overcome this?
>
> I should say that the HTML tags within my XML document
> are stored as entities (at least the < character is) i.e.
>
> <content>
>   This is some &lt;i>italicized&lt;/i> text...
>   ...
> </content>

Don't do that; it seems to lead to innumerable problems.
Store your markup as XML instead:

  <content>
    This is some <i>italicized</i> text...
    ...
  </content>

...and use the identity transformation to convert it into
HTML:

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

This also has the virtue of fitting neatly with the
solution for your original problem that Martin Honnen has
proposed.

You might also need to write exclusion templates for some
nodes, but that's hardly a problem.
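
Putting the pieces together, a sketch of such a stylesheet might look
like the following. It assumes the embedded markup is stored as
well-formed XML; the root template and the choice of <script> as an
element to exclude are hypothetical and only illustrate the idea.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the result is serialized as HTML. -->
  <xsl:output method="html"/>

  <!-- Assumed entry point: transform only the <content> element. -->
  <xsl:template match="/">
    <xsl:apply-templates select="binder/notes/content"/>
  </xsl:template>

  <!-- Identity transformation: copies any node that no more specific
       template matches, e.g. <i> or <u> inside the content. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The name-based templates below have a higher default priority
       than the identity template, so they win for these elements. -->
  <xsl:template match="content">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="heading">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

  <!-- Hypothetical exclusion template: an empty template drops the
       matched element together with everything inside it. -->
  <xsl:template match="script"/>

</xsl:stylesheet>
```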

-- 
roy axenov

r_axenov (54)
10/10/2006 6:54:38 PM
Not sure what a top-post is...

While I see what you're saying, Roy, the problem is that the contained
HTML is not necessarily well-formed because of the way it is generated
at this time.  Perhaps when I have figured out how to force it to be
well-formed I can use this solution.

Thanks.

10/10/2006 7:29:13 PM
gregmcmullinjr@gmail.com schrieb:
> roy axenov wrote:
>> Please don't top-post.

> Not sure what a top-post is...

Then ask a search engine. It will lead you to some documents like 
<http://www.catb.org/~esr/jargon/html/T/top-post.html>.

-- 
Johannes Koch
Spem in alium nunquam habui praeter in te, Deus Israel.
                          (Thomas Tallis, 40-part motet)
koch8601 (279)
10/10/2006 9:33:35 PM

gregmcmullinjr@gmail.com wrote:


> It can be overridden simply by creating a new custom template, which I
> did as the following:
> 
> <xsl:template match="text()|@*">
>   <xsl:value-of select="." disable-output-escaping="yes"/>
> </xsl:template>
> 
> The result is that the HTML in the text nodes outputs as desired.

If that works for you then you can use it. But you should be aware that
support for disable-output-escaping is optional and applies only when
the result tree is serialized. An XSLT processor might not support it
at all, and it has no effect when the result tree is not serialized
(e.g. when you chain transformations, or in a browser like Mozilla,
where the result tree is rendered directly without any serialization
happening).

-- 

	Martin Honnen
	http://JavaScript.FAQTs.com/
mahotrash (2052)
10/11/2006 1:34:25 PM
I think this will suffice for my needs as I am doing the
transformations on the server.

Thanks again.

10/11/2006 2:48:02 PM