Getting abstract-style text snippets from text nodes

Hi everybody,

I am implementing a little search engine that searches for a phrase
in XML text nodes. I have that all working fine, but what I want as
results is not the complete text of the text node. I would like an
abstract-like result list (the kind of output you get with Google
searches).

For example:

... I am the <b>substring</b> from a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract all the parts of
the complete text node where the search term occurs, with the term
highlighted and surrounded by about 30 characters of text.

I found an interesting post, "cut down text", which is almost what I
am looking for, but there the text is just trimmed to x characters.

Does anybody here have an "elegant" way to solve this, or some hints
that would get me to the solution? I am not able to use regex (it
would be nice, though). My parser is Sablotron, so I am restricted to
the functions it provides (XSLT 1.0).


Any help is greatly appreciated.


regards,
Andreas W Wylach

3/8/2007 11:59:37 AM
comp.text.xml

Think about dividing the text into three parts: before your target, the 
target itself, and after the target. Process each appropriately. If you 
want to report multiple instances within the same block of text, look at 
the standard examples of recursive text processing.
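The three-part split can be sketched in a few lines of Python, purely to show the logic. The function name `snippets`, the `width` parameter (default 30, taken from the question) and the `<b>`/`...` markup are illustrative assumptions. The sketch uses plain string search rather than regex, so each step has a direct counterpart among the XSLT 1.0 core functions: `contains()` as the stop test, `substring-before()`, `substring-after()`, `substring()` and `string-length()`.

```python
def snippets(text, term, width=30):
    """Return one abstract-style snippet per occurrence of term in text.

    Hypothetical helper: splits text into before / term / after, trims
    the context to `width` characters, then recurses on `after`.
    Matching is case-sensitive.
    """
    # Stop when there is nothing (more) to find: contains() in XSLT 1.0.
    if not term or term not in text:
        return []
    pos = text.find(term)
    before = text[:pos]                 # substring-before(text, term)
    after = text[pos + len(term):]      # substring-after(text, term)
    # Last `width` characters of the text before the match.
    left = before[max(0, len(before) - width):]
    # First `width` characters of the text after the match.
    right = after[:width]
    # Mark truncation with "..." only where context was actually cut.
    prefix = "..." if len(before) > width else ""
    suffix = "..." if len(after) > width else ""
    snippet = prefix + left + "<b>" + term + "</b>" + right + suffix
    # Recurse on the remainder to pick up further occurrences.
    return [snippet] + snippets(after, term, width)


if __name__ == "__main__":
    node_text = "I am the substring from a complete text node"
    for s in snippets(node_text, "substring", 5):
        print(s)  # ... the <b>substring</b> from...
```

Recursion rather than a loop is a deliberate choice here: XSLT 1.0 has no mutable variables, so a named template that calls itself on the after-part is the usual way to handle several matches. One consequence is that the left context of a later match only reaches back to the end of the previous match.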


-- 
() ASCII Ribbon Campaign  | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
3/8/2007 1:18:05 PM
"Andreas W. Wylach" <aw@ioc3.de> wrote in message 
news:1173355177.413765.175630@8g2000cwh.googlegroups.com...
> The problem is simple (I think): I want to extract all the text parts
> of the complete text node, where search searchterm is highlighted,
> surrounded by the text like 30 characters.


FXSL gives you exactly that (look for testConcordance.xsl).

As first shown here a year and a half ago:


     http://www.stylusstudio.com/xsllist/200511/post00560.html

This was used to create a concordance of the text of the New Testament
for every word longer than three characters whose frequency in the
document does not exceed a given parameter (1280, which in practice
leaves out mainly pronouns).

The code itself is 95 lines long, and on a 3 GHz, 2 GB Pentium IV PC
with Saxon 8.6 (the version at the time) it took less than 92 seconds
to produce the complete (huge) concordance. The source XML document,
"ot Ending Spaces.xml", is almost 50,000 (fifty thousand) lines long.

This is just one illustration of what can actually be done with XSLT,
dispelling the myth that "XSLT cannot do this or that
efficiently/elegantly".
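The concordance described above (every word longer than three characters whose frequency does not exceed 1280, each occurrence listed with its context) can be approximated by the rough Python sketch below. It is not FXSL's testConcordance.xsl. The whitespace tokenisation with punctuation stripping, the context measured in words rather than characters, and the names `concordance`, `max_freq`, `min_len` and `context` are all assumptions made for illustration; only the 1280 default mirrors the parameter mentioned in the post.

```python
import string
from collections import Counter, defaultdict


def concordance(text, max_freq=1280, min_len=4, context=3):
    """Hypothetical keyword-in-context concordance (not FXSL's code).

    Returns {word: [(left, word_as_written, right), ...]} for every word
    of at least `min_len` characters (i.e. longer than three by default)
    whose total count in the text is <= max_freq.
    """
    # Whitespace tokenisation, stripping surrounding punctuation (no regex).
    tokens = [t.strip(string.punctuation) for t in text.split()]
    words = [t.lower() for t in tokens]
    # Frequency of each (lower-cased) word over the whole document.
    counts = Counter(w for w in words if w)
    entries = defaultdict(list)
    for i, w in enumerate(words):
        if len(w) >= min_len and counts[w] <= max_freq:
            # Up to `context` words on either side of the occurrence.
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            entries[w].append((left, tokens[i], right))
    # Alphabetical order, as in a printed concordance.
    return dict(sorted(entries.items()))


if __name__ == "__main__":
    sample = "In the beginning God created the heaven and the earth."
    for word, hits in concordance(sample, max_freq=1, context=2).items():
        for left, hit, right in hits:
            print(f"{word:10} ...{left} [{hit}] {right}...")
```

With a low `max_freq` the frequent short words drop out, which is the same filtering effect the 1280 threshold has on the full text.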

Hope this helped.


Cheers,
Dimitre Novatchev

3/10/2007 2:59:59 PM