Getting kind of abstract text snippets from text nodes

Hi everybody,

I am implementing a small search engine that searches for a phrase
in XML text nodes. That all works fine, but as the result I do not
want the complete text of the text node. I would like an
abstract-like result list (the kind of output you get from Google
searches).

For example:

.... I am the <b>substring</b> from a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract every part of the
complete text node where the search term occurs, with the search term
highlighted and surrounded by about 30 characters of text.

I found an interesting post, "cut down text", which is almost what I
am looking for, but there the text is just trimmed by x characters.

Does anybody here have an "elegant" way to solve this, or some hints
that would get me to a solution? I am not able to use regex (it would
be nice, though). My parser is Sablotron, so I am restricted to the
functions that I get (1.0).


Any help is greatly appreciated.


regards,
Andreas W Wylach

aw (13)
3/8/2007 11:59:37 AM
comp.text.xml

Think about dividing the text into three parts: before your target, the 
target itself, and after the target. Process each appropriately. If you 
want to report multiple instances within the same block of text, look at 
the standard examples of recursive text processing.
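
The three-part split described above can be sketched roughly as follows. This is only an illustration in Python, not a stylesheet: the function names snippets and render and the width parameter are made up for the example, and the code deliberately uses only operations that have XPath 1.0 counterparts (contains(), substring-before(), substring-after(), substring()), so the same shape should carry over to a recursive named template. It assumes a case-sensitive match and does no markup escaping.

```python
def snippets(text, term, width=30):
    """Return one (left, term, right) context triple per occurrence of term."""
    # Mirrors an XPath contains() test; an empty term would match everywhere.
    if not term or term not in text:
        return []
    # Split into the three parts: before the target, the target, after it
    # (the analogue of substring-before() and substring-after()).
    before, _, after = text.partition(term)
    # Keep at most `width` characters on each side of the hit.
    left = before[-width:] if width > 0 else ""
    right = after[:width]
    # Recurse on the remainder to report further hits in the same text,
    # as a recursive named template would do in XSLT 1.0.
    return [(left, term, right)] + snippets(after, term, width)


def render(text, term, width=30):
    """Format each hit as '... left<b>term</b>right ...', one per line."""
    return "\n".join(
        f"... {left}<b>{hit}</b>{right} ..."
        for left, hit, right in snippets(text, term, width)
    )
```

Calling render(node_text, "substring") would give one line per occurrence. Because each call recurses on the text after the hit, the context to the left of a later hit never reaches back past the previous hit; whether that is wanted depends on how overlapping snippets should look.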


-- 
() ASCII Ribbon Campaign  | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
3/8/2007 1:18:05 PM
"Andreas W. Wylach" <aw@ioc3.de> wrote in message 
news:1173355177.413765.175630@8g2000cwh.googlegroups.com...
> Hi everybody,
>
> I am about implementing a little search engine that searches a phrase
> over xml text nodes. I got
> that all working fine but what I want as the results is not the
> complete text of the textnode,
> I would like to make an abstract like result list (such output that
> you get with google searches.
>
> For eg
>
> ... I am the <b>substring</b> from a complete text node ...
>
> where "substring" is the search term.
>
> The problem is simple (I think): I want to extract all the text parts
> of the complete text node,
> where search searchterm is highlighted, surrounded by the text like
> 30
> characters.


FXSL gives you exactly that (look for testConcordance.xsl).

As first shown here a year and a half ago:


     http://www.stylusstudio.com/xsllist/200511/post00560.html

this was used to create a concordance of the text of the New Testament for 
every word longer than three characters whose frequency count in the 
document does not exceed a given parameter (1280, which in practice 
leaves out mainly pronouns).

The code itself is 95 lines and, on a 3GHz, 2GB Pentium IV PC with Saxon 8.6 
(at that time), needed less than 92 seconds to produce the complete (huge) 
concordance. The source XML document, "ot Ending Spaces.xml", is almost 50 
000 (fifty thousand) lines long.

This is just one illustration of what can really be done with XSLT, 
dispelling the myths of "XSLT cannot do this or that 
efficiently/elegantly".

Hope this helped.


Cheers,
Dimitre Novatchev

 


dimitren1 (155)
3/10/2007 2:59:59 PM