Hi,
I am looking for a class that can parse strings, then allow me to pluck
out nodes from the parse tree (or list).
Consider a US ZIP code as a simple example:
78727-4425
US ZIP codes can be either a sequence of 5 digits or a sequence of 5
digits followed by a hyphen and 4 more digits. The corresponding
production is:
ZIP_code -----> # # # # # | (# # # # # '-' # # # #)
Naturally, the production for # is:
# -----> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
I would like to take a string S, specify a pattern E to be applied to
S, then pick out the parsed elements of the string according to E. To
help with specifying E, the class I am looking for would contain a
member map<string, string> for the grammar, which I would populate
before invoking the parse function:
p.grammar["ZIP_code"] = "# # # # # | (# # # # # '-' # # # #)";
p.grammar["#"] = "'0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'";
There might be a ::prepare() member function that generates the
appropriate automata based on the grammar:
p.prepare();
Then get ready to parse some strings:
string S1 = "77 Massachusetts Avenue, Cambridge, MA 02139, USA";
string S2 = "129 Franklin St., Apt. 100, Cambridge, MA 02139-3067";
Specify a pattern:
string E = "* ZIP_code *";
Parser().parse(S1, E) and Parser().parse(S2, E) should both yield
clumps of strings (tree or list or whatever) where I can find the node
ZIP_code and extract it.
p.parse (S1, E, ????);
The ???? would be a data structure, most likely an associative polyarchy
mapping string to string, containing the parse tree, but the details of
this are not important.
Does such a class exist?
It would be orders of magnitude better than what I am doing now, which
is writing a function to do it all by hand, each time, every time.
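For reference, the kind of hand-written function meant here might look like the sketch below for the ZIP code productions above. The names is_digit_at, match_digits and is_zip_code are made up for this illustration, and the sketch only validates a string that is already isolated; it does not do the "* ZIP_code *" search inside a longer address.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for the production
//   # -----> '0' | '1' | ... | '9'
bool is_digit_at(const std::string& s, std::size_t i) {
    return i < s.size() && s[i] >= '0' && s[i] <= '9';
}

// Hypothetical helper: true if `count` consecutive '#' start at `pos`.
bool match_digits(const std::string& s, std::size_t pos, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (!is_digit_at(s, pos + i)) {
            return false;
        }
    }
    return true;
}

// Hand-coded version of
//   ZIP_code -----> # # # # # | (# # # # # '-' # # # #)
// The whole string must be consumed by one of the two alternatives.
bool is_zip_code(const std::string& s) {
    if (s.size() == 5) {
        return match_digits(s, 0, 5);
    }
    if (s.size() == 10) {
        return match_digits(s, 0, 5) && s[5] == '-' && match_digits(s, 6, 4);
    }
    return false;
}
```

Every new format needs another function of this shape, which is exactly the repetition a grammar-driven class would remove.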
-Chaud Lapin-
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Posted by unoriginal_username, 3/25/2005 11:00:51 AM
Le Chaud Lapin wrote:
> Hi,
>
> I am looking for a class that can parse strings, then allow me to pluck
> out nodes from the parse tree (or list).
Look for Boost's Spirit parser. http://www.boost.org/libs/spirit/
It will probably do everything you want and more.
HTH
--
Julien
Posted by Julien, 3/25/2005 12:18:18 PM
>>I am looking for a class that can parse strings, then allow me to pluck
>>out nodes from the parse tree (or list).
>
>
> Look for Boost's Spirit parser. http://www.boost.org/libs/spirit/
> It will probably do everything you want and more.
>
or http://www.antlr.org/
TomS
Posted by toms, 3/26/2005 9:52:23 AM
"toms" <nandu@poczta.onet.pl> wrote in message
news:d21v2c$esh$1@news.onet.pl...
>
>>>I am looking for a class that can parse strings, then allow me to pluck
>>>out nodes from the parse tree (or list).
>>
>>
>> Look for Boost's Spirit parser. http://www.boost.org/libs/spirit/
>> It will probably do everything you want and more.
>>
>
> or http://www.antlr.org/
>
> TomS
Or the YARD parser, http://www.ootl.org/yard
--
Christopher Diggins
Posted by christopher, 3/27/2005 2:20:30 AM
On 25 Mar 2005 06:00:51 -0500, Le Chaud Lapin <unoriginal_username@yahoo.com> wrote:
> Hi,
>
> I am looking for a class that can parse strings, then allow me to pluck
> out nodes from the parse tree (or list).
>
> Consider a US ZIP code as simple example:
>
> 78727-4425
>
> US ZIP codes can be either a sequence of 5 digits or a sequence of 5
> digits followed by 4 digits. The corresponding production is:
....
> p.grammar.insert ("ZIP_code", "# # # # # | (# # # # # '-' # # # #)");
> p.grammar.insert ("#", '0' | '1' | '3' | '4' | '5' | '6' | '7' | '8' |
> '9'");
....
This idea seems useful, kind of like a run-time yacc. I almost wrote one the
other week, for parsing an especially tricky command-line language.
But if your problems aren't much more complex than the US ZIP code example,
first have a look at regular expressions (as in sed, awk and particularly
Perl). In Perl, the example would boil down to
/^(\d{5})(-(\d{4}))?$/
with the first five digits ending up in the first group, and the optional
four in the third group ("group" being my name for what you call nodes).
There is a regex implementation in Boost, there's the GNU regex library, and
finally the PCRE library, which I assume is more powerful. (Some of them may
be good at checking for a match, but bad at breaking down the string.)
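To illustrate the regex approach in C++, here is a minimal sketch using std::regex. That is an assumption on my part: std::regex is the standard-library descendant of the Boost library and arrived with C++11, so it was not available when this thread was written. The names ZipParts and find_zip are made up for the example. The Perl anchors ^ and $ are replaced by \b word boundaries so that the ZIP code can be found inside a longer address, which roughly corresponds to the "* ZIP_code *" pattern in the original post.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the two "nodes" plucked out of a ZIP code.
struct ZipParts {
    std::string five;       // the mandatory five digits (group 1)
    std::string plus_four;  // the optional four digits (group 3), empty if absent
};

// Hypothetical helper: searches `text` for the first ZIP code.
// Returns true and fills `out` on success; leaves `out` untouched otherwise.
bool find_zip(const std::string& text, ZipParts& out) {
    // Same grouping as the Perl pattern above, using the default
    // ECMAScript grammar, which supports \d and \b.
    static const std::regex pattern(R"(\b(\d{5})(-(\d{4}))?\b)");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) {
        return false;
    }
    out.five = m[1].str();
    out.plus_four = m[3].matched ? m[3].str() : std::string();
    return true;
}
```

With the two addresses from the original post, this should pick "02139" out of the first and "02139" plus "3067" out of the second. The group numbers are the ones described above: the first group holds the five digits and the third holds the optional four.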
/Jorgen
--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!
Posted by Jorgen, 3/28/2005 11:51:29 PM