class Parser {...};


Hi,

I am looking for a class that can parse strings, then allow me to pluck
out nodes from the parse tree (or list).

Consider a US ZIP code as a simple example:

78727-4425

US ZIP codes can be either a sequence of 5 digits or a sequence of 5
digits followed by a hyphen and 4 more digits.  The corresponding
production is:

ZIP_code -----> # # # # # | (# # # # # '-' # # # #)

Naturally, the production for # is:

# -----> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
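
For reference, the two productions above can be checked by hand in a few
lines. This is only a sketch of the "write it by hand" approach, not the
general class being asked for. The names is_digit, match_digits and
is_zip_code are made up for illustration, and the spaces in the production
are assumed to separate grammar symbols rather than to stand for literal
whitespace in the input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// '#' production: a single decimal digit.
bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Tries to match exactly `count` digits in s starting at pos.
// On success, advances pos past the digits and returns true.
bool match_digits(const std::string& s, std::size_t& pos, std::size_t count) {
    if (pos > s.size() || s.size() - pos < count) return false;
    for (std::size_t i = 0; i < count; ++i) {
        if (!is_digit(s[pos + i])) return false;
    }
    pos += count;
    return true;
}

// ZIP_code production: # # # # # | (# # # # # '-' # # # #).
// Returns true only if the whole string is a ZIP code.
bool is_zip_code(const std::string& s) {
    std::size_t pos = 0;
    if (!match_digits(s, pos, 5)) return false;
    if (pos == s.size()) return true;   // first alternative: 5 digits
    if (s[pos] != '-') return false;
    ++pos;                              // consume the '-'
    // Second alternative: 4 more digits and nothing after them.
    return match_digits(s, pos, 4) && pos == s.size();
}
```

Calling is_zip_code("78727-4425") or is_zip_code("02139") returns true,
while a string such as "02139-306" is rejected.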

I would like to take a string S, specify a pattern E to be applied to
S, then pick out the parsed elements of the string according to E.  To
help with specifying E, the class I am looking for would contain a
member map<string, string> for the grammar, which I would populate
before invoking the parse function:

p.grammar["ZIP_code"] = "# # # # # | (# # # # # '-' # # # #)";
p.grammar["#"] = "'0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'";

There might be a ::prepare() member function that generates the
appropriate automata based on the grammar:

p.prepare();

Then get ready to parse some strings:

string S1 = "77 Massachusetts Avenue, Cambridge, MA 02139, USA";
string S2 = "129 Franklin St., Apt. 100, Cambridge, MA 02139-3067";

Specify a pattern:

string E = "* ZIP_code *";

p.parse(S1, E) and p.parse(S2, E) should both yield clumps of strings
(tree or list or whatever) where I can find the node ZIP_code and
extract it.

p.parse (S1, E, ???);

The ??? would be a data structure, most likely an associative polyarchy
mapping string to string, containing the parse tree, but the details of
this are not important.
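
To make the intended call shape concrete, here is a rough sketch of what
the result of parse(S, "* ZIP_code *", ...) could look like for this one
pattern. It is hard-coded for ZIP codes and does not read any grammar. The
names ParseResult and parse_zip_anywhere are invented for illustration, and
a plain map<string, string> stands in for the result structure.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result type: node name -> matched text.
typedef std::map<std::string, std::string> ParseResult;

namespace {

// True if s[i] exists and is a decimal digit.
bool digit_at(const std::string& s, std::size_t i) {
    return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])) != 0;
}

// Length of the run of consecutive digits starting at index i.
std::size_t digit_run(const std::string& s, std::size_t i) {
    std::size_t n = 0;
    while (digit_at(s, i + n)) ++n;
    return n;
}

}  // namespace

// Hard-coded stand-in for parse(S, "* ZIP_code *", result): finds the
// first ZIP code in s and stores it under the key "ZIP_code".
bool parse_zip_anywhere(const std::string& s, ParseResult& result) {
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t run = digit_run(s, i);
        if (run == 0) { ++i; continue; }   // not a digit, keep scanning
        if (run == 5) {
            std::size_t len = 5;
            // Optional '-' followed by exactly four digits.
            if (i + 5 < s.size() && s[i + 5] == '-' && digit_run(s, i + 6) == 4) {
                len = 10;
            }
            result["ZIP_code"] = s.substr(i, len);
            return true;
        }
        i += run;   // digit run of the wrong length (e.g. a house number)
    }
    return false;
}
```

With S1 and S2 as defined above, the map would end up holding "02139" and
"02139-3067" respectively under the key "ZIP_code".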

Does such a class exist?

It would be orders of magnitude better than what I am doing now, which
is writing a function to do it all by hand, each time, every time.

-Chaud Lapin-


      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

Reply unoriginal_username (63) 3/25/2005 11:00:51 AM

Le Chaud Lapin wrote:
> Hi,
>
> I am looking for a class that can parse strings, then allow me to pluck
> out nodes from the parse tree (or list).

Look for Boost's Spirit parser. http://www.boost.org/libs/spirit/
It will probably do everything you want and more.

HTH
--
Julien

Reply Julien 3/25/2005 12:18:18 PM


>>I am looking for a class that can parse strings, then allow me to pluck
>>out nodes from the parse tree (or list).
>
>
> Look for Boost's Spirit parser. http://www.boost.org/libs/spirit/
> It will probably do everything you want and more.
>

   or http://www.antlr.org/

TomS

Reply toms 3/26/2005 9:52:23 AM

"toms" <nandu@poczta.onet.pl> wrote in message
news:d21v2c$esh$1@news.onet.pl...
>
>>>I am looking for a class that can parse strings, then allow me to pluck
>>>out nodes from the parse tree (or list).
>>
>>
>> Look for Boost's Spirit parser. http://www.boost.org/libs/spirit/
>> It will probably do everything you want and more.
>>
>
>   or http://www.antlr.org/
>
> TomS


Or the YARD parser, http://www.ootl.org/yard

--
Christopher Diggins


Reply christopher 3/27/2005 2:20:30 AM

On 25 Mar 2005 06:00:51 -0500, Le Chaud Lapin <unoriginal_username@yahoo.com> wrote:
> Hi,
> 
> I am looking for a class that can parse strings, then allow me to pluck
> out nodes from the parse tree (or list).
> 
> Consider a US ZIP code as simple example:
> 
> 78727-4425
> 
> US ZIP codes can be either a sequence of 5 digits or a sequence of 5
> digits followed by 4 digits.  The corresponding production is:
....
> p.grammar["ZIP_code"] = "# # # # # | (# # # # # '-' # # # #)";
> p.grammar["#"] = "'0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'";
....

This idea seems useful, kind of like a run-time yacc. I almost wrote one the
other week, for parsing an especially tricky command-line language.

But if your problems aren't much more complex than the US ZIP code example,
first have a look at regular expressions (as in sed, awk and particularly
Perl). In Perl, the example would boil down to

  /^(\d{5})(-(\d{4}))?$/

with the first five digits ending up in the first group, and the optional
four in the third group ("group" being my name for what you call nodes).

There is a regex implementation in Boost, there's the GNU regex library, and
finally the PCRE library, which I assume is more powerful. (Some of them may
be good at checking for a match, but bad at breaking down the string.)

/Jorgen

-- 
  // Jorgen Grahn <jgrahn@       Ph'nglui mglw'nafh Cthulhu
\X/                algonet.se>   R'lyeh wgah'nagl fhtagn!

Reply Jorgen 3/28/2005 11:51:29 PM
