I'm writing a C program which will parse an XML file as its input and
perform specific operations on it.
What I have in mind is to declare a two-dimensional array and store the
XML file in it, for example:
char country[][]={<countries>,
                  <country>,
                  <text>Norway</text>,
                  <value>N</value>,
                  </country>}, and so on
My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?
Thanks
Maxx | 3/21/2011 8:35:01 PM

On 03/22/11 09:35 AM, Maxx wrote:
> I'm writing a C program which would parse a xml file as its input and
> perform specific operations...
> Now what i have in my mind is that i should declare a two dimensional
> array and store the xml file in it
>
> for example::: char country[][]={<countries>,
> <country>,
> <text>Norway</
> text>,
> <value>N</value>,
> </country>}, and so on
>
>
> My question is... is there any better way to do this, i.e. is there
> any better way to store the xml input input..
That's more of a generic programming question than a C one. Have a look
at a common XML parser like libxml, the documentation will give you
ideas even if you choose not to use the library.
--
Ian Collins
Ian | 3/21/2011 8:43:30 PM

Maxx <grungeddd.maxx@gmail.com> writes:
> I'm writing a C program which would parse a xml file as its input and
> perform specific operations...
What specific operations? See below...
> Now what i have in my mind is that i should declare a two dimensional
> array and store the xml file in it
>
> for example::: char country[][]={<countries>,
> <country>,
> <text>Norway</
> text>,
> <value>N</value>,
> </country>}, and so on
>
>
> My question is... is there any better way to do this, i.e. is there
> any better way to store the xml input input..
It's almost impossible to say without knowing how a piece of data is
going to be accessed (or manipulated).
A good place to post would be comp.programming. If you say what you
propose to do with the XML you should get good help there. Be prepared
to be told that you should use an existing XML parsing library (because
that is almost always the right answer).
--
Ben.
Ben | 3/21/2011 9:53:32 PM

On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> I'm writing a C program which would parse a xml file as its input and
> perform specific operations...
> Now what i have in my mind is that i should declare a two dimensional
> array and store the xml file in it
> My question is... is there any better way to do this, i.e. is there any
> better way to store the xml input input..
Yes. In fact, it would be hard to imagine a worse way.
First, I wouldn't recommend trying to actually parse the XML yourself, as
you're practically bound to get it wrong. Use an XML parsing library
instead.
XML parsing libraries come in two main flavours: DOM and SAX. DOM
constructs a parse tree for the entire file, which the application can
then query. SAX generates events (reported via callbacks) as it parses the
file; it's up to the application to actually store the data.
Which flavour to use and exactly how to do it depend upon the details of
the application.
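
To make the SAX side of that distinction concrete, here is a toy sketch in plain C. It is not the API of libxml or of any other real library: `toy_handlers`, `toy_scan` and the `collector` callbacks are names invented for illustration, and the scanner only understands `<name>`, `</name>` and character data (no attributes, comments, entities or error recovery). The point is only the shape: the scanner reports events, and the application decides what to keep.

```c
#include <string.h>

/* Hypothetical callback set, loosely modelled on the SAX style. */
typedef struct {
    void (*start)(void *user, const char *name);
    void (*end)(void *user, const char *name);
    void (*text)(void *user, const char *s, size_t len);
} toy_handlers;

/* Toy scanner: reports start tags, end tags and character data.
   Returns 0 on success, -1 on an unterminated or oversized tag. */
static int toy_scan(const char *p, const toy_handlers *h, void *user)
{
    char name[64];
    while (*p) {
        if (*p == '<') {
            int closing = (p[1] == '/');
            const char *s = p + 1 + closing;
            const char *e = strchr(s, '>');
            size_t n;
            if (!e) return -1;
            n = (size_t)(e - s);
            if (n == 0 || n >= sizeof name) return -1;
            memcpy(name, s, n);
            name[n] = '\0';
            if (closing) h->end(user, name); else h->start(user, name);
            p = e + 1;
        } else {
            const char *e = strchr(p, '<');
            size_t n = e ? (size_t)(e - p) : strlen(p);
            h->text(user, p, n);
            p += n;
        }
    }
    return 0;
}

/* Application side: remember the content of the last <text> element. */
typedef struct { int in_text; char last[32]; int count; } collector;

static void on_start(void *u, const char *name)
{
    collector *c = u;
    c->in_text = (strcmp(name, "text") == 0);
}

static void on_end(void *u, const char *name)
{
    collector *c = u;
    (void)name;
    c->in_text = 0;
}

static void on_text(void *u, const char *s, size_t len)
{
    collector *c = u;
    if (c->in_text && len < sizeof c->last) {
        memcpy(c->last, s, len);
        c->last[len] = '\0';
        c->count++;
    }
}
```

With the handlers wired up as `toy_handlers h = { on_start, on_end, on_text };`, scanning the Norway example from the original post should leave "Norway" in `collector.last`. A DOM-style library would instead hand back a whole tree for the application to query afterwards.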
Nobody | 3/22/2011 3:41:31 AM

On Mar 21, 10:35 pm, Maxx <grungeddd.m...@gmail.com> wrote:
>
> My question is... is there any better way to do this, i.e. is there
> any better way to store the xml input input..
>
Think of the XML as a tree, and build what is known as a recursive
descent parser.
Basically it's the same problem as a mathematical expression with
deeply nested parentheses, in a slightly different form. You need one
token of lookahead.
Once you've converted the XML to a tree, you'll usually want to walk
the tree to convert to a set of nested arrays, but sometimes it will
be better to keep the data in tree form.
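
A minimal sketch of that idea, assuming a tiny subset of XML (elements and text only, no attributes, entities, comments or self-closing tags). The `node` layout and the names `parse_element` and `free_tree` are made up for this illustration. The "one token of lookahead" shows up as the peek at the character after `<` to decide between a child element and the closing tag.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node: element name, optional text, children, siblings. */
typedef struct node {
    char name[32];
    char text[64];
    struct node *child;   /* first child */
    struct node *next;    /* next sibling */
} node;

static void free_tree(node *n)
{
    while (n) {
        node *next = n->next;
        free_tree(n->child);
        free(n);
        n = next;
    }
}

/* element := '<' name '>' (text | element)* '</' name '>'
   On success returns the node and advances *pp past the closing tag. */
static node *parse_element(const char **pp)
{
    const char *p = *pp, *e;
    node *n, **tail;
    size_t len;

    if (*p != '<' || p[1] == '/') return NULL;
    e = strchr(p, '>');
    if (!e) return NULL;
    len = (size_t)(e - p - 1);
    if (len == 0 || len >= sizeof n->name) return NULL;
    n = calloc(1, sizeof *n);
    if (!n) return NULL;
    memcpy(n->name, p + 1, len);          /* calloc already NUL-terminated it */
    p = e + 1;
    tail = &n->child;

    for (;;) {
        while (*p == ' ' || *p == '\n' || *p == '\t' || *p == '\r') p++;
        if (*p == '<' && p[1] == '/') {   /* lookahead says: closing tag */
            if (strncmp(p + 2, n->name, len) != 0 || p[2 + len] != '>') break;
            *pp = p + 3 + len;
            return n;
        } else if (*p == '<') {           /* lookahead says: child element */
            *tail = parse_element(&p);
            if (!*tail) break;
            tail = &(*tail)->next;
        } else if (*p) {                  /* character data */
            e = strchr(p, '<');
            if (!e || (size_t)(e - p) >= sizeof n->text) break;
            memcpy(n->text, p, (size_t)(e - p));
            n->text[e - p] = '\0';
            p = e;
        } else {
            break;                        /* unexpected end of input */
        }
    }
    free_tree(n);
    return NULL;
}
```

Called on the example from the original post, this should give a `countries` node whose first child is `country`, whose children are `text` ("Norway") and `value` ("N"). Walking that tree into arrays, or keeping it as a tree, is then a separate decision.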
Malcolm | 3/22/2011 6:40:36 AM

Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
> On Mar 21, 10:35 pm, Maxx <grungeddd.m...@gmail.com> wrote:
>>
>> My question is... is there any better way to do this, i.e. is there
>> any better way to store the xml input input..
>>
> Think of the XML as a tree, and build what is known as a recursive
> descent parser.
>
> Basically it's the same problem as a mathematical expression with
> deeply nested parentheses, in a slightly different form. You need one
> token of lookahead.
>
> Once you've converted the XML to a tree, you'll usually want to walk
> the tree to convert to a set of nested arrays, but sometimes it will
> be better to keep the data in tree form.
I did it the other way round. First I wrote a good generic "values"
handling system that allowed me to have named strings, integers, lists,
string-indexed-arrays, all as recursive as you like. That was the
difficult bit.
Then I just hooked xmlparse up to it and it sucked the XML in nicely.
Think hard about what you want to do, if anything, to distinguish between:
<stuff>
<item>fred</item>
</stuff>
<stuff item="fred"/>
To summarise - you need more of a specification of the problem before
starting to find a solution.
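
For anyone wondering what a generic "values" system might look like, here is one possible shape in C. This is a guess for illustration, not the code described above: the `value` type and the helpers `value_new`, `value_string`, `list_append` and `value_free` are all invented names. A named value is a string, an integer or a list of further values, so it nests as deeply as needed.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { V_STRING, V_INT, V_LIST } value_kind;

/* Hypothetical generic value: a named string, integer, or list of values. */
typedef struct value {
    char *name;
    value_kind kind;
    union {
        char *s;
        long i;
        struct { struct value **items; size_t count; } list;
    } u;
} value;

static char *dup_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d) strcpy(d, s);
    return d;
}

/* Create an empty value of the given kind (a list starts with no items). */
static value *value_new(const char *name, value_kind kind)
{
    value *v = calloc(1, sizeof *v);
    if (!v) return NULL;
    v->name = dup_str(name);
    if (!v->name) { free(v); return NULL; }
    v->kind = kind;
    return v;
}

static value *value_string(const char *name, const char *s)
{
    value *v = value_new(name, V_STRING);
    if (v && !(v->u.s = dup_str(s))) { free(v->name); free(v); v = NULL; }
    return v;
}

/* Append item to a V_LIST value; returns 0 on success, -1 on failure. */
static int list_append(value *list, value *item)
{
    value **grown;
    if (!list || list->kind != V_LIST || !item) return -1;
    grown = realloc(list->u.list.items,
                    (list->u.list.count + 1) * sizeof *grown);
    if (!grown) return -1;
    grown[list->u.list.count++] = item;
    list->u.list.items = grown;
    return 0;
}

/* Free a value and, for lists, everything it contains. */
static void value_free(value *v)
{
    size_t k;
    if (!v) return;
    if (v->kind == V_STRING) free(v->u.s);
    if (v->kind == V_LIST) {
        for (k = 0; k < v->u.list.count; k++) value_free(v->u.list.items[k]);
        free(v->u.list.items);
    }
    free(v->name);
    free(v);
}
```

If the application decides not to distinguish the two spellings, both `<stuff><item>fred</item></stuff>` and `<stuff item="fred"/>` could be loaded as a list named "stuff" holding a string named "item" with the content "fred".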
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk
Dr | 3/22/2011 7:01:02 AM

On Mar 21, 1:43 pm, Ian Collins <ian-n...@hotmail.com> wrote:
> On 03/22/11 09:35 AM, Maxx wrote:
>
> > I'm writing a C program which would parse a xml file as its input and
> > perform specific operations...
> > Now what i have in my mind is that i should declare a two dimensional
> > array and store the xml file in it
>
> > for example::: char country[][]={<countries>,
> >                <country>,
> >                <text>Norway</text>,
> >                <value>N</value>,
> >                </country>}, and so on
>
> > My question is... is there any better way to do this, i.e. is there
> > any better way to store the xml input input..
>
> That's more of a generic programming question than a C one. Have a look
> at a common XML parser like libxml, the documentation will give you
> ideas even if you choose not to use the library.
>
> --
> Ian Collins
Alright, I've looked up libxml and seem to have hit the jackpot... It
does contain the functions I need.
Thanks
Maxx | 3/22/2011 8:10:06 PM

On Mar 21, 8:41 pm, Nobody <nob...@nowhere.com> wrote:
> On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > I'm writing a C program which would parse a xml file as its input and
> > perform specific operations...
> > Now what i have in my mind is that i should declare a two dimensional
> > array and store the xml file in it
> > My question is... is there any better way to do this, i.e. is there any
> > better way to store the xml input input..
>
> Yes. In fact, it would be hard to imagine a worse way.
>
> First, I wouldn't recommend trying to actually parse the XML yourself, as
> you're practically bound to get it wrong. Use an XML parsing library
> instead.
>
> XML parsing libraries come in two main flavours: DOM and SAX. DOM
> constructs a parse tree for the entire file, which the application can
> then query. SAX generates events (reported via callbacks) as it parses the
> file; it's up to the application to actually store the data.
>
> Which flavour to use and exactly how to do it depend upon the details of
> the application.
Actually, the XML file that I am going to give the program will
always have a predefined format, like the example I gave above. The
program will always parse the same format, simply extract the values
from the fields and write another XML file having the same template...
so I was looking for the easiest way to solve it, instead of having to
call extensive library functions...
Anyway, thanks
Maxx | 3/22/2011 8:13:55 PM

On Mar 21, 11:40 pm, Malcolm McLean <malcolm.mcle...@btinternet.com> wrote:
> On Mar 21, 10:35 pm, Maxx <grungeddd.m...@gmail.com> wrote:
>
> > My question is... is there any better way to do this, i.e. is there
> > any better way to store the xml input input..
>
> Think of the XML as a tree, and build what is known as a recursive
> descent parser.
>
> Basically it's the same problem as a mathematical expression with
> deeply nested parentheses, in a slightly different form. You need one
> token of lookahead.
>
> Once you've converted the XML to a tree, you'll usually want to walk
> the tree to convert to a set of nested arrays, but sometimes it will
> be better to keep the data in tree form.
Yeah, I had this concept in mind at first, but as I was going to write
a simple program which would simply extract values from a set of
predefined fields, I kind of avoided going into trees. I reckon a tree
would be the best solution, but I'm still quite inexperienced with trees.
Thanks
Maxx | 3/22/2011 8:17:15 PM

On Mar 22, 12:01 am, Dr Nick <3-nos...@temporary-address.org.uk> wrote:
> Malcolm McLean <malcolm.mcle...@btinternet.com> writes:
> > On Mar 21, 10:35 pm, Maxx <grungeddd.m...@gmail.com> wrote:
>
> >> My question is... is there any better way to do this, i.e. is there
> >> any better way to store the xml input input..
>
> > Think of the XML as a tree, and build what is known as a recursive
> > descent parser.
>
> > Basically it's the same problem as a mathematical expression with
> > deeply nested parentheses, in a slightly different form. You need one
> > token of lookahead.
>
> > Once you've converted the XML to a tree, you'll usually want to walk
> > the tree to convert to a set of nested arrays, but sometimes it will
> > be better to keep the data in tree form.
>
> I did it the other way round. First I wrote a good generic "values"
> handling system that allowed me to have named strings, integers, lists,
> string-indexed-arrays, all as recursive as you like. That was the
> difficult bit.
>
> They I just hooked xmlparse up to it and it sucked the XML in nicely.
>
> Think hard about what you want to, if anything, to distinguish between:
>
> <stuff>
> <item>fred</item>
> </stuff>
>
> <stuff item="fred"/>
>
> To summarise - you need more a specification of the problem before
> starting to find a solution.
> --
> Online waterways route planner | http://canalplan.eu
> Plan trips, see photos, check facilities | http://canalplan.org.uk
Yes, a generic list of values would be helpful, but I need more
ideas on how to implement it. I'm trying to avoid library functions in
this program, as it will always parse the same fields over and over
again.
Maxx | 3/22/2011 8:20:26 PM

On Mar 22, 4:13 pm, Maxx <grungeddd.m...@gmail.com> wrote:
> On Mar 21, 8:41 pm, Nobody <nob...@nowhere.com> wrote:
>
> > On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > > I'm writing a C program which would parse a xml file as its input and
> > > perform specific operations...
> > > Now what i have in my mind is that i should declare a two dimensional
> > > array and store the xml file in it
> > > My question is... is there any better way to do this, i.e. is there any
> > > better way to store the xml input input..
>
> > Yes. In fact, it would be hard to imagine a worse way.
>
> > First, I wouldn't recommend trying to actually parse the XML yourself, as
> > you're practically bound to get it wrong. Use an XML parsing library
> > instead.
>
> > XML parsing libraries come in two main flavours: DOM and SAX. DOM
> > constructs a parse tree for the entire file, which the application can
> > then query. SAX generates events (reported via callbacks) as it parses the
> > file; it's up to the application to actually store the data.
>
> > Which flavour to use and exactly how to do it depend upon the details of
> > the application.
>
> Actually the xml file that i was going to provide the program will
> always have a predefined format, like the one example i gave above.It
> will always parse the same format and simply extract the values from
> the fields and write another xml file having the same template... so i
> was looking for the easiest way to solve it, instead of requiring to
> call extensive library functions...
Note that it always starts this way. It is easy to hand parse the XML
if it is in a truly fixed format, so why use a real parser? But then
there are modifications/extensions/etc. People hand edit the file and
add white space, which won't confuse a parser but messes up your less
flexible hand parse. People write a mixture of <element></element>
instead of <element/>, which should parse as equivalent and somehow
don't when hand parsing. People suddenly want validation. etc.
Going with a real parser is very much the way to go in a real
application, much more future friendly even if not apparently needed
up front...
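
A small illustration of how that goes wrong, as a sketch rather than anyone's real code: `naive_extract` is a made-up helper that finds the text between an exact `<tag>` and `</tag>` with `strstr`. It handles the tidy fixed-format case, then silently fails or returns surprising data on input that any XML parser would treat as equivalent.

```c
#include <string.h>

/* Naive hand extraction: copy the bytes between "<tag>" and "</tag>".
   Returns 0 on success, -1 if those exact byte sequences are not found
   or the result does not fit in out. */
static int naive_extract(const char *xml, const char *tag,
                         char *out, size_t outsz)
{
    char open_tag[40], close_tag[40];
    const char *s, *e;
    size_t n;

    if (strlen(tag) > 32) return -1;      /* keep the tag buffers safe */
    strcpy(open_tag, "<");
    strcat(open_tag, tag);
    strcat(open_tag, ">");
    strcpy(close_tag, "</");
    strcat(close_tag, tag);
    strcat(close_tag, ">");

    s = strstr(xml, open_tag);
    if (!s) return -1;
    s += strlen(open_tag);
    e = strstr(s, close_tag);
    if (!e) return -1;
    n = (size_t)(e - s);
    if (n >= outsz) return -1;
    memcpy(out, s, n);
    out[n] = '\0';
    return 0;
}
```

`naive_extract("<value>N</value>", "value", buf, sizeof buf)` works, but `<value >N</value>` and `<value/>` are both rejected, and a pretty-printed `<value>` with the N on its own indented line comes back with the newlines and indentation included. Each of those can be patched by hand, which is exactly the constant tweaking being warned about.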
David | 3/23/2011 6:45:47 PM

On Mar 23, 1:45 pm, David Resnick <lndresn...@gmail.com> wrote:
> On Mar 22, 4:13 pm, Maxx <grungeddd.m...@gmail.com> wrote:
>
> > On Mar 21, 8:41 pm, Nobody <nob...@nowhere.com> wrote:
>
> > > On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > > > I'm writing a C program which would parse a xml file as its input and
> > > > perform specific operations...
> > > > Now what i have in my mind is that i should declare a two dimensional
> > > > array and store the xml file in it
> > > > My question is... is there any better way to do this, i.e. is there any
> > > > better way to store the xml input input..
>
> > > Yes. In fact, it would be hard to imagine a worse way.
>
> > > First, I wouldn't recommend trying to actually parse the XML yourself, as
> > > you're practically bound to get it wrong. Use an XML parsing library
> > > instead.
>
> > > XML parsing libraries come in two main flavours: DOM and SAX. DOM
> > > constructs a parse tree for the entire file, which the application can
> > > then query. SAX generates events (reported via callbacks) as it parses the
> > > file; it's up to the application to actually store the data.
>
> > > Which flavour to use and exactly how to do it depend upon the details of
> > > the application.
>
> > Actually the xml file that i was going to provide the program will
> > always have a predefined format, like the one example i gave above.It
> > will always parse the same format and simply extract the values from
> > the fields and write another xml file having the same template... so i
> > was looking for the easiest way to solve it, instead of requiring to
> > call extensive library functions...
>
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications/extensions/etc. People hand edit the file and
> add white space, which won't confuse a parser but messes up your less
> flexible hand parse. People write a mixture of <element></element>
> instead of <element/>, which should parse as equivalent and somehow
> don't when hand parsing. People suddenly want validation. etc.
> Going with a real parser is very much the way to go in a real
> application, much more future friendly even if not apparently needed
> up front...
Not to mention it's code that *you* don't have to write or test.
Figuring out how to use the library in your code will take less time
than writing a robust parser from scratch. Yes, you can hand-hack a
minimal, non-validating, less-than-totally-robust XML parser in an
afternoon (I've done it), but you'll be tweaking that sucker
*constantly* (which I did as well).
John | 3/23/2011 9:57:15 PM

In article
<05d2e0d8-44de-440c-b862-7e267a920dd9@r4g2000vbq.googlegroups.com>,
David Resnick <lndresnick@gmail.com> wrote:
> On Mar 22, 4:13 pm, Maxx <grungeddd.m...@gmail.com> wrote:
> > On Mar 21, 8:41 pm, Nobody <nob...@nowhere.com> wrote:
> >
> >
> >
> > > On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > > > I'm writing a C program which would parse a xml file as its input and
> > > > perform specific operations...
> > > > Now what i have in my mind is that i should declare a two dimensional
> > > > array and store the xml file in it
> > > > My question is... is there any better way to do this, i.e. is there any
> > > > better way to store the xml input input..
> >
> > > Yes. In fact, it would be hard to imagine a worse way.
> >
> > > First, I wouldn't recommend trying to actually parse the XML yourself, as
> > > you're practically bound to get it wrong. Use an XML parsing library
> > > instead.
> >
> > > XML parsing libraries come in two main flavours: DOM and SAX. DOM
> > > constructs a parse tree for the entire file, which the application can
> > > then query. SAX generates events (reported via callbacks) as it parses the
> > > file; it's up to the application to actually store the data.
> >
> > > Which flavour to use and exactly how to do it depend upon the details of
> > > the application.
> >
> > Actually the xml file that i was going to provide the program will
> > always have a predefined format, like the one example i gave above.It
> > will always parse the same format and simply extract the values from
> > the fields and write another xml file having the same template... so i
> > was looking for the easiest way to solve it, instead of requiring to
> > call extensive library functions...
>
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications/extensions/etc. People hand edit the file and
> add white space, which won't confuse a parser but messes up your less
> flexible hand parse. People write a mixture of <element></element>
> instead of <element/>, which should parse as equivalent and somehow
> don't when hand parsing. People suddenly want validation. etc.
> Going with a real parser is very much the way to go in a real
> application, much more future friendly even if not apparently needed
> up front...
XML is the same as csh. Every time somebody raises a
problem with XML somebody else steps in and presents an
easy workaround. Eventually you are told not even to
try writing a parser. It is the death of a thousand
cuts. And for what?
XML gives PHBs the illusion that they know about
programming; and adventurers a cozy berth. XML is a scam.
Has XML gotten to the point a universal Turing machine
could be written in XML, or is it still singing "Daisy"?
--
Michael Press
Michael | 3/24/2011 8:45:25 AM

On Mar 24, 4:45 am, Michael Press <rub...@pacbell.net> wrote:
> In article
> <05d2e0d8-44de-440c-b862-7e267a920...@r4g2000vbq.googlegroups.com>,
> David Resnick <lndresn...@gmail.com> wrote:
>
> > On Mar 22, 4:13 pm, Maxx <grungeddd.m...@gmail.com> wrote:
> > > On Mar 21, 8:41 pm, Nobody <nob...@nowhere.com> wrote:
>
> > > > On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > > > > I'm writing a C program which would parse a xml file as its input and
> > > > > perform specific operations...
> > > > > Now what i have in my mind is that i should declare a two dimensional
> > > > > array and store the xml file in it
> > > > > My question is... is there any better way to do this, i.e. is there any
> > > > > better way to store the xml input input..
>
> > > > Yes. In fact, it would be hard to imagine a worse way.
>
> > > > First, I wouldn't recommend trying to actually parse the XML yourself, as
> > > > you're practically bound to get it wrong. Use an XML parsing library
> > > > instead.
>
> > > > XML parsing libraries come in two main flavours: DOM and SAX. DOM
> > > > constructs a parse tree for the entire file, which the application can
> > > > then query. SAX generates events (reported via callbacks) as it parses the
> > > > file; it's up to the application to actually store the data.
>
> > > > Which flavour to use and exactly how to do it depend upon the details of
> > > > the application.
>
> > > Actually the xml file that i was going to provide the program will
> > > always have a predefined format, like the one example i gave above.It
> > > will always parse the same format and simply extract the values from
> > > the fields and write another xml file having the same template... so i
> > > was looking for the easiest way to solve it, instead of requiring to
> > > call extensive library functions...
>
> > Note that it always starts this way. It is easy to hand parse the XML
> > if it is in a truly fixed format, so why use a real parser? But then
> > there are modifications/extensions/etc. People hand edit the file and
> > add white space, which won't confuse a parser but messes up your less
> > flexible hand parse. People write a mixture of <element></element>
> > instead of <element/>, which should parse as equivalent and somehow
> > don't when hand parsing. People suddenly want validation. etc.
> > Going with a real parser is very much the way to go in a real
> > application, much more future friendly even if not apparently needed
> > up front...
>
> XML is the same as csh. Every time somebody raises a
> problem with XML somebody else steps in and presents an
> easy workaround. Eventually you are told not even to
> try writing a parser. It is the death of a thousand
> cuts. And for what?
>
> XML gives PHBs the illusion that they know about
> programming; and adventurers a cozy berth. XML is a scam.
>
> Has XML gotten to the point a universal Turing machine
> could be written in XML, or is it still singing "Daisy"?
>
XML is great in its place. Not a PHB, and don't believe
it to be a scam. I love it for flatfiles that need
structured information and flexibility. Easy to extend,
easy (with XPATH queries say) to get stuff out of.
Standard, everyone knows what it means, how to add
to it, how to parse and validate it. Does it solve
all problems in the world? Of course not...
-David
David | 3/24/2011 12:38:19 PM

On Wed, 23 Mar 2011 11:45:47 -0700, David Resnick wrote:
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format,
If you restrict the application to reading a subset of XML, that defeats
the purpose of using XML in the first place.
You can find a wide range of tools which can process XML, but the range of
tools which can process a particular custom subset of XML is likely to be
much smaller (i.e. those tools which you write yourself).
If you think that you only need to support files written by a particular
program, you're likely to end up only supporting files which were directly
written by that program and not post-processed in any way. This often
makes your program less useful than you had originally assumed.
Nobody | 3/25/2011 7:40:31 AM

On Mar 24, 12:57 am, John Bode <jfbode1...@gmail.com> wrote:
> On Mar 23, 1:45 pm, David Resnick <lndresn...@gmail.com> wrote:
>
> Figuring out how to use the library in your code will take less time
> than writing a robust parser from scratch. Yes, you can hand-hack a
> minimal, non-validating, less-than-totally-robust XML parser in an
> afternoon (I've done it), but you'll be tweaking that sucker
> *constantly* (which I did as well).
>
The problem is that it becomes harder to distribute the program. Even
if you have source to the library, it's often in messy files that are
hard to integrate and distract the reader from the actual logical core
of the program.
Malcolm | 3/25/2011 8:19:26 AM

On Mar 25, 3:40 am, Nobody <nob...@nowhere.com> wrote:
> On Wed, 23 Mar 2011 11:45:47 -0700, David Resnick wrote:
> > Note that it always starts this way. It is easy to hand parse the XML
> > if it is in a truly fixed format,
>
> If you restrict the application to reading a subset of XML, that defeats
> the purpose of using XML in the first place.
>
> You can find a wide range of tools which can process XML, but the range of
> tools which can process a particular custom subset of XML is likely to be
> much smaller (i.e. those tools which you write yourself).
>
> If you think that you only need to support files written by a particular
> program, you're likely to end up only supporting files which were directly
> written by that program and not post-processed in any way. This often
> makes your program less useful than you had originally assumed.
Holy out of context quotes, Batman. Your reply misses the entire
point of mine, which is that hand parsing is a bad idea. Did you read
the rest of the post or just answer after the first 2 lines?
-David
David | 3/25/2011 11:31:43 AM

On Fri, 25 Mar 2011 04:31:43 -0700, David Resnick wrote:
> Holy out of context quotes, Batman. Your reply misses the entire
> point of mine, which is that hand parsing is a bad idea. Did you read the
> rest of the post or just answer after the first 2 lines?
I wasn't "replying" to your comments. I elaborated on your reply,
providing more reasons why it's a bad idea to assume that you only need
to handle a subset.
Nobody | 3/25/2011 6:15:07 PM

On Mar 25, 2:15 pm, Nobody <nob...@nowhere.com> wrote:
> On Fri, 25 Mar 2011 04:31:43 -0700, David Resnick wrote:
> > Holy out of context quotes, Batman. Your reply misses the entire
> > point of mine, which is that hand parsing is a bad idea. Did you read the
> > rest of the post or just answer after the first 2 lines?
>
> I wasn't "replying" to your comments. I elaborated on your reply,
> providing more reasons why it's a bad idea to assume that you only need
> to handle a subset.
Just seemed to be replying to my comments, as that was the only quoted
text being addressed. My mistake.
-David
David | 3/25/2011 6:26:50 PM

On Mar 23, 11:45 am, David Resnick <lndresn...@gmail.com> wrote:
> On Mar 22, 4:13 pm, Maxx <grungeddd.m...@gmail.com> wrote:
>
> > On Mar 21, 8:41 pm, Nobody <nob...@nowhere.com> wrote:
>
> > > On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > > > I'm writing a C program which would parse a xml file as its input and
> > > > perform specific operations...
> > > > Now what i have in my mind is that i should declare a two dimensional
> > > > array and store the xml file in it
> > > > My question is... is there any better way to do this, i.e. is there any
> > > > better way to store the xml input input..
>
> > > Yes. In fact, it would be hard to imagine a worse way.
>
> > > First, I wouldn't recommend trying to actually parse the XML yourself, as
> > > you're practically bound to get it wrong. Use an XML parsing library
> > > instead.
>
> > > XML parsing libraries come in two main flavours: DOM and SAX. DOM
> > > constructs a parse tree for the entire file, which the application can
> > > then query. SAX generates events (reported via callbacks) as it parses the
> > > file; it's up to the application to actually store the data.
>
> > > Which flavour to use and exactly how to do it depend upon the details of
> > > the application.
>
> > Actually the xml file that i was going to provide the program will
> > always have a predefined format, like the one example i gave above.It
> > will always parse the same format and simply extract the values from
> > the fields and write another xml file having the same template... so i
> > was looking for the easiest way to solve it, instead of requiring to
> > call extensive library functions...
>
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications/extensions/etc. People hand edit the file and
> add white space, which won't confuse a parser but messes up your less
> flexible hand parse. People write a mixture of <element></element>
> instead of <element/>, which should parse as equivalent and somehow
> don't when hand parsing. People suddenly want validation. etc.
> Going with a real parser is very much the way to go in a real
> application, much more future friendly even if not apparently needed
> up front...
I'm using the parser so that I can extract the necessary values from
specific fields... Anyway, I have decided to go with a real parser, as
it's becoming too cumbersome.
Thanks
Maxx | 3/25/2011 6:38:18 PM

John Bode wrote:
> Not to mention it's code that *you* don't have to write or test.
Not necessarily. Depending on what the programmer intends to do, when he
adopts a 3rd party parser instead of writing his own, he is delegating
only a portion of the work he must do in order to extract information
from a given format, while still being forced to do the legwork for the
remainder of the work.
More specifically, when a programmer employs a 3rd party parser, he is
implicitly dividing the simple task of parsing a given format into two
different tasks:
- parsing the information described in a base format in order to build a
data structure
- parsing the data structure in order to extract the information he
intended to extract
While the first task may be delegated to a parser developed by a 3rd
party, and ends up being implemented by a small generic code snippet,
the second task ends up being cumbersome, error-prone and wasteful of
resources which, in some cases, the programmer may not have. Yet it
still needs code which *you* have to write and, more importantly, *you*
must test, with the added difficulty of consisting of a couple of layers
of abstraction.
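
To show what that second task looks like in practice, here is a rough sketch. It assumes the library has already produced some generic tree; the `node` type below is a stand-in invented for this example, not any real library's structure, and `find_child` and `country_from_node` are likewise made-up names. This is the part the application author still has to write and test.

```c
#include <string.h>

/* Stand-in for whatever generic tree a third-party parser hands back. */
typedef struct node {
    const char *name;
    const char *text;
    struct node *child;   /* first child */
    struct node *next;    /* next sibling */
} node;

/* The structure the application actually wants. */
typedef struct {
    const char *text;
    const char *value;
} country;

/* Return the first child of parent with the given name, or NULL. */
static const node *find_child(const node *parent, const char *name)
{
    const node *c;
    for (c = parent ? parent->child : NULL; c; c = c->next)
        if (strcmp(c->name, name) == 0) return c;
    return NULL;
}

/* Second pass: interpret a generic <country> node as a country record.
   Returns 0 on success, -1 if the node does not have the expected shape. */
static int country_from_node(const node *n, country *out)
{
    const node *t = find_child(n, "text");
    const node *v = find_child(n, "value");
    if (!n || strcmp(n->name, "country") != 0 || !t || !v) return -1;
    out->text = t->text;
    out->value = v->text;
    return 0;
}
```

Even in this tiny case there are decisions the library cannot make for you: what to do when `<text>` is missing, when there are two of them, or when the node is not a `<country>` at all.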
Rui Maciel
Rui | 3/25/2011 8:06:25 PM

Nobody wrote:
> If you restrict the application to reading a subset of XML, that defeats
> the purpose of using XML in the first place.
Every XML application is a language which is a subset of XML. Every
application of XML is nothing more than the definition of languages which
are a subset of XML. The main advantages of XML are that it's
human-readable, the languages based on it tend to be self-descriptive and
it's a common base format of a series of languages. This means that it
becomes easier to add support for other languages, even if you don't have
the entire specification.
Therefore, claiming that restricting the application to reading a subset
of XML defeats the purpose of adopting a XML-based language doesn't make
sense. It doesn't make sense because the only purpose of XML is to reduce
it to a subset.
> You can find a wide range of tools which can process XML, but the range
> of tools which can process a particular custom subset of XML is likely
> to be much smaller (i.e. those tools which you write yourself).
An image editor is a tool that can only process a particular custom subset
of XML (for example, SVG). The same applies to office applications, RSS
readers, web browsers and other applications. Therefore, there is no harm
in that. That's what programs are designed to do.
> If you think that you only need to support files written by a particular
> program, you're likely to end up only supporting files which were
> directly written by that program and not post-processed in any way. This
> often makes your program less useful than you had originally assumed.
That problem has absolutely nothing to do with XML and everything to do
with adopting/creating open standards to exchange information.
Rui Maciel
Rui | 3/25/2011 8:33:32 PM

David Resnick wrote:
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications/extensions/etc. People hand edit the file and
> add white space, which won't confuse a parser but messes up your less
> flexible hand parse.
Adding white spaces can only mess up a parser if the parser wasn't developed
to handle that language. Therefore, you can't claim that writing parsers
by hand is a bad thing to do if the only problem that you can point out is
that your parser fails to parse the language it was intended to parse.
> People write a mixture of <element></element>
> instead of <element/>, which should parse as equivalent and somehow
> don't when hand parsing.
Only if you failed to add support for that in your parser.
> People suddenly want validation. etc.
The beautiful thing about parsers is that they automatically and
implicitly validate a given language. Therefore, it's a non-issue.
> Going with a real parser is very much the way to go in a real
> application, much more future friendly even if not apparently needed
> up front...
This idea that a parser developed by a programmer is somehow not a "real
parser" is silly. Either you misspoke or you don't know what you are
talking about.
Rui Maciel
Rui | 3/25/2011 8:41:48 PM
Malcolm McLean wrote:
> Think of the XML as a tree, and build what is known as a recursive
> descent parser.
>
> Basically it's the same problem as a mathematical expression with
> deeply nested parentheses, in a slightly different form. You need one
> token of lookahead.
>
> Once you've converted the XML to a tree, you'll usually want to walk
> the tree to convert to a set of nested arrays, but sometimes it will
> be better to keep the data in tree form.
If someone goes through the trouble of writing a dedicated parser for a
particular language then there is no need to parse it to an intermediate
form. That just forces the need to parse essentially the same information
twice just to be able to access that information. Just parse the document
and handle the information in an appropriate way once it is parsed.
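A minimal sketch of that direct approach, written in the recursive descent style mentioned in the quoted text. It assumes a heavily simplified syntax (alphanumeric element names, no attributes, no comments, no XML declaration), and the function names parse_element and xml_depth are made up for illustration. Instead of building a tree, it handles the information while parsing; here "handling" is just computing the nesting depth.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse one element starting at *pp. On success, advance *pp past the
   element and return its nesting depth (1 for a leaf). Return -1 on a
   syntax error. Hypothetical helper, simplified element-only syntax. */
static int parse_element(const char **pp)
{
    const char *p = *pp;

    if (*p != '<')
        return -1;
    p++;

    const char *name = p;
    while (isalnum((unsigned char)*p))
        p++;
    size_t name_len = (size_t)(p - name);
    if (name_len == 0)
        return -1;

    if (p[0] == '/' && p[1] == '>') {   /* <name/> is a leaf */
        *pp = p + 2;
        return 1;
    }
    if (*p != '>')
        return -1;
    p++;

    int deepest_child = 0;
    for (;;) {
        while (*p != '\0' && *p != '<') /* character data is skipped */
            p++;
        if (*p == '\0')
            return -1;                  /* end tag is missing */
        if (p[1] == '/')                /* lookahead: this is an end tag */
            break;
        int d = parse_element(&p);      /* otherwise recurse into a child */
        if (d < 0)
            return -1;
        if (d > deepest_child)
            deepest_child = d;
    }

    p += 2;                             /* skip "</" */
    if (strncmp(p, name, name_len) != 0 || p[name_len] != '>')
        return -1;                      /* end tag does not match */
    *pp = p + name_len + 1;
    return deepest_child + 1;
}

/* Return the nesting depth of the document, or -1 if it is malformed. */
int xml_depth(const char *xml)
{
    assert(xml != NULL);
    const char *p = xml;
    int d = parse_element(&p);
    while (isspace((unsigned char)*p))  /* allow trailing whitespace */
        p++;
    return (d > 0 && *p == '\0') ? d : -1;
}
```

Called on the countries example from the start of the thread, xml_depth would report three levels (countries, country, text/value) without ever allocating a node.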
Rui Maciel
Rui | 3/25/2011 8:50:17 PM
Nobody wrote:
> I wasn't "replying" to your comments. I elaborated on your reply,
> providing more reasons why it's a bad idea to assume that you only need
> to handle a subset.
Let's say that we developed a new XML-based language intended to replace
all documents encoded in the INI document format. The language would be
something like:
<?xml version="1.0" encoding="UTF-8" ?>
<document version="1.0">
<section>
<name> section name </name>
<entry>
<label> label name </label> <value> this label's value</value>
</entry>
...
</section>
...
</document>
In this XML-based language, the only accepted element name for the root
element is the string "document". The root element must have an attribute
to declare the format's version number and may have zero or more "section"
elements. Each "section" element must have a "name" element, followed by
zero or more "entry" elements. Each "entry" element consists of a "label"
element followed by a "value" element, whose content can only be character
data. Every other XML construct is either ignored or declared as an
error.
Considering this, why do you believe it is a bad idea to write a parser
that only accepts this subset of XML?
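For illustration, a dedicated parser for this format might fill application structures along these lines. This is only a sketch under stated assumptions: the struct names and the ini_lookup function are hypothetical, and nothing here comes from an existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of the format described above. */
struct ini_entry {
    const char *label;
    const char *value;
};

struct ini_section {
    const char *name;
    const struct ini_entry *entries;
    size_t entry_count;
};

struct ini_document {
    const char *version;               /* the root element's attribute */
    const struct ini_section *sections;
    size_t section_count;
};

/* Return the value stored under section/label, or NULL if absent. */
const char *ini_lookup(const struct ini_document *doc,
                       const char *section, const char *label)
{
    assert(doc != NULL && section != NULL && label != NULL);
    for (size_t i = 0; i < doc->section_count; i++) {
        const struct ini_section *s = &doc->sections[i];
        if (strcmp(s->name, section) != 0)
            continue;
        for (size_t j = 0; j < s->entry_count; j++) {
            if (strcmp(s->entries[j].label, label) == 0)
                return s->entries[j].value;
        }
    }
    return NULL;
}
```

The application then only ever sees sections and entries, whichever way the parser behind them was written.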
Rui Maciel
Rui | 3/25/2011 9:15:59 PM
On Fri, 25 Mar 2011 21:15:59 +0000, Rui Maciel wrote:
>> I wasn't "replying" to your comments. I elaborated on your reply,
>> providing more reasons why it's a bad idea to assume that you only need
>> to handle a subset.
>
> Let's say that we developed a new XML-based language intended to replace
> all documents encoded in the INI document format. The language would be
> something like:
[snip]
> In this XML-based language, the only accepted element name for the root
> element is the string "document". The root element must have an
> attribute to declare the format's version number and may have zero or
> more "section" elements. Each "section" element must have a "name"
> element, followed by zero or more "entry" elements. Each "entry"
> element consists of a "label" element followed by a "value" element,
> whose content can only be character data. Every other XML construct is
> either ignored or declared as an error.
>
> Considering this, why do you believe it is a bad idea to write a parser
> that only accepts this subset of XML?
That isn't what we're talking about. Any validating parser rejects
invalid documents; that doesn't mean that such a parser only accepts a
subset of the language.
A subset of the /language/ implies that, for any given data, only a subset
of the valid representations are accepted, e.g. requiring <tag></tag>
rather than <tag/>, imposing constraints upon whitespace within
tags, requiring attributes to be specified in a particular order, etc.
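To make the <tag></tag> versus <tag/> point concrete, here is a rough sketch of a scanner that reports the same events for both spellings. The name count_tag_events is made up for this example, and the handling of declarations, comments and processing instructions is deliberately crude (they are skipped up to the next '>'), so treat it as an illustration rather than a conforming XML scanner.

```c
#include <assert.h>
#include <stddef.h>

struct tag_counts {
    int starts;   /* start-tag events */
    int ends;     /* end-tag events */
};

/* Count start and end events in xml. An empty-element tag such as <e/>
   counts as one start plus one end, exactly like <e></e>. A '>' inside a
   quoted attribute value does not terminate the tag. */
struct tag_counts count_tag_events(const char *xml)
{
    struct tag_counts c = {0, 0};
    assert(xml != NULL);

    const char *p = xml;
    while (*p != '\0') {
        if (*p != '<') {
            p++;
            continue;
        }
        p++;                                    /* skip '<' */
        int is_end = (*p == '/');
        int is_other = (*p == '?' || *p == '!');
        char quote = '\0';                      /* open quote, if any */
        char prev = '\0';                       /* last char before '>' */

        while (*p != '\0' && (quote != '\0' || *p != '>')) {
            if (quote != '\0') {
                if (*p == quote)
                    quote = '\0';
            } else if (*p == '"' || *p == '\'') {
                quote = *p;
            }
            prev = *p;
            p++;
        }
        if (*p == '\0')
            break;                              /* unterminated tag */
        p++;                                    /* skip '>' */

        if (is_other)
            continue;
        if (is_end) {
            c.ends++;
        } else {
            c.starts++;
            if (prev == '/')
                c.ends++;                       /* <e/> closes itself */
        }
    }
    return c;
}
```

With this, "<r><e></e></r>" and "<r><e/></r>" yield identical counts, which is the equivalence a sed one-liner typically misses.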
Having said that, the main reasons why writing such a parser would be a
bad idea are:
1. It doesn't help. The parser wouldn't be significantly simpler than one
which parsed arbitrary XML; in fact, it would probably be more
complicated, as the parser would be performing checks which most
(non-validating) parsers leave to the application.
2. If you want to extend the format, you have to change the code for the
parser. With a generic non-validating parser, you don't have to change
anything; with a generic validating parser, you only have to change
the DTD. In either case, the application would only need to be changed if
it didn't just ignore unrecognised elements.
Nobody | 3/26/2011 9:12:53 AM
On Fri, 25 Mar 2011 20:41:48 +0000, Rui Maciel wrote:
> Adding white spaces can only mess up a parser if the parser wasn't develop
> to handle that language. Therefore, you can't claim that writing parsers
> by hand is a bad thing to do if the only problem that you can point out is
> that your parser fails to parse the language it was intended to parse.
Right. Which is exactly what we mean by a "subset" of XML.
I'm not sure whether you're playing devil's advocate or you actually
aren't aware of just how common a problem this is. I've lost track of the
number of times I've seen stuff like "sed 's!<title>\(.*\)</title>!\1!' ...".
Nobody | 3/26/2011 9:24:50 AM
Nobody wrote:
> On Fri, 25 Mar 2011 20:41:48 +0000, Rui Maciel wrote:
>
>> Adding white spaces can only mess up a parser if the parser wasn't
>> develop
>> to handle that language. Therefore, you can't claim that writing
>> parsers by hand is a bad thing to do if the only problem that you can
>> point out is that your parser fails to parse the language it was
>> intended to parse.
>
> Right. Which is exactly what we mean by a "subset" of XML.
This particular issue has nothing to do with a language being or not being
a subset of XML. It's a problem caused by adopting a poorly thought out
language which fails to cover the intended use case.
> I'm not sure whether you're playing devil's advocate or you actually
> aren't aware of just how common a problem this is. I've lost track of
> the number of times I've seen stuff like "sed
> 's!<title>\(.*\)</title>!\1!' ...".
I've written a few parsers, including a couple of generic parsers for a
markup language, and supporting white spaces between elements (or any
equivalent nesting construct) is one of the most trivial things that one
can add to a parser, particularly because it either represents a single
terminal in the production or it doesn't even need to be supported in the
language's grammar.
In the case of XML, as an element may have character data between the
element's start tag and end tag, then it would probably be better to add
support for it in the production and, depending on how the language was
designed, ignore it or throw some kind of error.
Rui Maciel
Rui | 3/26/2011 11:26:40 AM
Nobody wrote:
> That isn't what we're talking about. Any validating parser rejects
> invalid documents; that doesn't mean that such a parser only accepts a
> subset of the language.
>
> A subset of the /language/ implies that, for any given data, only a
> subset of the valid representations are accepted, e.g. requiring
> <tag></tag> rather than <tag/>, imposing constraints upon whitespace
> within tags, requiring attributes to be specified in a particular order,
> etc.
A subset of a language is still a language on its own, which means that a
parser designed to handle it either accepts a document as valid or rejects
it.
Knowing this, a subset of XML will only impose the constraints which it
was designed to impose; no more, no less. If you write a parser that
rejects certain language constructs then you either failed to design your
language or you failed to write your parser. Your failure to do any of
these things does not mean that it is a bad idea to develop parsers. It
only means that you failed to develop the language and/or parser that you
needed.
> Having said that, the main reasons why writing such a parser would be a
> bad idea are:
>
> 1. It doesn't help. The parser wouldn't be significantly simpler than
> one which parsed arbitrary XML; in fact, it would probably be more
> complicated, as the parser would be performing checks which most
> (non-validating) parsers leave to the application.
Keep in mind that a generic parser only manages to transform the
information between two formats (i.e., parse a document and build up a
data structure) and that you are still forced to parse the end-format to
validate your format and extract the information (i.e., traverse the data
structure, perform sanity checks according to the information found on the
data structure, extract information, etc...).
This means that once you adopt a generic parser to parse a document then,
unless you intend to parse a home-brew format that will not be exchanged
by anyone and will only be used by a specific version of a specific
program, you are only fooling yourself to believe that you are simplifying
things. You aren't. You are adding a new abstraction layer to your
program that does nothing more than convert the information between
formats, both of which you still have to parse.
> 2. If you want to extend the format, you have to change the code for the
> parser. With a generic non-validating parser, you don't have to change
> anything; with a generaic validating parser, you only have to change
> the DTD. In either case, the application would only need to be changed
> if it didn't just ignore unrecognised elements.
Not quite. The DTD only helps you set the generic parser to perform a set
of sanity checks. Meanwhile you are still forced to rely on two separate
parsers to parse a single piece of information.
Adding to this, relying on generic parsers and DTDs won't help you with
basic tasks such as adding support for multiple versions of the same
language. That means that if you rely on a generic parser and you are
suddenly forced to tweak your document format and therefore support
multiple versions of the same format then you are either screwed or you
are forced to employ a scheme to convert all instances of a format into
the new format, something which in some cases is impossible.
Rui Maciel
Rui | 3/26/2011 11:57:03 AM
On 03/27/11 12:57 AM, Rui Maciel wrote:
> Nobody wrote:
>
>> That isn't what we're talking about. Any validating parser rejects
>> invalid documents; that doesn't mean that such a parser only accepts a
>> subset of the language.
>>
>> A subset of the /language/ implies that, for any given data, only a
>> subset of the valid representations are accepted, e.g. requiring
>> <tag></tag> rather than<tag/>, imposing constraints upon whitespace
>> within tags, requiring attributes to be specified in a particular order,
>> etc.
>
> A subset of a language is still a language on it's own, which means that a
> parser designed to handle it either accepts a document as valid or rejects
> it.
>
> Knowing this, a subset of XML will only impose the constraints which it
> was designed to impose; no more, no less. If you write a parser that
> rejects certain language constructs then you either failed to design your
> language or you failed to write your parser. Your failure to do any of
> these things does not mean that it is a bad idea to develop parsers. It
> only means that you failed to develop the language and/or parser that you
> needed.
The problem of what to do with the data in an XML document (or any other
structured document) is one of the reasons why there are two types of
XML parser. One can either use a SAX (stream) parser to process
elements as they are encountered, or parse the complete document into a
DOM (Document Object Model) tree.
I use both, depending on the problem at hand. If the data has to be
manipulated as a complete set, I use my (heavy) DOM parser. If not
(loading a configuration for example), I use my light SAX parser.
A SAX parser uses callback functions to handle various events triggered
by the document, which makes it easy to translate elements of interest
into application data structures or actions, which would be ideal for
the OP's requirement.
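As a rough sketch of how that could look for the OP's countries example: the three handlers below are hypothetical (on_start, on_text and on_end are names invented here, only loosely in the style of stream-parser callbacks, not any particular library's API). A stream parser would call them as it meets start tags, character data and end tags, and they fill a plain array of structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COUNTRIES 8

struct country {
    char text[32];
    char value[8];
};

/* Must be zero-initialised before the first callback. */
struct load_state {
    struct country items[MAX_COUNTRIES];
    size_t count;         /* completed <country> elements */
    char *target;         /* buffer receiving character data, or NULL */
    size_t target_size;
};

void on_start(void *user, const char *name)
{
    struct load_state *st = user;
    assert(st != NULL && name != NULL);
    if (st->count >= MAX_COUNTRIES)
        return;                        /* table full: ignore the rest */

    struct country *c = &st->items[st->count];
    if (strcmp(name, "country") == 0) {
        memset(c, 0, sizeof *c);
    } else if (strcmp(name, "text") == 0) {
        st->target = c->text;
        st->target_size = sizeof c->text;
    } else if (strcmp(name, "value") == 0) {
        st->target = c->value;
        st->target_size = sizeof c->value;
    }
}

/* Character data may arrive in several pieces, so it is appended. */
void on_text(void *user, const char *s, size_t len)
{
    struct load_state *st = user;
    assert(st != NULL && s != NULL);
    if (st->target == NULL)
        return;                        /* text outside <text>/<value> */

    size_t used = strlen(st->target);
    size_t room = st->target_size - used - 1;
    if (len > room)
        len = room;                    /* truncate rather than overflow */
    memcpy(st->target + used, s, len);
    st->target[used + len] = '\0';
}

void on_end(void *user, const char *name)
{
    struct load_state *st = user;
    assert(st != NULL && name != NULL);
    st->target = NULL;
    if (strcmp(name, "country") == 0 && st->count < MAX_COUNTRIES)
        st->count++;
}
```

The fixed-size buffers are an assumption made to keep the sketch short; a real program would size them from its own requirements or allocate dynamically.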
--
Ian Collins
Ian | 3/26/2011 10:58:26 PM
Ian Collins wrote:
> The problem of what to do with the data in an XML document (or any other
> structured document) is one of the reasons why there are two types of
> XML parser. One can either use a SAX (stream) parser to process
> elements as they are encountered, or parse the complete document into a
> DOM (Document Object Model) tree.
>
> I use both, depending on the problem at hand. If the data has to be
> manipulated as a complete set, I use my (heavy) DOM parser. If not
> (loading a configuration for example), I use my light SAX parser.
>
> A SAX parser uses callback functions to handle various events triggered
> by the document, which makes it easy to translate elements of interest
> into application data structures or actions, which would be ideal for
> the OP's requirement.
The SAX approach is basically a partially developed parser. In essence, a
SAX API provides a stream of terminal tokens while performing sanity
checks on the base format. To put it in other words, a SAX parser is
basically a lexer that converts a set of terminal tokens from a base
language (say, XML) to a single terminal token from a different language
(say, SVG). In this process, it also implicitly performs a set of sanity
checks on the base language.
This means that when a programmer opts to parse a given document following
the SAX approach, what he is doing is essentially picking up a specialized
lexer and writing his own parser around that particular lexer. So, this
means that although the programmer avoids parsing a much larger language
(i.e., what the SAX lexer returns as "open element A" may be "terminal
token '<' followed by terminal token text string, with the string 'A',
followed by token '>'") he still has to set a production for his language
and develop a parser to parse his language.
Rui Maciel
Rui | 3/27/2011 12:58:44 PM
On 03/28/11 01:58 AM, Rui Maciel wrote:
> Ian Collins wrote:
>
>> The problem of what to do with the data in an XML document (or any other
>> structured document) is one of the reasons why there are two types of
>> XML parser. One can either use a SAX (stream) parser to process
>> elements as they are encountered, or parse the complete document into a
>> DOM (Document Object Model) tree.
>>
>> I use both, depending on the problem at hand. If the data has to be
>> manipulated as a complete set, I use my (heavy) DOM parser. If not
>> (loading a configuration for example), I use my light SAX parser.
>>
>> A SAX parser uses callback functions to handle various events triggered
>> by the document, which makes it easy to translate elements of interest
>> into application data structures or actions, which would be ideal for
>> the OP's requirement.
>
> The SAX approach is basically a partially developed parser. In essence, a
> SAX API provides a stream of terminal tokens while performing sanity
> checks on the base format. To put it in other words, a SAX parser is
> basically a lexer that converts a set of terminal tokens from a base
> language (say, XML) to a single terminal token from a different language
> (say, SVG). In this process, it also implicitly performs a set of sanity
> checks on the base language.
>
> This means that when a programmer opts to parse a given document following
> the SAX approach, what he is doing is essentially picking up a specialized
> lexer and writing his own parser around that particular lexer. So, this
> means that although the programmer avoids parsing a much larger language
> (i.e., what the SAX lexer returns as "open element A" may be "terminal
> token '<' followed by terminal token text string, with the string 'A',
> followed by token '>') he still has to set a production for his language
> and develop a parser to parse his language.
Which he will end up doing no matter what approach is used to parse the
source document.
--
Ian Collins
Ian | 3/27/2011 7:44:20 PM
On Sat, 26 Mar 2011 11:26:40 +0000, Rui Maciel wrote:
>> I'm not sure whether you're playing devil's advocate or you actually
>> aren't aware of just how common a problem this is. I've lost track of
>> the number of times I've seen stuff like "sed
>> 's!<title>\(.*\)</title>!\1!' ...".
>
> I've written a few parsers, including a couple of generic parsers for a
> markup language, and supporting white spaces between elements (or any
> equivalent nesting construct) is one of the most trivial things that one
> can add to a parsers,
Dealing with whitespace may be trivial (unless the underlying I/O code is
line-oriented, as XML allows linefeeds within tags), but it's frequently
omitted.
It's less trivial to deal with the fact that attributes may appear in any
order.
Nobody | 3/28/2011 4:13:15 PM
Nobody wrote:
> Dealing with whitespace may be trivial (unless the underlying I/O code
> is line-oriented, as XML allows linefeeds within tags), but it's
> frequently omitted.
The implementation details of the IO part of a parser are irrelevant.
Whether the IO is line-oriented or not, the IO code should never insert or
omit information, which means that a parser only handles the information
provided by a stream.
> It's less trivial to deal with the fact that attributes may appear in
> any order.
I don't believe that constitutes a real problem. For example, consider an
XML-based file format which consists of a single element "element" which
may have a set of attributes labelled "alpha", "beta" and "gamma". For
that language, a valid document could be something like:
<element alpha="true" />
If the language accepts repeated attributes then a possible (and crude)
production[1] would be something like:
<example>
document = "<" "element" *tag "/" ">"
tag = "alpha" "=" text_string
    / "beta" "=" text_string
    / "gamma" "=" text_string
</example>
The support for the tags specified in the above production in an LL parser,
ignoring error handling, may be around 3 states (6, if we count a "ghost"
state to push the attribute values into a data structure).
If, instead, the attributes must follow a specific order (alpha, beta,
gamma) where:
- each attribute can either be present or not
- an attribute appearing out of its rightful place is considered an error
then, the following production applies:
<example>
document = "<" "element" *1alpha_tag *1beta_tag *1gamma_tag "/" ">"
alpha_tag = "alpha" "=" text_string
beta_tag = "beta" "=" text_string
gamma_tag = "gamma" "=" text_string
</example>
The support for the tags specified in the above production in an LL parser,
ignoring error handling, is yet again achieved by adding 3 states (6, with
the "ghost" states).
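As a rough illustration of the ordered case, the check below walks the attribute names in the order they appear and accepts them only if they follow alpha, beta, gamma with each one optional. The function name attrs_in_order is made up, and it assumes the lexer has already split the tag into attribute names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if names[0..n) are drawn from alpha, beta, gamma, each at
   most once and in that order; return 0 otherwise. */
int attrs_in_order(const char *const *names, size_t n)
{
    static const char *const order[] = {"alpha", "beta", "gamma"};
    const size_t order_len = sizeof order / sizeof order[0];
    size_t next = 0;   /* index of the first attribute still allowed */

    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Skip over optional attributes that were left out. */
        while (next < order_len && strcmp(names[i], order[next]) != 0)
            next++;
        if (next == order_len)
            return 0;  /* unknown, repeated, or out of order */
        next++;        /* this attribute is now consumed */
    }
    return 1;
}
```

The single index next plays the role of the parser state: it only ever moves forward, which is what makes the ordered grammar cheap to check.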
If your language accepts any possible attribute combination then the
production starts to become a bit more demanding. Yet, you only need to
deal with this if you specifically wish that your grammar accepts your
attributes in any random order, which means that you are creating your own
problem.
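For comparison, one possible way to accept the attributes in any order is to keep a slot per known attribute and reject unknown or repeated names. This is a sketch under assumptions: the types and collect_attrs are hypothetical, attribute values are assumed to be non-NULL strings, and the lexer is assumed to deliver name/value pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct attr {
    const char *name;
    const char *value;   /* assumed non-NULL */
};

struct element_attrs {
    const char *alpha;   /* NULL when the attribute is absent */
    const char *beta;
    const char *gamma;
};

/* Fill *out from attrs[0..n) regardless of order. Return 0 on success,
   -1 if an attribute is unknown or appears more than once. */
int collect_attrs(const struct attr *attrs, size_t n,
                  struct element_attrs *out)
{
    static const char *const known[] = {"alpha", "beta", "gamma"};
    const char *slots[3] = {NULL, NULL, NULL};

    assert(out != NULL && (attrs != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        size_t k = 0;
        while (k < 3 && strcmp(attrs[i].name, known[k]) != 0)
            k++;
        if (k == 3 || slots[k] != NULL)
            return -1;   /* unknown or repeated attribute */
        slots[k] = attrs[i].value;
    }
    out->alpha = slots[0];
    out->beta = slots[1];
    out->gamma = slots[2];
    return 0;
}
```

Here the order independence is handled outside the grammar: the production only needs to say "zero or more attributes", and the uniqueness rule becomes a check on the slots.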
Nonetheless, notice that you will be faced with the exact same problem if
you wish to rely on a generic parser instead of one which you develop
yourself. In that case, you will be faced with a more demanding problem,
as you are forced to deal with nodes in a tree structure instead of a
simple stream of terminal tokens.
Rui Maciel
[1] http://tools.ietf.org/html/rfc5234
rui.maciel (1746) | 3/29/2011 12:04:54 AM
Ian Collins wrote:
> Which he will end up doing no matter what approach is used to parse the
> source document.
If a programmer opts for a DOM-type approach then he will be faced with a
problem which is considerably (and needlessly) more complicated.
But considering that the programmer opts for a SAX-type approach, and
knowing that the only thing that he gets is a tricked-out lexer and that
he is still forced to develop his own parser, by adopting an XML library
which provides SAX the programmer is essentially being forced to adopt a
particular language which more often than not does not even fit the
intended purpose.
So, if a generic XML API doesn't eliminate the need to develop a parser to
extract information then what's the point of adopting a generic parser to
begin with, let alone base their document format on XML?
Rui Maciel
|
rui.maciel (1746) | 3/29/2011 12:15:09 AM
On 03/29/11 01:15 PM, Rui Maciel wrote:
> Ian Collins wrote:
>
>> Which he will end up doing no matter what approach is used to parse the
>> source document.
>
> If a programmer opts for a DOM-type approach then he will be faced with a
> problem which is considerably (and needlessly) more complicated.
>
> But considering that the programmer opts for a SAX-type approach, and
> knowing that the only thing that he gets is a tricked-out lexer and that
> he is still forced to develop his own parser, by adopting a XML library
> which provides SAX the programmer is essentially being forced to adopt a
> particular language which more often than not does not even fit the
> intended purpose.
>
> So, if a generic XML API doesn't eliminate the need to develop a parser to
> extract information then what's the point of adopting a generic parser to
> begin with, let alone base their document format on XML?
Indeed, that's one reason I prefer JSON.
But the choice of representation isn't always one the developer can
make. I have written a lot of code (in a variety of languages) to
extract data from OpenOffice documents. The client does not care that I
have to work with an XML document, they just want the data from the
document.
--
Ian Collins
|
ian-news (9881) | 3/29/2011 12:34:38 AM
On Tue, 29 Mar 2011 01:15:09 +0100, Rui Maciel wrote:
> But considering that the programmer opts for a SAX-type approach, and
> knowing that the only thing that he gets is a tricked-out lexer and that
> he is still forced to develop his own parser,
You make it sound as if it's a significant issue. Once you have the lexer,
XML is trivial to parse. There are no shift-reduce or reduce-reduce
conflicts, because every construct begins with a token which is unique to
that construct.
> So, if a generic XML API doesn't eliminate the need to develop a parser
> to extract information then what's the point of adopting a generic
> parser to begin with, let alone base their document format on XML?
The point is that you don't have to code dedicated utilities for common
tasks, as you can just use xslt, xquery, etc. You don't have to write
bindings for a variety of languages, as every common language already has
XML parsers (and more, e.g. tools which will generate class definitions
from a DTD or vice-versa).
In many cases, the only valid reason for /not/ using XML is efficiency (I
don't consider the vendor lock-in which proprietary formats offer to be a
"valid" reason).
nobody (4805) | 3/29/2011 1:14:28 AM
On Tue, 29 Mar 2011 01:04:54 +0100, Rui Maciel wrote:
>> Dealing with whitespace may be trivial (unless the underlying I/O code
>> is line-oriented, as XML allows linefeeds within tags), but it's
>> frequently omitted.
>
> The implementation details of the IO part of a parser are irrelevant.
Not if it constrains the data flow, i.e. when you don't get to carry
state over between lines, i.e. what happens when people try to parse XML
with grep/sed/perl/etc.
nobody (4805) | 3/29/2011 1:27:02 AM
Nobody wrote:
> Not if it constrains the data flow, i.e. when you don't get to carry
> state over between lines, i.e. what happens when people try to parse XML
> with grep/sed/perl/etc.
If people rely on grep to parse XML then they are intentionally creating
their own problems. No one decides to open a ditch with a screwdriver and
complains that the job is simply too complicated to perform.
The same applies to Perl if people try to employ it to parse XML as in the
grep case. This would, obviously, be stupid as it is quite possible to
write parsers in Perl.
Rui Maciel
rui.maciel (1746) | 3/29/2011 10:28:36 AM
On Mar 28, 8:15 pm, Rui Maciel <rui.mac...@gmail.com> wrote:
> Ian Collins wrote:
> > Which he will end up doing no matter what approach is used to parse the
> > source document.
>
> If a programmer opts for a DOM-type approach then he will be faced with a
> problem which is considerably (and needlessly) more complicated.
>
> But considering that the programmer opts for a SAX-type approach, and
> knowing that the only thing that he gets is a tricked-out lexer and that
> he is still forced to develop his own parser, by adopting a XML library
> which provides SAX the programmer is essentially being forced to adopt a
> particular language which more often than not does not even fit the
> intended purpose.
>
> So, if a generic XML API doesn't eliminate the need to develop a parser to
> extract information then what's the point of adopting a generic parser to
> begin with, let alone base their document format on XML?
>
> Rui Maciel
You can use a DOM parser and a query language like XPath, which makes
getting information pretty simple. Parse, ask for what you need.
Of course, not appropriate for all uses, but nice for getting
what you want out of the doc.
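To give a feel for the "parse, ask for what you need" style without pulling in a library, here is a toy sketch. It is not XPath and not any real DOM API: struct node and find_path are invented for this example, the tree is assumed to have been built already, and the "query" is just a slash-separated list of element names that returns the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, read-only tree node. */
struct node {
    const char *name;
    const char *text;              /* character data, or NULL */
    const struct node *children;
    size_t child_count;
};

/* Return the first node reached by following path (for example
   "countries/country/value") from node, or NULL if there is none. */
const struct node *find_path(const struct node *node, const char *path)
{
    assert(node != NULL && path != NULL);

    size_t len = strcspn(path, "/");   /* length of first component */
    if (strlen(node->name) != len || strncmp(node->name, path, len) != 0)
        return NULL;
    if (path[len] == '\0')
        return node;                   /* last component matched */

    for (size_t i = 0; i < node->child_count; i++) {
        const struct node *hit =
            find_path(&node->children[i], path + len + 1);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

A real query language adds predicates, wildcards and node sets on top of this, but the calling pattern is the same: build the tree once, then ask for paths.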
-David
lndresnick (326) | 3/29/2011 11:58:43 AM
Nobody wrote:
> On Tue, 29 Mar 2011 01:15:09 +0100, Rui Maciel wrote:
>
>> But considering that the programmer opts for a SAX-type approach, and
>> knowing that the only thing that he gets is a tricked-out lexer and
>> that he is still forced to develop his own parser,
>
> You make it sound as if it's a significant issue. Once you have the
> lexer, XML is trivial to parse. There are no shift-reduce or
> reduce-reduce conflicts, because every construct begins with a token
> which is unique to that construct.
Writing your own parser is not a significant issue. That's why people opt
for the SAX approach. And you only stumble on shift-reduce/reduce-reduce
conflicts if you are trying to develop a parser for a language which
suffers from ambiguity issues, which doesn't really apply to XML or any
language based on XML.
>> So, if a generic XML API doesn't eliminate the need to develop a parser
>> to extract information then what's the point of adopting a generic
>> parser to begin with, let alone base their document format on XML?
>
> The point is that you don't have to code dedicated utilities for common
> tasks, as you can just use xslt, xquery, etc.
The point is that if a programmer tries to avoid developing a parser for
his language because he believes it takes too much work, adopting layers
of 3rd party libraries won't save him any work in the end, nor will it
make his life any easier.
Probably the only benefit a programmer gets from insisting on using those
3rd party libraries is that he can pad his resume with lots of buzzwords,
although in the end the only thing they demonstrate is that that
programmer invests his time implementing bloated tools and forcing the
wrong solutions onto jobs which otherwise would be considerably simpler
and more efficient.
> You don't have to write
> bindings for a variety of languages
It's irrelevant. Once you know how to develop a parser in a given
language you are able to develop a parser in any language you know.
> , as every common language already
> has XML parsers (and more, e.g. tools which will generate class
> definitions from a DTD or vice-versa).
As I've stated before, adopting a 3rd party library that handles XML
doesn't mean you avoided the need to develop your parser. You are still
forced to develop a parser, whether to parse a tree structure which is
assembled by the 3rd party library or to implement a working parser from
the glorified lexer which has been provided.
Adding to this, when someone mindlessly adopts a 3rd party library to
process XML documents and does so not because he believes XML is the right
tool for the job but simply because the 3rd party library is there and he
doesn't know any better, that person tends to be forced to shoe-horn XML
into an application for which it isn't suited. This is one of the reasons we
tend to see XML being forced into uses that clearly it isn't the best tool
for the job. Or even adequate. It's one of those examples of "if all you
have is a hammer, everything looks like a nail".
> In many cases, the only valid reason for /not/ using XML is efficiency
> (I don't consider the vendor lock-in which proprietary formats offer to
> be a "valid" reason).
There are plenty of reasons why XML is not the right tool for the job, and
thanks to the "but there is an API for that" mentality, there are plenty
of examples that demonstrate how XML is being forced into jobs it isn't
fit to do. For example, it doesn't make any sense to rely on XML to
encode any data structure beyond trees. And some people insist on
pounding the XML hammer on that nail.
Rui Maciel
rui.maciel (1746) | 3/29/2011 5:18:14 PM