Source to Source compilation - targeting C?



Anyone have suggestions for a way to take a fairly simple "custom"
high level language (syntax and semantics) to target C as the output?
(doesn't have to be readable C)

By this I mean an existing backend that outputs C already exists for
the "tool". The "tool" must be relatively easy to use assuming
knowledge of BNF, grammars etc but not much in the way of code
generation knowledge.

 Does the ROSE compiler framework do this?
 http://www.rosecompiler.org/

thanks

Reply marktxx (13) 12/28/2009 3:13:24 PM

"Mark Txx" <marktxx@yahoo.com> wrote in message
> Anyone have suggestions for a way to take a fairly simple "custom"
> high level language (syntax and semantics) to target C as the output?
> (doesn't have to be readable C)
>
> By this I mean an existing backend that outputs C already exists for
> the "tool". The "tool" must be relatively easy to use assuming
> knowledge of BNF, grammars etc but not much in the way of code
> generation knowledge.
>
> Does the ROSE compiler framework do this?
> http://www.rosecompiler.org/

The ROSE compiler framework is relatively good at transforming C and
C++ into revised C and C++, and perhaps Fortran into revised Fortran.
This "limitation" occurs because ROSE uses preexisting parsers (the
EDG front end for C and C++ and something else for Fortran) and
doesn't have a general mechanism for defining DSLs, as I understand
it.

The DMS Software Reengineering Toolkit is designed to be a source-to-source
transformation system for arbitrary languages.  It has means for
defining your own arbitrary syntax, automating the construction of parsers
for such, and applying transformations that map fragments of your syntax
into fragments of syntax of other languages.  DMS can be obtained
with robust C front ends, so defining transformations that map your DSL
into C fragments is generally straightforward.
See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
and http://www.semanticdesigns.com/Products/FrontEnds/CFrontEnd.html.

I don't know if DMS will satisfy your "relatively easy to use" criterion.
If nothing else, the C parser that DMS uses is pretty complex
because C itself is complicated and messy.

In actual practice, language translations can be pretty complex; after
all, you are trying to bridge a non-trivial semantic gap if your DSL
is interesting.  (If you aren't bridging a deep semantic gap, what's
the point of the DSL?)  This means that in general you need all the
machinery of complex translation engines (symbol table construction,
AST walking, control and data flow analysis, optimizing transforms,
etc.) and an understanding of how all of that is put together with a
particular infrastructure.  Learning the particular infrastructure
takes some time.  The point is that you can learn and use such
infrastructure in far less time (weeks to months) than it will take
you to build the pieces (years, IMHO) you need if you want to do
something really interesting.  DMS has been in constant engineering
for 15 years, and Rose I think for about 10.

DMS does provide a well-documented infrastructure with all of this (it
is a commercial product and we try to meet commercial expectations).
Rose does have online documentation, and from what I can see it seems
fairly reasonable.  YMMV.

--
Ira Baxter, CTO
www.semanticdesigns.com

Reply Ira 12/30/2009 3:53:17 PM


"Mark Txx" <marktxx@yahoo.com> wrote in message
> Anyone have suggestions for a way to take a fairly simple "custom"
> high level language (syntax and semantics) to target C as the output?
> (doesn't have to be readable C)
>
> By this I mean an existing backend that outputs C already exists for
> the "tool". The "tool" must be relatively easy to use assuming
> knowledge of BNF, grammars etc but not much in the way of code
> generation knowledge.
>
> Does the ROSE compiler framework do this?
> http://www.rosecompiler.org/
>

as for rose, dunno, not looked into it.


can't say much about this as-is, but here is my thought:
in this case, it may just be better to write it yourself, since what is
being described here is, essentially, one of the simpler approaches to HLL
creation (I will claim a certain amount of experience here).


you don't really need "code generation knowledge" to target C, it is not
that complicated, and (vs ASM), C is a very forgiving target.

if one has a general grasp of C, this should be plenty WRT targeting it.
misc note:
this may be one of those rare cases where it may make sense to forget any
aversions to "goto". goto is very useful with emitting code, as then one can
"decompose" most constructs into simpler parts.
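As a rough illustration of the goto point above, here is a minimal sketch (in Python, emitting C text) of lowering a "while" loop into labels and gotos. The helper names new_label and emit_while and the L1/L2 label scheme are made up for this example; a real emitter would also have to handle break/continue and nested scopes.

```python
import itertools

# Counter for unique label names (the "L<n>" naming scheme is an assumption).
_label_ids = itertools.count(1)


def new_label(prefix="L"):
    """Return a fresh C label name."""
    return f"{prefix}{next(_label_ids)}"


def emit_while(cond_c, body_lines):
    """Lower 'while (cond) body' into a flat list of C statements.

    cond_c is a C expression as a string; body_lines is a list of
    already-emitted C statements.
    """
    top, end = new_label(), new_label()
    out = [f"{top}: ;"]                          # loop head
    out.append(f"if (!({cond_c})) goto {end};")  # exit test
    out.extend(body_lines)                       # loop body
    out.append(f"goto {top};")                   # back edge
    out.append(f"{end}: ;")                      # loop exit
    return out


lines = emit_while("i < 10", ["i = i + 1;"])
print("\n".join(lines))
```

Each label is followed by an empty statement (";") so that the output stays valid C even when a label ends up last in a block.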


similarly, in contrast to how many people obsess over BNF and
parser-generator tools, IME, parsers are one of the simpler parts of a
compiler to write (or, at least once one gets past "trivial" stages).

for a very simple compiler or interpreter, a parser can be a bigger chunk,
but one will find if they go on to writing more advanced compilers (say, for
languages like C or Java), it is not the parser which is complicated
(rather, the "demons" like to hang out somewhere more around the register
allocator, low-level optimizer, and the ASM codegen...).

then again, I have usually always used hand-written recursive-descent, so
maybe parsers are complicated when using all these tools?... (ok, this is
partly satire...).


but, in this case, my thought is to try doing this one for oneself and maybe
learn something in the process.


now, as for a few "hints":
don't try to "parse" directly into your output, as this is awkward and
painful.
maybe take a brief look at LISP and Scheme, even if you don't intend to use
them as such, the languages have a general structure and facilities which
are very useful in compiler writing (even if the compiler itself is to be
written in C, similar facilities can be implemented in C as well).

the idea is that one can, in effect, parse into AST's in an S-Expression
like representation, and use this for basic high-level transformations and
for driving compiler logic.

usually, the idea is to construct ASTs which are a fairly direct
transcription of the input syntax (at the structural / semantic level, but
not necessarily at the lexical level).

"2*x+3" -> "(+ (* 2 x) 3)", for example.
or: "if(z) foo();" -> "(if z (foo))", ...
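To make the first example concrete, here is a small hand-written recursive-descent sketch in Python that parses arithmetic into nested lists and prints them S-Expression style. The grammar (expr/term/atom), the token set, and all function names are assumptions chosen for illustration; statements such as "if" are not handled.

```python
import re

# One token: an integer, an identifier, or a single-character operator/paren.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(src):
    src, tokens, pos = src.rstrip(), [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self):   # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = [op, node, self.term()]
        return node

    def term(self):   # term := atom (('*' | '/') atom)*
        node = self.atom()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = [op, node, self.atom()]
        return node

    def atom(self):   # atom := NUMBER | IDENT | '(' expr ')'
        tok = self.next()
        if tok == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok) if tok.isdigit() else tok


def parse(src):
    p = Parser(tokenize(src))
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError("trailing input")
    return tree


def show(node):
    """Render a nested-list AST in S-Expression notation."""
    if isinstance(node, list):
        return "(" + " ".join(show(n) for n in node) + ")"
    return str(node)


print(show(parse("2*x+3")))   # (+ (* 2 x) 3)
```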


so, for example:
HLL -> (Parser) -> S-Exps
S-Exps -> (High-level transforms) -> S-Exps
S-Exps -> (C Emitter) -> C


(note, at this moment, I use XML and not S-Exps internally, but this is a
side issue...).


"high-level transforms" is basically a kind of recursive-step expression
rewriting process, which would mostly be responsible for rewriting trivial
expressions into more-trivial expressions, such as converting "(+ 1 2)" into
"3", eliminating non-useful expressions, ...
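A minimal sketch of such a rewriting pass, assuming ASTs are nested Python lists like ["+", 1, 2] with ints for constants and strings for names (that representation and the function name fold are assumptions for this example). It folds constant sub-expressions bottom-up and drops "+ 0" and "* 1". Note that it folds with Python's unbounded integers, so it ignores C's fixed-width overflow behaviour.

```python
import operator

# Operators that are safe to fold here; division is left to the C compiler.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Recursively simplify an expression tree; returns a new tree or leaf."""
    if not isinstance(node, list):
        return node                      # constant or variable name
    op, *args = node
    args = [fold(a) for a in args]       # simplify children first
    if len(args) == 2:
        a, b = args
        if op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            return FOLDABLE[op](a, b)    # (+ 1 2) -> 3
        if op == "+":                    # x + 0 and 0 + x are just x
            if a == 0:
                return b
            if b == 0:
                return a
        if op == "*":                    # x * 1 and 1 * x are just x
            if a == 1:
                return b
            if b == 1:
                return a
    return [op, *args]


print(fold(["+", 1, 2]))
print(fold(["+", ["*", 2, "x"], ["-", 3, 3]]))
```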

for C, not that much is really needed here, since C is itself smart enough
to manage most of this in most cases (except maybe with dynamically-typed
languages).

the C emitter would be, likely, mostly a process of walking the produced
syntax tree, and mostly "unwinding" this into C style syntax and semantics
(the exact details of this step likely varying most with the structure and
semantics of the input HLL).
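The tree walk might look roughly like this. It is a sketch only: the AST is assumed to be nested Python lists with the operator or function name first, and the names emit_expr, emit_stmt, and BINOPS are invented for the example. Every binary expression is fully parenthesized, which is ugly but sidesteps precedence questions, and the output need not be readable anyway.

```python
# Binary operators passed straight through to C (an assumed subset).
BINOPS = {"+", "-", "*", "/", "<", ">", "=="}


def emit_expr(node):
    """Turn an expression tree into a C expression string."""
    if isinstance(node, int):
        return str(node)
    if isinstance(node, str):
        return node                              # variable name
    head, *args = node
    if head in BINOPS and len(args) == 2:
        return f"({emit_expr(args[0])} {head} {emit_expr(args[1])})"
    # Anything else is treated as a function call: (foo a b) -> foo(a, b)
    return f"{head}({', '.join(emit_expr(a) for a in args)})"


def emit_stmt(node, indent=0):
    """Turn a statement tree into a list of C source lines."""
    pad = "    " * indent
    if isinstance(node, list) and node and node[0] == "if":
        _, cond, then = node
        lines = [f"{pad}if ({emit_expr(cond)}) {{"]
        lines += emit_stmt(then, indent + 1)
        lines.append(f"{pad}}}")
        return lines
    return [f"{pad}{emit_expr(node)};"]          # expression statement


print(emit_expr(["+", ["*", 2, "x"], 3]))
print("\n".join(emit_stmt(["if", "z", ["foo"]])))
```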


a note for producing C:
don't expect to be (necessarily) able to produce the whole C output file
sequentially, instead, it is advised to allow splitting the generation
possibly into a number of disjoint pieces (say, individual functions), which
are sewn together as the emitter process "unwinds".

useful to look into here may be to either have text-buffers, or maybe
"ropes".
if one already has support for cons cells, lists, and strings, then most of
this amounts to basic list operations, such as appending lists, ...

this is most useful if the input HLL differs notably from C in terms of its
overall structure (for example, if one were doing a LISP -> C compiler).
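One possible shape for the "disjoint pieces" idea, sketched in Python with plain lists of lines standing in for text buffers or ropes. The class COutput and its three sections (includes, prototypes, functions) are assumptions made up for this example. The emitter appends to whichever section it needs, in whatever order it discovers things, and the sections are only joined at the very end.

```python
class COutput:
    """Collects C output in separate sections that are joined at the end."""

    def __init__(self):
        self.includes = []     # '#include' lines
        self.prototypes = []   # forward declarations
        self.functions = []    # one list of lines per function definition

    def add_function(self, ret, name, params, body_lines):
        sig = f"{ret} {name}({params})"
        # A prototype for every function lets definitions appear in any order.
        self.prototypes.append(sig + ";")
        body = ["    " + line for line in body_lines]
        self.functions.append([sig, "{"] + body + ["}"])

    def render(self):
        parts = list(self.includes)
        parts += [""] + self.prototypes
        for fn in self.functions:
            parts += [""] + fn
        return "\n".join(parts) + "\n"


out = COutput()
out.includes.append("#include <stdio.h>")
# main is emitted first even though it calls a helper emitted afterwards.
out.add_function("int", "main", "void",
                 ['printf("%d\\n", square(7));', "return 0;"])
out.add_function("static int", "square", "int x", ["return x * x;"])
text = out.render()
print(text)
```

Because all prototypes land ahead of all definitions, a helper that is discovered halfway through emitting another function can still be added without reordering anything already produced.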


note that it may also be possible to partition up the emitter logic so that
it, itself, produces the text sequentially, but in practice this process is
more awkward to work with IME.


well, ok, all this probably sounds a bit more complicated, oh well...

maybe look into different options (tools probably are an option, I just
don't have a good suggestion here), and see which best fits with
requirements and goals...

Reply BGB 12/30/2009 7:28:04 PM

>Anyone have suggestions for a way to take a fairly simple "custom"
>high level language (syntax and semantics) to target C as the output?


I suggest the parser generator IDE "TextTransformer":

http://www.TextTransformer.com

With this program you can create both a parser for your custom language
and the translator inside a single visual user interface
(Windows). The source-to-source compilation can be done either with
TextTransformer itself as a standalone application or by means of
C++ code, which can be generated from the TextTransformer project.


I am the author of TextTransformer and I have just made such a
converter from Delphi to C++

http://www.texttransformer.com/Delphi2Cpp_en.html

>The "tool" must be relatively easy to use assuming ...
>but not much in the way of code generation knowledge.

My method is to construct a tree in the form of the target language
while the source code is being parsed. This tree finally has to be
written top down with some pretty-printing manipulations at the end.
So the most important thing to know is how to combine tree nodes
and branches in the right order. The greatest difficulty is managing
type information.

I'm still busy improving Delphi2Cpp, and you could benefit from my
experience if you choose to use TextTransformer.

--
Detlef Meyer-Eltz

Reply Detlef 12/31/2009 2:49:28 PM
