


New experimental back end for C/C++ compiler targeting CLI

David, creator of the OrangeC C/C++ compiler, is working on a new back
end for the compiler to support code generation for the CLR. At present,
the compiler can build some complex C programs to MSIL, such as bzip2.
Note that for now it is just an experimental project; it does not build
every C program.

https://github.com/LADSoft/OrangeC
https://github.com/LADSoft/Simple-MSIL-Compiler/

It may be that somebody else is interested in the project :)

Alexandre
8/25/2016 1:40:27 PM
comp.compilers

On Thursday, August 25, 2016 at 10:17:46 AM UTC-5, Alexandre wrote:
> Btw, at now, is just a experimental project, it's not build every C
> program...

I've been looking seriously, as of late, at doing {machine,source} to source
synthesis -- even "normalization" (source to source synthesis from/to the same
language); the key trick being to combine the so-called abstract syntax tree
and control flow graphs into one by making the abstract syntax tree, itself,
the control flow graph ... i.e. by using an infinitary abstract syntax. That
means that a loop
   while (E) S
gets represented the very same way as the branch
   if (E) { S while (E) S }
in the graph.

This goes perfectly with continuation semantics since it requires a
place-holder Q; so that a statement {if (E) S} is "completed" with Q into
   {if (E) S}[Q] = E? S[Q]: Q
When applied to the loop, the result is a conditional expression with an
infinite abstract syntax
   E? S[E? S[E? S[...]: Q]: Q]: Q.
The abstract syntax "tree", itself, is compactly represented in its "maximally
rolled up" form as a cyclic graph; and that graph is none other than the
control flow graph. So the AST and CFG become one and the same.
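
A minimal sketch of the construction above, in Python. The names (Node,
basic, if_then, while_loop, unrolled_once, unfold) are illustrative and not
taken from any existing tool. Each constructor takes the continuation Q as
an argument, and the loop is closed by handing the loop head to its own body
as the continuation, which gives the cyclic graph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """One node of the combined AST/CFG."""
    label: str
    then: Optional["Node"] = None   # successor if the test is true (or the only successor)
    other: Optional["Node"] = None  # successor if the test is false


def basic(label, q):
    """S[Q]: perform the statement, then continue with q."""
    return Node(label, then=q)


def if_then(test, body, q):
    """{if (E) S}[Q] = E? S[Q]: Q"""
    return Node(test, then=body(q), other=q)


def while_loop(test, body, q):
    """{while (E) S}[Q], rolled up into a cycle."""
    head = Node(test, other=q)
    head.then = body(head)          # the body continues with the loop head itself
    return head


def unrolled_once(test, body, q):
    """{if (E) { S while (E) S }}[Q]"""
    return if_then(test, lambda k: body(while_loop(test, body, k)), q)


def unfold(node, depth):
    """Write out the (possibly infinite) syntax tree down to a fixed depth."""
    if depth == 0:
        return "..."
    if node.then is None and node.other is None:
        return node.label           # the place-holder Q
    if node.other is None:
        return f"{node.label}[{unfold(node.then, depth - 1)}]"
    return (f"{node.label}? {unfold(node.then, depth - 1)}"
            f": {unfold(node.other, depth - 1)}")


q = Node("Q")
body = lambda k: basic("S", k)
loop = while_loop("E", body, q)
print(unfold(loop, 4))
print(unfold(unrolled_once("E", body, q), 4))
```

Both calls print the same finite prefix of the infinite conditional. Note
that the sketch only compares unfoldings to a fixed depth; recognising that
two graphs have the same "maximally rolled up" form needs a proper
equivalence check, which is not shown here.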

Data flow analysis is done by the framework underlying the "magic algebra" I
described here:
https://www.scribd.com/document/235320018/Magic-Algebra-The-Algebraic-Approach-to-Control-Flow-Analysis

The REAL trick though -- especially with source-to-source transformation -- is
converting the white space! About 75-80% of the "indent" utility revolves
around that very issue; and when I do conversions by hand a similar percentage
of effort is involved in dealing with this issue -- particularly in converting
or rewriting the language IN the comments.

A really good source-to-source compiler should also be making proper
adjustments to the comments. That may even entail a degree of natural language
processing! Except for the "may" part.
rockbrentwood
8/26/2016 7:54:34 PM
rockbrentwood@gmail.com wrote:

> A really good source-to-source compiler should also be making proper
> adjustments to the comments. That may even entail a degree of natural language
> processing! Except for the "may" part.

Such problems occur only on source-to-source translation, where the
output should be human readable. A compilation into instructions of a
physical or virtual machine does not necessarily include comments, and
decompilation of executable code doesn't have to deal with comments at all.

Programming language translations are possible only for sufficiently
similar languages. If you look inside the "original VB" to MSIL
translation (not VB.NET), you'll find horrible constructs for the
implementation of the Basic ON ERROR... statements, even if both
languages are imperative.

In a source-to-source translation I'd not try to process comments,
except for structural parts like XML or doxygen tags.

In my C to Pascal converter I encountered many problems not related to
control flow. The most challenging decisions were which parts of the
input should be "copied" unchanged, and which parts have to be
interpreted, i.e. must be broken down into more basic constructs. Even
if both languages have equivalent control structures, like switch
statements, these may be so different semantically that a direct
mapping is not always possible. The same goes for preprocessor statements:
can/should they be translated into equivalent high-level constructs of
the target language, or do they have to be expanded before translation?
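
To illustrate the switch case, here is a minimal sketch in Python. It is not
the converter's actual code: the function switch_to_case and the "arms"
representation are assumptions made for the example. It maps a C switch
directly to a Pascal case statement only when no arm falls through into the
next one, and gives up otherwise.

```python
def switch_to_case(selector, arms):
    """Map a C switch to a Pascal case, or return None if fall-through blocks it.

    `arms` is a list of (labels, body) pairs in source order.  `body` is a
    list of already translated statement strings that may end in "break".
    The default arm and a break in the middle of a body are not handled.
    """
    groups, pending = [], []
    for index, (labels, body) in enumerate(arms):
        pending = pending + labels
        if not body:
            continue                      # `case 1: case 2:` share the next body
        last = index == len(arms) - 1
        if body[-1] == "break":
            body = body[:-1]
        elif not last:
            return None                   # genuine fall-through into the next arm
        groups.append((pending, body))
        pending = []
    lines = [f"case {selector} of"]
    for labels, body in groups:
        lines.append(f"  {', '.join(labels)}: begin {'; '.join(body)} end;")
    lines.append("end;")
    return lines


direct = switch_to_case("n", [(["1"], []),
                              (["2"], ["a := 1", "break"]),
                              (["3"], ["a := 2"])])
blocked = switch_to_case("n", [(["1"], ["a := 1"]),
                               (["2"], ["a := 2", "break"])])
print(direct)
print(blocked)
```

The second switch has to be broken down into more basic constructs (for
example duplicated code or gotos) before it can be expressed in the target
language.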

DoDi

Hans
8/29/2016 8:28:10 AM
On 2016-08-26 3:54 PM, rockbrentwood@gmail.com wrote:
> On Thursday, August 25, 2016 at 10:17:46 AM UTC-5, Alexandre wrote:
>> Btw, at now, is just a experimental project, it's not build every
>> C program...

> A really good source-to-source compiler should also be making proper
> adjustments to the comments. That may even entail a degree of natural
> language processing! Except for the "may" part.

I have done a few source to source asm compilers that include retaining
the original comments and sources.

What I did was rewrite the asm back-end to emit simple C statements and
then compile these for the new target processor. For real-life
applications it is an 85% solution. It gets the machine code correct for
the target processor, but some I/O code is so architecture-specific
that it needs to be translated by hand.

The interesting part is that the code transformation works really well,
to the extent that the generated code is often smaller, because most C
compilers do very well with small statements and using C as an
intermediate form isolates most of the processor-specific issues.

w..
Walter
8/29/2016 5:15:49 PM
On 29/08/2016 09:28, Hans-Peter Diettrich wrote:
> rockbrentwood@gmail.com wrote:
>
>> A really good source-to-source compiler should also be making proper
>> adjustments to the comments. That may even entail a degree of natural
>> language processing! Except for the "may" part.
>
> Such problems occur only on source-to-source translation, where the
> output should be human readable. A compilation into instructions of a
> physical or virtual machine does not necessarily include comments, and
> decompilation of executable code doesn't have to deal with comments at all.

I've played with source-to-source translators, and never managed to deal
with comments properly and in the end gave up.

Take, for example, a C-like language with //-comments to end of line.
Such a language is free-format so it could be written as one token per
line, with a comment on each line plus comments on their own lines.

Now you can have a series of tokens and comments. There may be several
comments between two tokens. A comment may refer to what's gone
before, or to what follows. And while it may be adjacent to a particular
token, it may be talking about a block of code consisting of hundreds
of tokens.

Even if translating to normalised code in the same language, it can be
challenging knowing where the comments are going to go.

And if translating to a different language, where one set of tokens in the
source is represented by a different set of tokens in the output, which can
be smaller or bigger, how do you decide where each comment is to go?
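
A minimal sketch of the usual heuristic, in Python. The Token class and the
scan function are made up for this example. Comments on their own line are
attached as "leading" comments of the next token, and a comment after a token
on the same line becomes a "trailing" comment of that token. String literals
containing // are not handled.

```python
import re
from dataclasses import dataclass, field

# A //-comment to end of line, or one identifier, number, or punctuation mark.
TOKEN = re.compile(r"//(?P<comment>[^\n]*)|(?P<tok>[A-Za-z_]\w*|\d+|[^\s\w])")


@dataclass
class Token:
    text: str
    leading: list = field(default_factory=list)   # comments seen before the token
    trailing: list = field(default_factory=list)  # comments after it on the same line


def scan(source):
    """Return (tokens, leftover comments found after the last token)."""
    tokens, pending = [], []
    for line in source.splitlines():
        last_on_line = None
        for m in TOKEN.finditer(line):
            if m.group("comment") is not None:
                text = m.group("comment").strip()
                if last_on_line is not None:
                    last_on_line.trailing.append(text)
                else:
                    pending.append(text)
            else:
                last_on_line = Token(m.group("tok"), leading=pending)
                pending = []
                tokens.append(last_on_line)
    return tokens, pending


source = """\
// set up the counter
i = 0; // start at zero
// loop below
while (i < 10) i = i + 1;
"""
tokens, leftover = scan(source)
for tok in tokens:
    if tok.leading or tok.trailing:
        print(tok.text, tok.leading, tok.trailing)
```

The attachment is purely positional: "start at zero" ends up on the ";"
token, and nothing records whether "loop below" describes one token or the
whole loop. That is exactly the information a translator would need and does
not have.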

> The same for preprocessor statements -
> can/should they be translated into equivalent high-level constructs of
> the target language, or do they have to be expanded before a translation?

The preprocessor was one reason I never got round to a source-to-source
translator with C code as input.

I intended this to be a *one-time* translation from C, to my own syntax;
once done, I could discard the C.

However, what to do about conditional code in the source? If it selects
between code-sequence A and B according to some condition, I can only
translate A or B, not both. The condition might be some external macro
(from a third party header that in future could be different), or it
could be a compile-time option. And in general there will be nested
conditional code.

With a normal compiler, it doesn't matter: each time it runs, it will
generate either A or B as required. With a one-time translator, I have
to choose one! (Or map it to conditional code in my language. But I
wanted to get away from C and its preprocessor.)
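
A minimal sketch of what "choosing one" means, in Python. The resolve
function and the sample source are invented for the example, and only
#ifdef / #ifndef / #else / #endif are handled. For a fixed set of defined
macros it keeps just the active lines, which is the variant a one-time
translator would see.

```python
def resolve(lines, defined):
    """Keep only the lines active when exactly the macros in `defined` are set.

    Real C also has #if expressions, #elif, and macros that are defined or
    undefined within the file itself; none of that is modelled here.
    """
    out = []
    stack = []      # one bool per open conditional: is its current branch active?
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("#ifdef", "#ifndef"):
            stack.append((words[1] in defined) == (directive == "#ifdef"))
        elif directive == "#else":
            stack[-1] = not stack[-1]
        elif directive == "#endif":
            stack.pop()
        elif all(stack):            # active only if every enclosing branch is
            out.append(line)
    return out


source = [
    "#ifdef _WIN32",
    "open_handle();",
    "#else",
    "#ifdef USE_MMAP",
    "map_file();",
    "#else",
    "open_fd();",
    "#endif",
    "#endif",
    "read_data();",
]

for config in (set(), {"_WIN32"}, {"USE_MMAP"}):
    print(sorted(config), resolve(source, config))
```

Two macros already give three different programs here. Translating once
means picking one configuration and losing the others, unless the
conditionals themselves are mapped into the target language.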

--
Bartc
BartC
8/29/2016 5:52:07 PM
On 2016-08-29 1:52 PM, BartC wrote:
> I've played with source-to-source translators, and never managed to
> deal with comments properly and in the end gave up. ...


The way we handled comments in the asm to asm translation was to put the
whole source line into (in our case) a C statement's // comment, with
enough parsing information to tie it to the generated code as well. A
filter (in our case a compiler pragma) reprocessed the listing line. It
could have been done quite easily with a small post-processing program.

It proved quite workable.
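
A rough sketch of the idea in Python, not the actual tool. The toy
accumulator instructions, the "@src" tag format and the function names are
all assumptions made for this example. Each source line travels with the
generated C statement as a // comment, and a small post-processing step
recovers it.

```python
import re

# Hypothetical toy accumulator instructions and their C equivalents.
TEMPLATES = {
    "LDA": "acc = {0};",
    "ADD": "acc += {0};",
    "STA": "{0} = acc;",
}


def asm_to_c(asm_lines):
    """Emit one C line per source line, tagged with the original text."""
    out = []
    for number, line in enumerate(asm_lines, start=1):
        code = line.split(";", 1)[0].split()    # ';' starts an asm comment
        if not code:                            # comment-only or blank line
            out.append(f"// @src {number}: {line}")
            continue
        mnemonic, operand = code[0], code[1].lstrip("#")
        statement = TEMPLATES[mnemonic].format(operand)
        out.append(f"{statement} // @src {number}: {line}")
    return out


TAG = re.compile(r"// @src (\d+): (.*)$")


def restore_listing(c_lines):
    """Post-process: recover (source line number, original text) pairs."""
    return [(int(m.group(1)), m.group(2))
            for m in map(TAG.search, c_lines) if m]


asm = [
    "; add two constants",
    "LDA #5   ; first operand",
    "ADD #3",
    "STA total ; keep the sum",
]
c_code = asm_to_c(asm)
print("\n".join(c_code))
print(restore_listing(c_code))
```

Because the tag carries the source line number, the original text can still
be matched up after the C compiler has reordered or merged statements, as
long as the compiler keeps the association between statement and comment.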

w..

Walter
8/30/2016 1:21:46 AM
BartC wrote:
> On 29/08/2016 09:28, Hans-Peter Diettrich wrote:

> Take, for example, a C-like language with //-comments to end of line.
> Such a language is free-format so it could be written as one token per
> line, with a comment on each line plus comments on their own lines.

In the worst case the author implemented a two-column layout, with a
code block in the left and a comment block in the right column. Who'll know?


> Even if translating to normalised code in the same language, it can be
> challenging knowing where the comments are going to go.

Even comment-based document generators often have problems with the
unintended use of control tags by the coders. Even worse without such
hints :-(

DoDi

Hans
8/30/2016 11:13:42 AM
On 2016-08-29 9:21 PM, Walter Banks wrote:
> On 2016-08-29 1:52 PM, BartC wrote:
>> I've played with source-to-source translators, and never managed
>> to deal with comments properly and in the end gave up. ...
>
> How we handled the comments on the asm to asm translation was put
> the whole sourceline into in our case a C statement // comment with
> enough parsing information to tie it as well to the generated code. A
> filter (In our case a compiler pragma) reprocessed the listing line.
> It could have been done with a small post processing program quite
> easily.
>
> It proved quite workable.

Something else to think about for comments. The source level debugging
format we often use ties generated code to both source files and
listing files. The system I described above uses this information. We
have often used this information on many different debug applications
after post processing.

In the asm to asm translations we have done, it is quite strange to see a
listing file for a target processor being stepped with a source file of
the original processor.

I think the same thing could be done with ELF/DWARF extensions as well.

w..
Walter
8/30/2016 5:44:48 PM
On 30/08/2016 02:21, Walter Banks wrote:
> On 2016-08-29 1:52 PM, BartC wrote:
>> I've played with source-to-source translators, and never managed to
>> deal with comments properly and in the end gave up. ...
>
>
> How we handled the comments on the asm to asm translation was put the
> whole sourceline into in our case a C statement // comment with enough
> parsing information to tie it as well to the generated code. A filter
> (In our case a compiler pragma) reprocessed the listing line. It could
> have been done with a small post processing program quite easily.

I'm mainly familiar with ASM syntax that is strictly line-oriented and
with a flat code structure.

Then comments can either be tied to a particular instruction, or are on
their own line and might be assumed to refer to the next block of
instructions. (But then, they could be continuing a long comment too.)

But I'm not sure what you mean by an ASM to ASM translator; from your
other remarks it sounds like this is an ASM to C translator, where the C
is then compiled to another ASM (presumably with the original comments?).

It still doesn't sound that straightforward when N commented lines in
the source end up as M commented lines in output. And surely sometimes
the comments for the source ASM are meaningless on the target?

--
Bartc
BartC
8/30/2016 9:48:49 PM
On 2016-08-30 5:48 PM, BartC wrote:
> I'm mainly familiar with ASM syntax that is strictly line-oriented
> and with a flat code structure.
>
> Then comments can either be tied to a particular instruction, or are
> on their own line and might be assumed to refer to the next block of
> instructions. (But then, they could be continuing a long comment
> too.)
>
> But I'm not sure what you mean by an ASM to ASM translator; from
> your other remarks it sounds like this is an ASM to C translator,
> where the C is then compiled to another ASM (presumably with the
> original comments?).
>
> It still doesn't sound that straightforward when N commented lines
> in the source end up as M commented lines in output. And surely
> sometimes the comments for the source ASM are meaningless on the
> target?

You are mostly correct. I failed to make clear that this is a subset of
the problem discussed here. The asm to asm translator goes through a C
intermediary that is transparent, in that the intermediate C doesn't
show in the generated listing file. What isn't transparent is that some
of the comments are on asm lines and others are comment blocks. It is
important that these show up in the correct context even though there
may be code motion in the intermediate translation.

This is a subset of the problem of language to language translation,
unless I have missed something in this thread.

w..
Walter
8/31/2016 5:24:07 PM
On 30/08/2016 22:48, BartC wrote:
> Then comments can either be tied to a particular instruction, or are on
> their own line and might be assumed to refer to the next block of
> instructions. (But then, they could be continuing a long comment too.)

In an early version of the FermaT Program Transformation System
(http://www.gkc.org.uk/fermat.html) we decided that the "obvious"
way to handle comments was to have an optional "comment" field in every
node of the parse tree. A comment could be attached to any statement,
expression, condition etc. and would stay attached as the code
was transformed.

The parsers used heuristics to attach a parsed comment to the nearest
likely statement, and it quickly became clear that there would be very
little need for comments attached to expressions, conditions etc.

In fact, attaching the comments to statements raised a number
of problems: every transformation had to decide what to do
with any comments attached to the code it operated on.
For example: if the statement is deleted, do we delete
the comment or leave it behind?
If the latter, which statement does it now become attached to?
If a statement is duplicated, do both copies get the comment
or just the original? Some transformations work by duplicating
code and then eliminating some or all of the copies
(for example, "Expand and Separate"). How should these deal
with comments? And so on...

After some discussion it was decided to implement comments
in the form of a "comment statement": a statement which can
appear anywhere that a normal statement appears but which
semantically has no effect (it acts the same as a SKIP statement).
This required much less work for the transformations to handle
comments correctly: most transformations just treat a comment
the same as any other statement. Some check for comments
and move them to the appropriate place, using the fact
that a comment statement can be moved anywhere
and that a comment refers to the code which follows it.
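
The "comment statement" design can be sketched as follows, in Python.
This is an illustration only and not FermaT code: the Assign and Comment
classes, the tiny interpreter and the sample transformation are all made up
for the example.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    target: str
    value: str      # a variable name or an integer literal


@dataclass
class Comment:
    text: str       # semantically a no-op, like SKIP


def execute(stmts, env):
    """Tiny interpreter: comment statements change nothing."""
    for s in stmts:
        if isinstance(s, Assign):
            env[s.target] = env[s.value] if s.value in env else int(s.value)
    return env


def drop_self_assignments(stmts):
    """Sample transformation: delete `x := x`. It never looks at comments."""
    return [s for s in stmts
            if not (isinstance(s, Assign) and s.value == s.target)]


program = [
    Comment("initialise"),
    Assign("x", "1"),
    Comment("redundant copy"),
    Assign("x", "x"),
    Assign("y", "x"),
]
result = drop_self_assignments(program)
print(result)
print(execute(program, {}) == execute(result, {}))
```

The transformation needs no comment-handling code at all. The price is
visible in the result: "redundant copy" survives and now sits in front of
the assignment to y, so a transformation that cares has to check for
comments and move or drop them itself.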

Our assembler parser (used in the commercial assembler to C and COBOL
migration system developed by Software Migrations Ltd) recognises
three classes of comments:

(1) The "Narrative Block Comment": this is the first substantial
block of comments found at the start of the module. These comments
are taken out and added to the final C or COBOL code as a block
comment at the top of the migrated module.

(2) Comment lines: a line which contains only a comment.
Typically, these are more important than comments
attached to an instruction. A sequence of comments followed
by a label will be moved to a point just after the label:
this ensures that these comments stay with the code that they
refer to, since they usually refer to the code after the label.

(3) Comments on the same line as an instruction.
Typically, these refer to details of the implementation
and are not usually relevant to the migrated code.
Some customers choose to have these comments removed
in the final migrated code. If they are required,
the comment statement will be placed just before
the code generated by the instruction.
The transformation process frequently merges simple code
into more complex statements: in this case, there will
be several comments referring to the same statement.
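
A simplified sketch of such a classification, in Python. It is not the
parser used in the migration system. The assembler syntax (";" starts a
comment, a label is a name followed by ":"), the narrative_min threshold for
a "substantial" block, and the function names are assumptions made for the
example.

```python
def classify(lines, narrative_min=3):
    """Tag each line as 'narrative', 'comment', 'inline', 'code' or 'blank'."""
    tags = []
    for line in lines:
        code, sep, _ = line.partition(";")
        if not line.strip():
            tags.append("blank")
        elif not sep:
            tags.append("code")
        elif code.strip():
            tags.append("inline")       # comment on the same line as an instruction
        else:
            tags.append("comment")      # line containing only a comment
    # The first run of comment lines is the narrative block if it is long enough.
    start = next((i for i, t in enumerate(tags) if t == "comment"), None)
    if start is not None:
        end = start
        while end < len(tags) and tags[end] == "comment":
            end += 1
        if end - start >= narrative_min:
            tags[start:end] = ["narrative"] * (end - start)
    return tags


def move_comments_after_labels(lines, tags):
    """Move a run of comment lines that precedes a label to just after it."""
    out, run = [], []
    for line, tag in zip(lines, tags):
        if tag == "comment":
            run.append(line)
            continue
        is_label = (tag in ("code", "inline")
                    and line.split(";")[0].strip().endswith(":"))
        out.extend([line] + run if is_label else run + [line])
        run = []
    return out + run


lines = [
    "; PAYROLL module",
    "; computes net pay",
    "; author unknown",
    "        LDA gross   ; load gross pay",
    "; subtract the tax",
    "tax:",
    "        SUB rate",
]
tags = classify(lines)
print(tags)
print("\n".join(move_comments_after_labels(lines, tags)))
```

In the output the line "; subtract the tax" follows the label "tax:", so it
stays next to the code it describes if the label is later moved or removed.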

For more on the program transformation system see:
"Pigs from Sausages? Reengineering from Assembler to C
via FermaT Transformations" Martin Ward
Science of Computer Programming, Special Issue on Program
Transformation, Vol 52/1-3, pp 213-255, 2004.
doi:10.1016/j.scico.2004.03.007

http://www.gkc.org.uk/martin/papers/migration-t.pdf

A more recent paper on migrating assembler is
"Assembler Restructuring in FermaT" Martin Ward
13th IEEE International Working Conference on Source Code Analysis and
Manipulation, 22nd-23rd September 2013, Eindhoven, The Netherlands.

http://www.gkc.org.uk/martin/papers/assembler-restructuring-t.pdf

--
             Martin

Dr Martin Ward STRL Principal Lecturer & Reader in Software Engineering
martin@gkc.org.uk  http://www.cse.dmu.ac.uk/~mward/  Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors:  http://www.gkc.org.uk  and  http://www.gkc.org.uk/gkc
Martin
9/6/2016 5:09:12 PM