Generic Assembler?

I have started work (i.e. research and structure-creating algorithms)
on a compiler that is intended to work solely as a scripted
preprocessor--preprocessing the source code in levels based on
symbological and structural definitions, then translating that into a
lower level format via scripting. This process would be repeated for
varying layers of languages, with the output of each being fed into
the next layer.

C ruleset -> Generic Assembler Language -> Platform Assembler -> Bytes
-> Object File

These, for instance, would be the various rulesets that could be
applied. (The middle three would probably appear as a single file,
though, with the last simply being a library of functions that
abstracts an object file.)
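
To make the layering concrete, here is a minimal Python sketch of the "output of each layer is fed into the next" idea. Everything in it is an assumption made for illustration: the two toy layers, the generic op name `RET_VALUE`, and the helper `run_layers` are hypothetical and handle only a single `return` statement.

```python
from typing import Callable, List

# A "layer" is any function that rewrites source text into lower-level text.
Layer = Callable[[str], str]

def c_to_generic(src: str) -> str:
    # Hypothetical stand-in for a C ruleset: only understands "return <n>;".
    out = []
    for line in src.splitlines():
        line = line.strip()
        if line.startswith("return ") and line.endswith(";"):
            out.append("RET_VALUE " + line[len("return "):-1])
    return "\n".join(out)

def generic_to_platform(src: str) -> str:
    # Hypothetical stand-in for assembler.rules: expands each generic op
    # into x86-style text.
    out = []
    for line in src.splitlines():
        op, arg = line.split(" ", 1)
        if op == "RET_VALUE":
            out.append("mov eax, " + arg)
            out.append("ret")
    return "\n".join(out)

def run_layers(src: str, layers: List[Layer]) -> str:
    # Feed the output of each layer into the next one.
    for layer in layers:
        src = layer(src)
    return src

result = run_layers("return 42;", [c_to_generic, generic_to_platform])
print(result)
```

Because every layer has the same text-in, text-out shape, adding the "Bytes" and "Object File" stages would only mean appending more functions to the list.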
Below is an example of what such a rule file might look like.

iso_c.rules
-----------
|rules uses="assembler.rules" >   |> Tell it we are defining new rules
which produce code that needs to be parsed by the assembler.rules
ruleset <|

   |comment single >//<
   |comment multiple >/*|.*<*/<

   |structure if >
      if|space*<(|code if_condition<)|space*<{|code if_code<}|space*<
      |optional >
         |optional multiple >   |> can be any number of these <|
            |structure else_if >
               else|space+<if|space*<(|code else_if_condition<)|space*<{|code else_if_code<}|space*<
            <
         <
         |structure else >
            else|space*<{|code else_code<}|space*<
         <
      <
   >>
      |writeln "cmp " . if_condition . ", 0"<

      |exists else_if >   |- if () {} else if () {}
         |writeln "je ELSE_IF_" . |id else_if[0]<<
         |writeln if_code<
         |writeln "jmp STRUCT_END_" . |id if<<

         |for L = 0 # L < else_if.length # L++ >
            |writeln "ELSE_IF_" . |id else_if[L]< . ":"<   |> Label <|
            |writeln "cmp " . else_if_condition[L] . ", 0"<

            |exists else_if[L + 1] >   |> Check if there is another else if <|
               |writeln "je ELSE_IF_" . |id else_if[L + 1]<<
               |writeln else_if_code[L]<
               |writeln "jmp STRUCT_END_" . |id if<<

            >>   |> No more else ifs <|
               |exists else >   |> Check if there is an else <|
                  |writeln "je ELSE_" . |id else<<
                  |writeln else_if_code[L]<
                  |writeln "jmp STRUCT_END_" . |id if<<

                  |writeln "ELSE_" . |id else< . ":"<   |> Label <|
                  |writeln else_code<

               >>   |> No else <|
                  |writeln "je STRUCT_END_" . |id if<<
                  |writeln else_if_code[L]<
               <
            <
         <

      >>   |> No else if statements <|
         |exists else >   |- if () {} else {}
            |writeln "je ELSE_" . |id else<<
            |writeln if_code<
            |writeln "jmp STRUCT_END_" . |id if<<

            |writeln "ELSE_" . |id else< . ":"<   |> Label <|
            |writeln else_code<

         >>   |- if () {}
            |writeln "je STRUCT_END_" . |id if<<
            |writeln if_code<
         <
      <

      |writeln "STRUCT_END_" . |id if< . ":"<   |> Label for end <|
   <   |> End of if structure <|

<  |> Finished defining rules <|
-----------

In the above, we define a regular-expression-like pattern for an
if/else if/else structure, followed by a body of code that is called
for each instance of that structure. By then the parser has already
initialised various variables (if_condition, else_if_code and so on)
that the body can read in order to emit a proper assembler
representation.
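
The emission logic of that rule body can be sketched in Python as follows. This is only an illustration, not the rule engine itself: `emit_if` and its parameters are hypothetical names, the counter stands in for `|id ...<`, and, like the rule file, it treats each condition as a single operand that can be compared with 0.

```python
from itertools import count
from typing import List, Optional, Tuple

_ids = count(1)  # stands in for |id ...<: a unique number per label

def emit_if(cond: str, body: str,
            else_ifs: Optional[List[Tuple[str, str]]] = None,
            else_body: Optional[str] = None) -> List[str]:
    """Return generic-assembler lines for an if / else if / else chain."""
    end = f"STRUCT_END_{next(_ids)}"
    branches = [(cond, body)] + list(else_ifs or [])
    # labels[i] is where control goes when the test of branch i fails.
    labels = [f"ELSE_IF_{next(_ids)}" for _ in branches[1:]]
    labels.append(f"ELSE_{next(_ids)}" if else_body is not None else end)

    out = []
    for i, (c, b) in enumerate(branches):
        if i > 0:
            out.append(labels[i - 1] + ":")   # entry label of this else-if
        out.append(f"cmp {c}, 0")
        out.append(f"je {labels[i]}")         # condition false: try the next one
        out.append(b)
        if labels[i] != end:
            out.append(f"jmp {end}")          # skip the remaining branches
    if else_body is not None:
        out.append(labels[-1] + ":")
        out.append(else_body)
    out.append(end + ":")
    return out

lines = emit_if("a", "call f", [("b", "call g")], "call h")
print("\n".join(lines))
```

Computing the "jump here on failure" label for every branch up front avoids the nested exists-checks of the rule file, at the cost of building the list of branches first.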

Anyway, my hope is to have the assembler.rules file be a
processor-specific file that accepts both processor-specific assembler
_and_ a generic assembler language. Every assembler.rules file would be
required to support the generic language, and files like iso_c.rules
would be required to use it exclusively for portability. But I only
know x86 family assembler, so I wanted to ask whether there are any
particular instructions that seem fairly universal and/or can very
easily be spoofed in two instructions on any non-supporting chip?

Certainly I can just take C and all of the various binary operations it
has plus "return", but I would like to get as many extra instructions
in there as I can if they seem to be fairly universal (so that it would
be worthwhile to write things other than language definitions in the
generic assembler). Does anyone know of any instructions like this?
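
As a rough illustration of "spoofing" a generic instruction in two native ones, the Python sketch below lowers a generic `NEG` either to a single instruction or to a complement-and-increment pair, which is equivalent in two's complement arithmetic. The target names, the table layout and the `lower` helper are all hypothetical; no real instruction set is being described.

```python
# Hypothetical lowering table: generic op -> instruction templates per target.
LOWERING = {
    # A target that has a native negate instruction.
    "target_with_neg": {"NEG": ["neg {r}"]},
    # A target without one: -x == ~x + 1 in two's complement.
    "target_without_neg": {"NEG": ["not {r}", "add {r}, 1"]},
}

def lower(target: str, op: str, **operands: str) -> list:
    """Expand one generic op into the native instruction(s) for a target."""
    return [t.format(**operands) for t in LOWERING[target][op]]

print(lower("target_with_neg", "NEG", r="r1"))
print(lower("target_without_neg", "NEG", r="r1"))
```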

As to the language itself, again I only know x86, so am a bit worried
that there are instruction sets which are entirely off-the-wall from
what I know (like C compared to Lisp.) If there is something like this,
could you please point me to some reference material on it, so I can
study it?

Also kindly note that I have never created a compiler and am only on
about page 14 of the Dragon Book, so...be gentle.

Thank you,
Chris Williams
Reply thesagerat (66) 2/28/2005 5:50:51 AM

Chris Williams wrote:
> I have started work (i.e. research and structure-creating algorithms)
> on a compiler that is intended to work solely as a scripted
> preprocessor--preprocessing the source code in levels based on
> symbological and structural definitions, then translating that into a
> lower level format via scripting. This process would be repeated for
> varying layers of languages, with the output of each being fed into
> the next layer.
>
> C ruleset -> Generic Assembler Language -> Platform Assembler -> Bytes
> -> Object File
>
[...]
>
> Also kindly note that I have never created a compiler and am only on
> about page 14 of the Dragon Book, so...be gentle.
>
> Thank you,
> Chris Williams

Check out the Architecture-Neutral Distribution Format (ANDF)
project's web page.  ANDF was a serious and well funded attempt from
about 1990 that sought to create a generic object file format that
would be portable onto many CPU/OS architectures while preserving
optimization.  Commercially, it did not succeed.  You'll want to learn
its lessons.

  http://www.info.uni-karlsruhe.de/~andf/

BTW, GCC's RTL is somewhat similar to a generic assembly language.
It's an Intermediate Representation (IR) language derived from the
compiler's Abstract Syntax Trees that is further optimized before
being translated into the target assembly.  The design of RTL and the
process used by GCC to do this translation may be of interest to you.

  http://www.cse.ohio-state.edu/cgi-bin/info/info/gcc,RTL
  http://www.cse.ohio-state.edu/cgi-bin/info/info/gcc
  http://www.psc.edu/general/software/packages/gcc/manual/gcc_toc.html

LCC is another easily retargetable C compiler which also generates
assembly portably.  It's smaller and more comprehensible than GCC, so
it may be more accessible to you:

  http://www.cs.princeton.edu/software/lcc/

Here are two interesting threads from 2004 and 1997 on generic assemblers:

  http://www.codecomments.com/archive285-2004-8-234437.html
  http://compilers.iecc.com/comparch/article/97-05-156

Here's a generic assembler project on Sourceforge:

  http://sourceforge.net/projects/sgasm/

    Randy
--
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu
[ANDF is the most recent in a long line of projects generally known as
UNCOLs, attempts to make a universal intermediate language.  They all
failed.  If you constrain your source and target languages enough you
can get them to work, but the more general you try to get, the faster
you run into heat death. -John]

Reply Randy 3/1/2005 12:51:27 AM


Chris Williams wrote:

> I have started work (i.e. research and structure-creating algorithms)
> on a compiler that is intended to work solely as a scripted
> preprocessor--preprocessing the source code in levels based on
> symbological and structural definitions, then translating that into a
> lower level format via scripting. This process would be repeated for
> varying layers of languages, with the output of each being fed into
> the next layer.

This reminds me of Mortran, a language designed to be an improved
Fortran, with a processor to convert to ANSI standard Fortran.

The processor was written as a set of self-modifying macros, and a
macro processor to run them.

> C ruleset -> Generic Assembler Language -> Platform Assembler -> Bytes
> -> Object File

> For instance would be the various rulesets that could be applied. (The
> middle three would probably appear as a single file though, with the
> last simply being a library of functions that abstracted an object
> file.)

There are some interesting differences between assemblers, separate
from the actual opcodes being used.  Some treat a symbol as a label if
it starts in column one, and otherwise not.  Others use a colon to
indicate a label, or sometimes two.  Some require a special character
to introduce a comment on the same line as code; others do it based on
fields (label, opcode, operands, comments).  Some put the destination
operand on the left, others on the right.

I have seen various assemblers (Z80, 6502, 6809) written as macros for
the OS/360 assembler, which means they follow OS/360 assembler rules
even though they use the appropriate opcodes.  Then again, the GNU
assembler often uses different opcodes and syntax from those normally
used for the machine in question.

It should not be too hard to build a table driven assembler that would
allow for different combinations of input syntax.
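
A minimal Python sketch of what such a table could look like, assuming just three of the variations mentioned above (label style, comment character, operand order). The `Syntax` fields and `parse_line` are hypothetical names for illustration; a real table would need many more entries, and reversing the operand list is only right for simple two-operand forms.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Syntax:
    comment_char: str   # starts a comment on a code line
    colon_labels: bool  # True: "name:" is a label; False: column-one text is
    dest_first: bool    # True: the destination operand is written first

def parse_line(line: str, syn: Syntax) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Split a line into (label, opcode, operands), destination first."""
    # Drop the comment but keep leading whitespace for the column-one test.
    code = line.split(syn.comment_char, 1)[0].rstrip()
    label = None
    tokens = code.split(None, 1)
    if syn.colon_labels:
        if tokens and tokens[0].endswith(":"):
            label = tokens[0][:-1]
            code = tokens[1] if len(tokens) > 1 else ""
    elif code and not code[0].isspace():
        label = tokens[0]
        code = tokens[1] if len(tokens) > 1 else ""
    parts = code.split(None, 1)
    if not parts:
        return label, None, []
    opcode = parts[0].lower()
    operands = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    if not syn.dest_first:
        operands.reverse()  # normalise to destination-first order
    return label, opcode, operands

colon_style = Syntax(comment_char=";", colon_labels=True, dest_first=True)
column_style = Syntax(comment_char="*", colon_labels=False, dest_first=False)

a = parse_line("loop: mov r1, r2 ; copy", colon_style)
b = parse_line("loop  mov r2, r1 * copy", column_style)
print(a, b)
```

Both lines normalise to the same (label, opcode, operands) triple, so everything after this step can ignore which surface syntax was used.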

-- glen
Reply glen 3/1/2005 8:50:41 PM

Chris Williams wrote:
> I have started work (i.e. research and structure-creating algorithms)
> on a compiler that is intended to work solely as a scripted
> preprocessor--preprocessing the source code in levels based on
> symbological and structural definitions, then translating that into a
> lower level format via scripting. This process would be repeated for
> varying layers of languages, with the output of each being fed into
> the next layer.

Instead of going via assembler, I'd suggest you take a look at the work
of Dr. Michael Franz. His homepage is here:
http://www.ics.uci.edu/~franz/

In particular, his work on Semantic Dictionary Encoding (see his PhD
thesis at
http://www.ics.uci.edu/~franz/Site/publications.html)

It was originally put to use in Mac Oberon as a unified intermediate
format suitable for generating the final code for PowerPC or M68k Macs
on the fly. It should be well suited to your approach, as it maintains
much of the higher-level semantic structure of a program while
transforming it into a form that lends itself to fast on-the-fly
code generation / JIT'ing.

Vidar
Reply Vidar 3/4/2005 7:25:48 PM
