


misc, refocus: new possible lang idea, BSC

this is a misc and hypothetical idea:

IMO all this is unlikely to move much beyond a personal effort.


well, I have been beating together a C compiler, and also ended up thinking 
some about my script lang, integration, ...

in this, I have come up with a new possible idea:
rather than fully implementing C per se, I would implement a technically 
different language that just happens to be mostly backwards compatible with C 
and can link directly with C code (and the existing C runtime, another goal).

so, the name BSC will mean BGB-Script/C, or a kind of hybrid (my current 
script lang is called BGB-Script, and sort of resembles C, but is nowhere 
near the same).


so, I will still rely primarily on the C99 spec, possibly omitting or 
changing a few minor things that shouldn't really affect compatibility (e.g.: 
removing trigraphs and other similar features, generally assuming that 
source code is in either ASCII or UTF-8, ...).

I am considering adopting a few minor features from C++ and C++/CLI (no 
major plans as of yet).


the net result is likely to look like C+GC+dynamic types+prototype OO+...

unlike Objective-C, it will probably look less bizarre.
unlike C#, it will (probably) still run C code (and on the native HW).
....


so, what I may add:

support for garbage collected types/values. in many cases these will be 
handled by having a caret in declarations, e.g.:

byte ^bp;    //tagged reference for a byte array
struct foo ^op;  //tagged reference for a struct

whereas:
byte *bp;    //pointer to a byte array
struct foo *op; //pointer to a struct

some types will not use a caret, since they will be tagged-only:
elem    //generic type, much like void* for pointers
list    //list of cons cells
object  //(generic) prototypical object
....
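To make the tagged-only types a little more concrete, here is a minimal sketch of one possible representation. It assumes (this is not from the post) that `elem` is a single machine word with a low-bit fixnum tag and that `list` is built from heap-allocated cons cells; `fixnum`, `to_int`, `cons` and `list_length` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: an elem is one machine word.
 * Low bit 1: a fixnum (the integer shifted left by one).
 * Low bit 0: a pointer to a heap cell (only cons cells here); 0 is NIL.
 * Assumes two's complement and an arithmetic right shift, as on
 * mainstream compilers. */
typedef uintptr_t elem;

#define NIL ((elem)0)

typedef struct Cons {
    elem car;
    elem cdr;
} Cons;

static elem fixnum(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }
static int is_fixnum(elem e) { return (e & 1u) != 0; }

static intptr_t to_int(elem e) {
    assert(is_fixnum(e));
    return (intptr_t)e >> 1;
}

/* malloc stands in for a GC allocator; its alignment keeps the low bit 0. */
static elem cons(elem head, elem tail) {
    Cons *c = malloc(sizeof *c);
    assert(c != NULL);
    c->car = head;
    c->cdr = tail;
    return (elem)c;
}

static elem car(elem e) { return ((Cons *)e)->car; }
static elem cdr(elem e) { return ((Cons *)e)->cdr; }

/* Walk the cdr chain of a list of cons cells until NIL. */
static int list_length(elem l) {
    int n = 0;
    for (; l != NIL; l = cdr(l))
        n++;
    return n;
}
```

Because the tag lives inside the word itself, small integers need no allocation at all, while anything with a low bit of 0 is treated as a pointer to a cell.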

unlike in my script lang, type names will be required (var is also renamed 
to elem, since that is what it is called in most of my existing C sources, 
or at least that use my VM...).


other rules:

it may not be possible to assign between tagged-references and pointers, 
since they are fundamentally different (in some cases though I will allow it 
via marshalling).

byte *bp;
elem n;

n=bp;
will be handled by marshalling the pointer into a pointer-object (this will 
allow, for example, storing pointers in a prototype object, ...).

bp=n;
may be possible, which would implicitly unmarshall the pointer.

the other direction will likely be viewed as invalid.
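A minimal sketch of the marshalling described above, assuming a hypothetical boxed `PtrObj` type whose header carries a type tag. `marshal_ptr` and `unmarshal_ptr` are illustrative names, and `malloc` stands in for the GC heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object header: every GC-side value carries a type tag. */
typedef enum { TY_POINTER, TY_OTHER } TypeTag;

typedef struct Obj {
    TypeTag tag;
} Obj;

typedef struct PtrObj {
    Obj hdr;    /* first member, so a PtrObj* can be viewed as an Obj* */
    void *ptr;  /* the raw C pointer being wrapped */
} PtrObj;

/* In this sketch an elem is simply a pointer to a tagged object. */
typedef Obj *elem;

/* n=bp;  box the raw pointer into a pointer-object */
static elem marshal_ptr(void *p) {
    PtrObj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->hdr.tag = TY_POINTER;
    o->ptr = p;
    return &o->hdr;
}

/* bp=n;  unbox again; yields NULL if n does not hold a pointer-object */
static void *unmarshal_ptr(elem n) {
    if (n == NULL || n->tag != TY_POINTER)
        return NULL;
    return ((PtrObj *)n)->ptr;
}
```

Returning NULL for a non-pointer `elem` is only one possible policy; the conversion could equally be rejected at compile time or raise a runtime error.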


one may be able to declare variables differently:
byte* a, b;

will be the same as:
byte *a, *b;
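For comparison, this is how standard C reads the same declarations; the `byte` typedef is assumed here just so the snippet compiles.

```c
#include <assert.h>

typedef unsigned char byte;

/* Standard C: the '*' belongs to the declarator, not to the type. */
byte* a1, b1;   /* a1 is a byte*, but b1 is a plain byte */
byte *a2, *b2;  /* both a2 and b2 are byte* */
```

Since standard C gives `b1` the type `byte`, existing C code written as `byte* a, b;` would change meaning under the proposed rule.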


possible new keywords:

typedef struct {
int x, y;
}foo;

foo *bar=new foo;    //creates a new instance of foo on the C heap
foo ^baz=gcnew foo;  //creates foo on the GC heap

elem n;
n=baz;
will be acceptable, since full type info will be preserved.
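One way `new` and `gcnew` might be lowered to plain C, sketched under the assumption that each GC allocation is prefixed with a header pointing at a runtime type descriptor (which is what would let `n=baz;` keep the full type info). `NEW`, `gc_alloc`, `gc_typeof` and `TypeInfo` are hypothetical names, and `calloc` stands in for a real GC heap.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int x, y;
} foo;

/* Hypothetical runtime type descriptor. */
typedef struct TypeInfo {
    const char *name;
    size_t size;
} TypeInfo;

static const TypeInfo foo_type = { "foo", sizeof(foo) };

/* Header placed in front of every GC allocation; the long double member
 * only pads it so the payload that follows stays suitably aligned. */
typedef union GcHeader {
    const TypeInfo *type;
    long double align_;
} GcHeader;

/* new foo   -> plain C heap, no type header */
#define NEW(T) ((T *)calloc(1, sizeof(T)))

/* gcnew foo -> "GC heap", with the type recorded in the header */
static void *gc_alloc(const TypeInfo *ti) {
    GcHeader *h = calloc(1, sizeof(GcHeader) + ti->size);
    assert(h != NULL);
    h->type = ti;
    return h + 1;  /* the payload starts right after the header */
}

/* Recover the full type of a GC allocation from its payload pointer. */
static const TypeInfo *gc_typeof(void *payload) {
    return ((GcHeader *)payload - 1)->type;
}
```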


objects:

I may add either one or two object systems.
one will be based on prototype OO;
the other, if implemented, will be based on a class/instance system.

the former, as is in my current script lang, will have object structure be 
dynamically constructed, support delegation, ...
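A minimal sketch of a prototype object with delegation, assuming a fixed-size slot table and integer slot values for brevity (a real system would store `elem` values and grow the table). `Object`, `bind_slot` and `get_slot` are hypothetical. Reads that miss on an object are delegated to its parent, while writes always land on the object itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 8

/* Hypothetical prototype object: a small slot table plus a delegate. */
typedef struct Object {
    const char *names[MAX_SLOTS];
    int values[MAX_SLOTS];
    int count;
    struct Object *parent;  /* lookups that miss here go to the parent */
} Object;

/* Bind (or rebind) a slot directly on o; returns 0 if the table is full. */
static int bind_slot(Object *o, const char *name, int value) {
    for (int i = 0; i < o->count; i++) {
        if (strcmp(o->names[i], name) == 0) {
            o->values[i] = value;
            return 1;
        }
    }
    if (o->count == MAX_SLOTS)
        return 0;
    o->names[o->count] = name;
    o->values[o->count++] = value;
    return 1;
}

/* Look the slot up on o, then along the delegation chain. */
static int get_slot(const Object *o, const char *name, int *out) {
    for (; o != NULL; o = o->parent) {
        for (int i = 0; i < o->count; i++) {
            if (strcmp(o->names[i], name) == 0) {
                *out = o->values[i];
                return 1;
            }
        }
    }
    return 0;
}
```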


the latter will be a simplified class/instance system, likely only 
supporting single inheritance (otherwise using 'interfaces'). this system 
could potentially be allowed to have a dynamic component as well (implicit 
prototype methods and slots).

this would make a difference in the case of an unknown method or slot. 
absent prototypical features, this will be viewed as an error, but with such 
features, the slot will be ignored or created if not present.
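The difference between strict and prototypical handling of unknown slots might look like the sketch below. It assumes one reading of the text: unknown reads are ignored and yield a default value, while unknown writes create the slot. `Instance`, `set_slot` and `get_slot` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_SLOTS 8

typedef enum { SLOT_OK, SLOT_UNKNOWN } SlotStatus;

/* Hypothetical instance: the initial slots come from the class; with
 * 'dynamic' set, unknown slots behave prototypically instead of failing. */
typedef struct Instance {
    const char *names[MAX_SLOTS];
    int values[MAX_SLOTS];
    int count;
    int dynamic;
} Instance;

static int find(const Instance *o, const char *name) {
    for (int i = 0; i < o->count; i++)
        if (strcmp(o->names[i], name) == 0)
            return i;
    return -1;
}

/* Unknown writes: an error when strict (or when no room is left),
 * otherwise the slot is created. */
static SlotStatus set_slot(Instance *o, const char *name, int v) {
    int i = find(o, name);
    if (i < 0) {
        if (!o->dynamic || o->count == MAX_SLOTS)
            return SLOT_UNKNOWN;
        i = o->count++;
        o->names[i] = name;
    }
    o->values[i] = v;
    return SLOT_OK;
}

/* Unknown reads: an error when strict, ignored (default 0) when dynamic. */
static SlotStatus get_slot(const Instance *o, const char *name, int *out) {
    int i = find(o, name);
    if (i < 0) {
        if (!o->dynamic)
            return SLOT_UNKNOWN;
        *out = 0;
        return SLOT_OK;
    }
    *out = o->values[i];
    return SLOT_OK;
}
```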

such a class/instance system could be based on either prototype objects 
(thus being located on the GC heap only) or structs (allowing both, but 
complicating the implementation).


other things:

could add operator overloading, and a number of features from my script lang 
(probably with altered syntax to fit in more nicely with C).

some features, if added, are unlikely to be directly usable from normal C 
code (or stored in pointers). these would likely include lexical closures, 
which though similar, will be a rather different beast than function 
pointers.

int foo(int x, int y) { return(x*y); }

which could likely be written:
int foo(int x, int y) x*y;

as I am likely to re-introduce the implicit brace, return, and tail-expr 
semantics.


int (*fp)(int, int);
elem n;

fp=foo;    //creates a function pointer
n=foo;     //creates a first-class function


so, in BSC, both would be the same (both being directly callable), but in 
true C, the former is a function pointer, and the latter would have to be 
called through a wrapper.
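A sketch of why a first-class function is a different beast from a function pointer, assuming a hypothetical `Func` made of a code pointer plus an environment. A plain C function needs a small wrapper (`wrap`) to be called this way, and a closure (`make_adder`) keeps its captured state in the environment; all names here are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*BinFn)(int, int);

/* Hypothetical first-class function: code pointer plus environment. */
typedef struct Func {
    int (*code)(void *env, int x, int y);
    void *env;
} Func;

static int foo(int x, int y) { return x * y; }

/* Wrapper for plain C functions. The function pointer is boxed because
 * ISO C does not define converting a function pointer to void *. */
typedef struct FnBox {
    BinFn fn;
} FnBox;

static int call_plain(void *env, int x, int y) {
    return ((FnBox *)env)->fn(x, y);
}

static Func wrap(BinFn fn) {
    FnBox *box = malloc(sizeof *box);
    assert(box != NULL);
    box->fn = fn;
    return (Func){ call_plain, box };
}

/* A closure: the captured k lives in env, which a bare C function
 * pointer has no room for. */
static int add_k(void *env, int x, int y) { return x + y + *(int *)env; }

static Func make_adder(int k) {
    int *env = malloc(sizeof *env);
    assert(env != NULL);
    *env = k;
    return (Func){ add_k, env };
}

/* Every first-class function is called through the same path. */
static int call(Func f, int x, int y) { return f.code(f.env, x, y); }
```

The extra indirection through `call` is the wrapper cost mentioned above: plain C code holding only an `int (*)(int, int)` cannot invoke a `Func` directly.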



cr88192
3/25/2007 3:47:58 AM
comp.lang.misc

"cr88192" <cr88192@NOSPAM.hotmail.com> wrote in message 
news:67835$4605f0de$ca83a8d6$28447@saipan.com...
> this is a misc and hypothetical idea:
>
> IMO all this is unlikely to move much beyond a personal effort.
>
>
> well, I have been beating together a C compiler, and also ended up 
> thinking some about my script lang, integration, ...
>
> in this, I have come up with a new possible idea:
> I don't fully implement C, per se, but technically a different language 
> that just happens to be mostly backwards compatible and can link directly 
> with C (and the existing C runtime, another goal).
>

I have made the idea much less ambitious: less like a "true" hybrid, and 
more like dumping some of my script lang's functionality on top of C (ok, 
maybe that is more of a true hybrid in this respect, oh well...).

in particular, the whole idea of the caret syntax has been dropped. it 
would add implementation complexity for no real gain.

so, instead, I will just have some new primitive types (elem, list, object, 
...), and some automatic marshalling/gc/... features.


void foo()
{
    //BS1_BEGIN();    //acts as if present
    elem n, m;    //BS1_ROOT(n); BS1_ROOT(m);
    object o;     //BS1_ROOT(o);
    list l;       //BS1_ROOT(l);
    int i;

    n=i;    //same as: SET(n, FIXNUM(i));
    i=n;    //i=TOINT(n);
    m=n+i;  //SET(m, BS1_ADD(n, FIXNUM(i)));

    o=object { x=3; y=4; };
//      SET(o, BS1_SObj_New());
//      BS1_SObj_BindSlot(o, SYM("x"), FIXNUM(3));
//      BS1_SObj_BindSlot(o, SYM("y"), FIXNUM(4));

    l=list {1, 2, 3};
    //SET(l, LIST3(FIXNUM(1), FIXNUM(2), FIXNUM(3)));

    ...

    //BS1_END();
}
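The commented-out `BS1_ROOT` calls appear to register each local as a GC root for precise collection. Below is a minimal shadow-stack sketch of that general idea; `ROOTS_BEGIN`, `ROOT`, `ROOTS_END`, `for_each_root` and `relocate` are hypothetical names, not the actual BS1 macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t elem;

/* Hypothetical shadow stack of root addresses: a precise collector can
 * find every live local without scanning the C stack conservatively. */
#define MAX_ROOTS 256
static elem *root_stack[MAX_ROOTS];
static size_t root_top = 0;

#define ROOTS_BEGIN() size_t roots_saved_ = root_top
#define ROOT(v)       (assert(root_top < MAX_ROOTS), root_stack[root_top++] = &(v))
#define ROOTS_END()   (root_top = roots_saved_)

/* A collector visits each registered slot; a moving collector could
 * rewrite the locals in place through these addresses. */
static void for_each_root(void (*fn)(elem *slot)) {
    for (size_t i = 0; i < root_top; i++)
        fn(root_stack[i]);
}

static void relocate(elem *slot) { *slot += 0x100; }  /* pretend it moved */

static elem demo(void) {
    ROOTS_BEGIN();
    elem n = 0x10, m = 0x20;
    ROOT(n);
    ROOT(m);
    for_each_root(relocate);  /* as if a collection happened here */
    elem result = n + m;      /* the locals now hold the updated values */
    ROOTS_END();
    return result;
}
```

The cost of this approach is the push/pop on every function entry and exit, which is the bookkeeping the implicit `BS1_BEGIN()`/`BS1_END()` would hide from the programmer.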


some other functionality, if added, will be similar (there will be little 
effort to hide what is going on, i.e., that there are two fundamentally 
different systems at play).


potential costs:
dynamic type handling is likely to be less optimized, so dynamic types will 
likely be noticeably slower than static types, and slower than in the script 
lang (for example, I am unlikely to infer dynamic types in this case, ...).
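To illustrate where the extra cost comes from, here is a sketch assuming low-bit fixnum tagging (the actual encoding behind `FIXNUM` and `TOINT` is not given in the post). The hypothetical `add_dynamic` plays the role of something like `BS1_ADD`: it has to check and strip tags around every operation, while the static version is a single addition. Fixnum overflow is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagging: low bit 1 marks a fixnum. Assumes two's
 * complement and an arithmetic right shift, as on mainstream compilers. */
typedef uintptr_t elem;

static elem fixnum(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }
static int is_fixnum(elem e) { return (e & 1u) != 0; }
static intptr_t to_int(elem e) { return (intptr_t)e >> 1; }

/* Static types: the compiler can emit a single add instruction. */
static int add_static(int a, int b) { return a + b; }

/* Dynamic types: check both tags, untag, add, retag; anything else has
 * to go to a slower generic path (here simply reported as failure). */
static int add_dynamic(elem a, elem b, elem *out) {
    if (is_fixnum(a) && is_fixnum(b)) {
        *out = fixnum(to_int(a) + to_int(b));
        return 1;
    }
    return 0;  /* a real runtime would dispatch on boxed types here */
}
```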


I am unlikely to add any more "advanced" features, such as lexical scoping, 
closures, continuations, tail-call optimization, ... or at least, not without 
strong usability/interface restrictions.

for example, a problem is this:
C code tends to call things directly, and C interface is the primary goal;
many of these features, as implemented in the script lang's JIT compiler, 
necessarily made a few provisions (particularly, having a VM context 
available, and reliance on trampolines). since I have neither a VM context 
available, nor would trampolines really be workable in this case, these 
features are likely unreasonable.


and so, this effort will likely yield neither a usable successor to plain C 
nor to my script lang, but may have strong FFI related uses (and may help 
provide some ideas for a future direction).


the main gain will be that it will be compiled at runtime, and slightly less 
verbose than the heavily macroed VM sources (of course, to still allow the 
macroed forms, for this version I may have to fudge the macros or something, 
since from the compiler's perspective elem is no longer simply a bit-packed 
integer...).

or such...



cr88192
3/25/2007 11:55:31 PM
"cr88192" <cr88192@NOSPAM.hotmail.com> writes:

> this is a misc and hypothetical idea:
>
> so, what I may add:
>
> support for garbage collected types/values. in many cases these will be 
> handled by having a caret in declarations, e.g.:
>
> byte ^bp;    //tagged reference for a byte array
> struct foo ^op;  //tagged reference for a struct
>
> whereas:
> byte *bp;    //pointer to a byte array
> struct foo *op; //pointer to a struct

If you allow untagged values pointing to tagged (GC'ed) values, you
will have a problem identifying your root set -- basically, you need
to also traverse untagged structures to find pointers to tagged values.

So I suggest either making a rule that says that you can't have
untagged values point to tagged values (and disallowing casts of tagged
values to untagged values) or dropping the idea of partial GC and
garbage-collecting all heap-allocated values, possibly using a
Boehm-style conservative collector.
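The root-set problem can be seen in a tiny mark-sweep sketch (all names hypothetical): the collector only marks what is reachable from its registered roots, so a GC object referenced only from ordinary `malloc`ed C memory looks like garbage and is freed while still in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tiny heap: GC objects are chained on an allocation list,
 * and the collector only knows about the roots registered with it. */
typedef struct GcObj {
    struct GcObj *next_alloc;  /* allocation list, walked by the sweep */
    struct GcObj *ref;         /* one outgoing reference to another object */
    int marked;
} GcObj;

static GcObj *all_objects = NULL;
static GcObj **roots[16];
static int nroots = 0;

static GcObj *gc_new(void) {
    GcObj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->next_alloc = all_objects;
    all_objects = o;
    return o;
}

/* Follow the reference chain, marking until NULL or an already-marked object. */
static void mark(GcObj *o) {
    for (; o != NULL && !o->marked; o = o->ref)
        o->marked = 1;
}

/* Mark from the roots, free everything unmarked; returns the number freed. */
static int gc_collect(void) {
    int freed = 0;
    for (int i = 0; i < nroots; i++)
        mark(*roots[i]);
    GcObj **p = &all_objects;
    while (*p != NULL) {
        GcObj *o = *p;
        if (!o->marked) {
            *p = o->next_alloc;
            free(o);
            freed++;
        } else {
            o->marked = 0;  /* reset for the next cycle */
            p = &o->next_alloc;
        }
    }
    return freed;
}
```

A pointer stored inside an untagged C struct is invisible to `mark`, which is why either such pointers must be forbidden or the whole heap must be scanned.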

For inspiration, you could look at Cyclone
(http://en.wikipedia.org/wiki/Cyclone_programming_language), which is
a C-like language with GC and other interesting features.

	Torben
torbenm
3/26/2007 8:07:34 AM
"Torben "Ægidius" Mogensen" <torbenm@app-6.diku.dk> wrote in message 
news:7z1wjcpgvt.fsf@app-6.diku.dk...
> "cr88192" <cr88192@NOSPAM.hotmail.com> writes:
>
>> this is a misc and hypothetical idea:
>>
>> so, what I may add:
>>
>> support for garbage collected types/values. in many cases these will be 
>> handled by having a caret in declarations, e.g.:
>>
>> byte ^bp;    //tagged reference for a byte array
>> struct foo ^op;  //tagged reference for a struct
>>
>> whereas:
>> byte *bp;    //pointer to a byte array
>> struct foo *op; //pointer to a struct
>
> If you allow untagged values pointing to tagged (GC'ed) values, you
> will have a problem identifying your root set -- basically, you need
> to also traverse untagged structures to find pointers to tagged values.
>
> So I suggest either making a rule that says that you can't have
> untagged values point to tagged values (and disallowing casts of tagged
> values to untagged values) or dropping the idea of partial GC and
> garbage-collecting all heap-allocated values, possibly using a
> Boehm-style conservative collector.
>

actually, as I was considering it, one 'can' violate the rules (just as one 
can do terrible things in C with pointers); however, this will be viewed as 
broken style.


I have had more than my fair share of experience with conservative 
collection, and that is why I have gone back to precise collection. the 
performance overhead of scanning the whole damn stack/data/bss sections, and 
all these pointer-sparse buffers, ... for no reason, plus being unable to 
employ more powerful techniques (such as reference counting and 'checking'), 
...

I just can't take this, especially for bulky real-time software.
as such, I leave conservative collection for C-only uses (the main point 
being that there I can relax and let things fail to get freed if 
inconvenient).

for my VM and similar tasks, I have returned to precise GC.
as noted, by default precise GC and ref-counting leave some annoyances to 
deal with (and at the same time one is left finding places to skip 
ref-counting in order to try to improve performance, or just avoid 
inconvenience).
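For reference, a minimal sketch of plain reference counting with a hypothetical `RcObj` type: every copied reference needs a matching retain and release, which is exactly the bookkeeping one ends up tempted to skip. The `live_objects` counter exists only to make the behavior observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ref-counted object, freed when its last reference goes away. */
typedef struct RcObj {
    int refs;
    int payload;
} RcObj;

static int live_objects = 0;  /* only for observing the behavior */

static RcObj *rc_new(int payload) {
    RcObj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refs = 1;  /* the creator holds the first reference */
    o->payload = payload;
    live_objects++;
    return o;
}

/* Call whenever a reference is copied into a new owner. */
static RcObj *rc_retain(RcObj *o) {
    o->refs++;
    return o;
}

/* Call whenever an owner drops its reference. */
static void rc_release(RcObj *o) {
    if (--o->refs == 0) {
        free(o);
        live_objects--;
    }
}
```

Unlike a tracing collector, this frees objects immediately and incrementally, but it cannot reclaim cycles on its own.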


also, this language is likely going to have mostly direct/raw access to the 
C runtime, and this means malloc/free...


> For inspiration, you could look at Cyclone
> (http://en.wikipedia.org/wiki/Cyclone_programming_language), which is
> a C-like language with GC and other interesting features.
>

ok.

somehow I suspect that most existing source would require modification to 
work with Cyclone (this is something I am trying to minimize/avoid if 
possible).


as noted in a follow-up, the idea has been modified to be less drastic. in 
particular, the GC'ed types have been made generally much more opaque, and I 
may make these more like extensions that have to be enabled somehow, or not.


or something...


as for the actual compiler:
I am starting to look some into possibly starting on the main compile 
process...

for right now, I will see what I can reuse from the script compiler (some, 
at least hopefully).

some thoughts are going into exactly which aspects of processing will go 
into the parser, the reducer, and the compiler.

for example, right now in the parser variable definitions are reworked such 
that each variable is fully self-contained wrt type, but at present the plan 
is that expression types will be determined in the reducer.

presumably, for now, reduction can't be done in the parser, or at least not 
without building a compilation context at parse time.

right now the plan is that I will still leave compilation and reduction as a 
separate pass. note: 'reduction' here is basically where I step along the 
parse tree during compilation, and use a process that resolves types and 
attempts to use pattern matching to simplify/eliminate expressions (the 
results of this process are then compiled into bytecode).
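A minimal sketch of what such a reduction pass might do, assuming a hypothetical expression tree in which variables already carry their types from the parser. It resolves each node's type bottom-up, folds constant subexpressions, and applies two simple patterns (`x*1` to `x`, `x+0` to `x`). The type rule is a simplification of C's usual arithmetic conversions, and all names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { T_INT, T_LONG } Type;
typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } Kind;

/* Hypothetical parse-tree node. */
typedef struct Node {
    Kind kind;
    Type type;
    long value;              /* used by N_CONST */
    struct Node *lhs, *rhs;  /* used by N_ADD / N_MUL */
} Node;

static Node *mk(Kind k, Type t, long v, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = k;
    n->type = t;
    n->value = v;
    n->lhs = l;
    n->rhs = r;
    return n;
}

/* Reduce bottom-up: resolve types, fold constants, match simple patterns. */
static Node *reduce(Node *n) {
    if (n->kind == N_CONST || n->kind == N_VAR)
        return n;
    n->lhs = reduce(n->lhs);
    n->rhs = reduce(n->rhs);

    /* simplified arithmetic conversion: the wider operand type wins */
    n->type = (n->lhs->type == T_LONG || n->rhs->type == T_LONG) ? T_LONG : T_INT;

    /* constant folding */
    if (n->lhs->kind == N_CONST && n->rhs->kind == N_CONST) {
        long a = n->lhs->value, b = n->rhs->value;
        return mk(N_CONST, n->type, n->kind == N_ADD ? a + b : a * b, NULL, NULL);
    }

    /* patterns: x*1 -> x and x+0 -> x, only when the type is unchanged */
    if (n->rhs->kind == N_CONST && n->lhs->type == n->type) {
        if ((n->kind == N_MUL && n->rhs->value == 1) ||
            (n->kind == N_ADD && n->rhs->value == 0))
            return n->lhs;
    }
    return n;
}
```

The reduced tree would then be handed to the bytecode compiler; keeping this as its own pass means the parser never needs a compilation context.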


dunno as of yet...


> Torben


cr88192
3/26/2007 9:13:53 AM