Alternative C compilers on x86_64 Linux?

Hi.

Can I get recommendations for other (free) C compilers besides GCC and CLANG?
I've been using the revived PCC for gawk development since it's faster
than GCC, but recently it's developed a bug where it won't compile the
current (valid) code.

LCC seems to be 32 bit only and requires very manual configuration.

TinyCC is blindingly fast, and can compile gawk, but is broken in that
it won't diagnose duplicate case statements inside switch. The developers
don't consider this a problem. So I refuse to use it.

In short, I'm looking for a faster compiler that actually works.

Thanks,

Arnold
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com
arnold
9/2/2016 3:01:14 AM

On 02/09/2016 at 05:01, Aharon Robbins wrote:
 > Hi.
 >
 > Can I get recommendations for other (free) C compilers besides GCC
and CLANG?
 > I've been using the revived PCC for gawk development since it's faster
 > than GCC, but recently it's developed a bug where it won't compile the
 > current (valid) code.
 >
 > LCC seems to be 32 bit only and requires very manual configuration.
 >
 > TinyCC is blindingly fast, and can compile gawk, but is broken in that
 > it won't diagnose duplicate case statements inside switch. The developers
 > don't consider this a problem. So I refuse to use it.
 >
 > In short, I'm looking for a faster compiler that actually works.
 >
 > Thanks,
 >
 > Arnold

Hi Arnold

lcc-win, a 64 bit version of lcc, runs under linux but I have never
tried to make it a real compiler system since there are no customers.

Nobody buys compilers on linux, so either you use gcc or clang, or you
just write your own. As you write above, you want it "free". That is why
I have never developed under linux, since I do not like to work for
free.  Why should I?

I understand the developers of TinyCC.

Why should they care? If you write duplicate switch cases, it's not their
fault, it's yours.

It would take me several weeks of work to get lcc-win running.

It is very fast. It produces much better code than the original lcc, and
generates 64 bit code. That time would be just to get it running using
gas (the gnu assembler) and the gnu linker.

Another thing (also a derivative of lcc) that I have developed under linux
is a JIT. It will compile a source file into a buffer, and after
compiling it will jump into the generated code and execute it on the fly.

I developed that because I had a customer that wanted a linux version.

Why are there no other compilers besides gnu/clang?

Because, as you said, everybody assumes that no compensation will be
given to the guy/team that develops them.

Then linux is stuck with a single compiler that is very slow.

The Intel compiler, the best compiler for the x86 environment, runs under
linux. But it costs more than 700 dollars, if memory serves.

You get what you pay for.

jacobnavia
9/4/2016 10:20:04 PM
On 01/09/2016 23:01, Aharon Robbins wrote:
> Hi.
>
> Can I get recommendations for other (free) C compilers besides GCC and CLANG?

Intel and Oracle both have free (but not open-source) compilers.

N.

Nemo
9/4/2016 10:32:14 PM
On 2016-09-02, Aharon Robbins <arnold@skeeve.com> wrote:
> Can I get recommendations for other (free) C compilers besides GCC and CLANG?
> I've been using the revived PCC for gawk development since it's faster
> than GCC, but recently it's developed a bug where it won't compile the
> current (valid) code.

Hi Aharon,

One idea is to treat C++ as a C dialect; then the GCC and Clang
C++ compilers are an option.

You can't use C features that aren't in C++ and vice versa.
Except those that you can hide behind a macro that is conditionally
defined for C and C++. I do this with casts, which is useful:

#ifdef __cplusplus
#define strip_qual(TYPE, EXPR) (const_cast<TYPE>(EXPR))
#define convert(TYPE, EXPR) (static_cast<TYPE>(EXPR))
#define coerce(TYPE, EXPR) (reinterpret_cast<TYPE>(EXPR))
#else
#define strip_qual(TYPE, EXPR) ((TYPE) (EXPR))
#define convert(TYPE, EXPR) ((TYPE) (EXPR))
#define coerce(TYPE, EXPR) ((TYPE) (EXPR))
#endif

The code compiles as C, but if treated as C++, it benefits
from these more constrained, safer casts.
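
For instance (a throwaway snippet just to show the usage; the names are
made up):

  void example(const char *label, void *blob)
  {
      /* strip_qual removes a qualifier; a checked const_cast under C++ */
      char *writable = strip_qual(char *, label);

      /* convert is an ordinary value conversion; static_cast under C++ */
      double d = convert(double, 42);

      /* coerce reinterprets the representation; reinterpret_cast under C++ */
      unsigned char *bytes = coerce(unsigned char *, blob);

      (void) writable; (void) d; (void) bytes;
  }
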
Kaz
9/4/2016 11:48:55 PM
On 02/09/2016 04:01, Aharon Robbins wrote:

> TinyCC is blindingly fast, and can compile gawk, but is broken in that
> it won't diagnose duplicate case statements inside switch. The developers
> don't consider this a problem. So I refuse to use it.
>
> In short, I'm looking for a faster compiler that actually works.

I'm quite impressed with Tiny CC. I'd considered it a toy compiler that
could only compile a small subset of the language until I actually tried it.

And yes, on my Win64 version, it does seem to ignore duplicate 'case'
labels.

But TCC does have a problem with compiling switch statements; although the
code it generates isn't that great anyway, the code for switches is slower still.

I suspect it just compiles switch as an if-else chain, which might
explain why duplicate case labels are ignored if that conversion is done
at an early stage.

(Because a duplicate condition in an if-else chain is fine. For a
jump-table of course, a duplicate label is ambiguous so can't be allowed.)
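
To illustrate what I mean, here's a hand-written sketch (hypothetical
names, not TCC's actual output):

  void f(void), g(void);

  /* If the front end rewrites   switch (c) { case 1: f(); break;
                                               case 1: g(); break; }
     into an if-else chain before it ever checks the labels, it would
     effectively behave like this: */
  void as_if_chain(int c)
  {
      if (c == 1)      { f(); }
      else if (c == 1) { g(); }   /* silently unreachable; nothing forces an error */
  }

With a jump table, both labels compete for the same slot, so the duplicate
has to be dealt with one way or another.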

--
Bartc
BartC
9/5/2016 12:19:12 PM
* Aharon Robbins:

> Can I get recommendations for other (free) C compilers besides GCC and
> CLANG?

Earlier versions of the golang toolchain contained a C compiler
(derived from the Plan 9 toolchain) which was ported to various
platforms supported by Go at the time.

Florian
9/5/2016 6:06:34 PM
On Sunday, September 4, 2016 at 12:02:33 PM UTC-7, Aharon Robbins wrote:
> Can I get recommendations for other (free) C compilers besides GCC and CLANG?
> I've been using the revived PCC for gawk development since it's faster
> than GCC, but recently it's developed a bug where it won't compile the
> current (valid) code.
>
> LCC seems to be 32 bit only and requires very manual configuration.
>
> TinyCC is blindingly fast, and can compile gawk, but is broken in that
> it won't diagnose duplicate case statements inside switch. The developers
> don't consider this a problem. So I refuse to use it.
>
> In short, I'm looking for a faster compiler that actually works.

What is preventing you from doing most of the development using, say, that
same TinyCC and then, when you think you're done with the round of
changes (bugfixes, improvements, new features), recompiling the code
with gcc or clang and rerunning the tests to make sure there are no
issues (warnings, real bugs) missed because of the limitations of the
fast compiler? Given the nature of C, I think it's always beneficial
to try your code with different compilers (and on different platforms)
to uncover portability and other problems.

I'm not recommending my Smaller C compiler because it's more limited
than TinyCC, although it does check for duplicate cases, but there are
a few others you might want to try out (all Windows only, AFAIK):

Pelles C
Digital Mars
Visual C++ Express

Alex
alexfrunews
9/6/2016 12:22:58 AM
On Sunday, September 4, 2016 at 3:56:56 PM UTC-7, jacobnavia wrote:
> Why should they care? If you write duplicate switch cases its not their
> fault, its yours.

It is a constraint violation, and a trivial one, that the compiler
must identify and report instead of silently producing code that is
broken or appears to work by chance.

Alex
alexfrunews
9/6/2016 12:26:20 AM
On 2016-09-05, BartC <bc@freeuk.com> wrote:
> On 02/09/2016 04:01, Aharon Robbins wrote:
>
>> TinyCC is blindingly fast, and can compile gawk, but is broken in that
>> it won't diagnose duplicate case statements inside switch. The developers
>> don't consider this a problem. So I refuse to use it.
>>
>> In short, I'm looking for a faster compiler that actually works.
>
> I'm quite impressed with Tiny CC. I'd considered it a toy compiler that
> could only compile a small subset of the language until I actually tried it.
>
> And yes, on my Win64 version, it does seem to ignore duplicate 'case'
> labels.
>
> But TCC does have problem with compiling switch statements; although the
> code it generates isn't that great anyway, that for switches is slower.
>
> I suspect it just compiles switch as an if-else chain. Which might
> explain why duplicate case labels are ignored if that conversion is done
> at at early stage.
>
> (Because a duplicate condition in an if-else chain is fine. For a
> jump-table of course, a duplicate label is ambiguous so can't be allowed.)

A duplicate condition in an if-else chain is unreachable code, which is
probably a bug.

   if (foo) {
     ...
   } else if (foo) {
     /* unreachable */
   } else if (bar) {
     ...
   }

Duplicate cases are a constraint violation in ISO C, requiring a
diagnostic.

A duplicate label can also be just as "fine". Look, you can build a jump
table, and then just fill it without caring about duplicates:

  case 42:
  ...
  case 42:

No problem: just write to jumptable[42] twice, letting the most
recently established address overrule the previous one.

Duplicate case label code can still be reachable, making the bugs
potentially more interesting:

  case 42: // suppose this is used
    ...
    break;
  case 19:
    ...
    /* fallthrough */
  case 42: // this isn't used, but fallthrough makes it reachable

--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List:  http://www.kylheku.com/diy
ADA MP-1 Mailing List:   http://www.kylheku.com/mp1
Kaz
9/6/2016 2:15:52 AM
On 2016-09-06, alexfrunews@gmail.com <alexfrunews@gmail.com> wrote:
> On Sunday, September 4, 2016 at 12:02:33 PM UTC-7, Aharon Robbins wrote:
>> Can I get recommendations for other (free) C compilers besides GCC and CLANG?
>> I've been using the revived PCC for gawk development since it's faster
>> than GCC, but recently it's developed a bug where it won't compile the
>> current (valid) code.
>>
>> LCC seems to be 32 bit only and requires very manual configuration.
>>
>> TinyCC is blindingly fast, and can compile gawk, but is broken in that
>> it won't diagnose duplicate case statements inside switch. The developers
>> don't consider this a problem. So I refuse to use it.
>>
>> In short, I'm looking for a faster compiler that actually works.
>
> What is preventing you from doing most of development using, say, that
> same TinyCC and then, when you think you're done with the round of
> changes (bugfixes, improvements, new features) recompiling the coded
> with gcc or clang and rerunning the tests to make sure there are no
> issues (warnings, real bugs) missed because of the limitations of the
> fast compiler? Given the nature of C, I think, it's always beneficial
> to try your code with different compilers (and on different platforms)
> to uncover portability and other problems.

If TinyCC is lacking basic ISO-C-required diagnostics, it means you
basically cannot make a single git commit without running the code
through another compiler.

Or you have to hoard up a bunch of commits, then validate with
the other compiler, then interactively rebase them to fix all the things
that TinyCC didn't catch, and then publish the commits.

Can you say, PITA?

You need a compiler with great diagnostics for development.  (More
than just the ISO C required ones, but those would be nice, for
starters).
Kaz
9/6/2016 2:18:04 AM
On 06/09/2016 03:15, Kaz Kylheku wrote:
> On 2016-09-05, BartC <bc@freeuk.com> wrote:

>> I suspect [TCC] just compiles switch as an if-else chain. Which might
>> explain why duplicate case labels are ignored if that conversion is done
>> at at early stage.
>>
>> (Because a duplicate condition in an if-else chain is fine. For a
>> jump-table of course, a duplicate label is ambiguous so can't be allowed.)
>
> A duplicate condition in an if-else chain is unreachable code, which is
> probably a bug.
>
>    if (foo) {
>      ...
>    } else if (foo) {
>      /* unreachable */
>    } else if (bar) {
>      ...
>    }

It's not necessarily unreachable. For example:

  if (foo()) { ... }
  else if (foo()) { ... }

foo() could return false the first time and true the second time.

In general, detecting duplicate conditions for if-else can involve
comparing expressions of arbitrary complexity, and it can be hard to
determine whether they will always yield the same value.

(Unreachable code can also be made reachable by inserting a label with a
goto from elsewhere.)

> Duplicate cases are a constraint violation in ISO C, requiring a
> diagnostic.

> A duplicate label can also be just as "fine". Look, you can build a jump
> table, and then just fill it without caring about duplicates:
>
>   case 42:
>   ...
>   case 42:
>
> No problem: just write to jumptable[42] twice, letting it the most
> recently established address overrule the previous one.

Some duplicate cases might be fine:

  case 10: case 10:

as they both end up at the same place (and the 10s can be hidden behind
different macros or enums). But an error is still reported.

> Duplicate case label code can still be reachable, making the bugs
> potentially more interesting:
>
>   case 42: // suppose this is used
>     ...
>     break;
>   case 19:
>     ...
>     /* fallthrough */
>   case 42: // this isn't used, but fallthrough makes it reachable

That's another way for duplicate cases to be meaningful, except that you
can't control which of the case 42s is put into the jump table. Unless
(in this case) you only use the first, writing to jumptable[42] once,
not twice.

But the sensible thing for switch is to report all duplicates (with a
possible concession when labels are at the same location, although that
could mask an error).

--
Bartc
BartC
9/6/2016 8:35:30 PM
On 06/09/16 03:15, Kaz Kylheku wrote:
> A duplicate condition in an if-else chain is unreachable code, which is
> probably a bug.

	I note the "probably", but there are several ways this can happen
legitimately:

  -- Language translation.  Suppose we start with [in some other language]
something like
	IF letter = 'q' THEN ...
	  ELIF letter IN ['a', 'e', 'i', 'o', 'u'] THEN ...
	  ELIF letter IN ['a' : 'z'] THEN ...
	FI
and this gets turned automatically into either an "if-else" chain or a
bunch of cases in C, where 'q' and the vowels are tested twice.

  -- Similar things can happen via macro expansion.  What the coder sees
is often only loosely related to what later stages of compilation see!

  -- Debugging.  Eg, you don't want to deal with 'q' now, there are other
bugs you want to catch first.  So you write
	if (letter == 'q') { error (19); }
	else if ...
in front of the code, and the "old" case 'q' is now unreachable.

	Of course, deciding what constructs deserve warnings is a very
difficult problem in general, perhaps esp in languages like C where the
"coder" is often an automaton rather than a human.

--
Andy Walker,
Nottingham.
Andy
9/7/2016 4:01:56 PM
On Sunday, September 4, 2016 at 2:02:33 PM UTC-5, Aharon Robbins wrote:
> LCC seems to be 32 bit only and requires very manual configuration.

LCC-WIN 32 and apparently 64 by Jacob Navia appeared to have spruced
up LCC with elements drawn from Watcom to make it a bona fide Windows
C compiler; rather than one that piggybacks on another compiler's
library or the system library. It accepts a bit more than C; and Navia
has been trying to push it into a form of C sufficiently enhanced as
to make C++ unnecessary; while still firmly rooted in that which makes
C distinctively C. One way you can tell, however, that it is not in
an active life cycle is that a large examples-and-demo archive that
has been separately posted for it by others has not undergone any
substantial change in almost a decade. LCC and LCC-WIN's main drawback
is that they are NOT doing the kinds of analyses that a good compiler
ought to be doing, but appear to be more akin to byte-code compilers.
You can see it in the run times compared to GCC on Linux on the same
CPU.

rockbrentwood
9/8/2016 12:00:18 AM
On Tue, 6 Sep 2016 21:35:30 +0100, BartC <bc@freeuk.com> wrote:

>On 06/09/2016 03:15, Kaz Kylheku wrote:
>
>> A duplicate condition in an if-else chain is unreachable code, which is
>> probably a bug.
>>
>>    if (foo) {
>>      ...
>>    } else if (foo) {
>>      /* unreachable */
>>    } else if (bar) {
>>      ...
>>    }
>
>It's not necessarily unreachable. For example:
>
>  if (foo()) { ...}
>  elsif if (foo()) { ... }
>
>foo() could return false the first time and true the second time.

In such cases foo doesn't need to be a function - it only needs to be the
[moral equivalent of a C] volatile to exhibit strange behavior.

Writing an IF chain with multiple identical tests probably should be a
style warning in and of itself.


>In general detecting duplication conditions for if-else can involve
>comparing expressions of arbitrary complexity, and it can be harder to
>determine if they will always yield the same value.

Yes.  It's infeasible in most circumstances.

George
George
9/8/2016 3:58:41 PM
On 2016-09-08, George Neuner <gneuner2@comcast.net> wrote:
> On Tue, 6 Sep 2016 21:35:30 +0100, BartC <bc@freeuk.com> wrote:
>
>>On 06/09/2016 03:15, Kaz Kylheku wrote:
>>
>>> A duplicate condition in an if-else chain is unreachable code, which is
>>> probably a bug.
>>>
>>>    if (foo) {
>>>      ...
>>>    } else if (foo) {
>>>      /* unreachable */
>>>    } else if (bar) {
>>>      ...
>>>    }
>>
>>It's not necessarily unreachable. For example:
>>
>>  if (foo()) { ...}
>>  elsif if (foo()) { ... }
>>
>>foo() could return false the first time and true the second time.
>
> In such cases foo doesn't need to be function - it only needs to be a
> [moral equivalent of a C] volatile to exhibit strange behavior.

Two occurrences of an impure expression aren't the same condition.

If a compiler wants to rearrange the code so that the result of such
an expression is to appear in multiple places (where in the original
source language, the corresponding expression appears just once), it
must introduce a temporary variable to hold the result, and propagate
the variable. Otherwise the multiple evaluations will lead to
unpleasant surprises.

   ;; poor: copies of code fragment (foo) literally proliferated

  (let ((code-fragment '(foo))
        (temp (gensym)))
    `(cond (,code-fragment ...)
           (,code-fragment ...) ;; not same; potentially reachable!
             ...)))

  ;; correct:

  (let ((code-fragment '(foo))
        (temp (gensym)))
    `(let ((,temp ,code-fragment)) ;; eval code fragment to temporary
       (cond (,temp ...) ;; proliferate temporary
             (,temp ...) ;; unreachable!
             ...)))

Anyway, this was originally about translating C switch statements, where
the conditions are pure expressions, being integer constants. "else if
(foo()) ..." is an unlikely translation of a standard C switch case.
Kaz
9/8/2016 10:54:13 PM
On Thu, 8 Sep 2016 22:54:13 +0000 (UTC), Kaz Kylheku <221-501-9011@kylheku.com> wrote:

>On 2016-09-08, George Neuner <gneuner2@comcast.net> wrote:
>> On Tue, 6 Sep 2016 21:35:30 +0100, BartC <bc@freeuk.com> wrote:
>>
>>>On 06/09/2016 03:15, Kaz Kylheku wrote:
>>>
>>>> A duplicate condition in an if-else chain is unreachable code, which is
>>>> probably a bug.
>>>>
>>>>    if (foo) {
>>>>      ...
>>>>    } else if (foo) {
>>>>      /* unreachable */
>>>>    } else if (bar) {
>>>>      ...
>>>>    }
>>>
>>>It's not necessarily unreachable. For example:
>>>
>>>  if (foo()) { ...}
>>>  elsif if (foo()) { ... }
>>>
>>>foo() could return false the first time and true the second time.
>>
>> In such cases foo doesn't need to be function - it only needs to be a
>> [moral equivalent of a C] volatile to exhibit strange behavior.
>
>Two occurrences of an impure expression aren't the same condition.

That certainly is true ... for some definition ... but in the examples
given by Bart and me, the compiler has no basis to distinguish the
occurrences.


>If a compiler wants to rearrange the code so that the result of such
>an expression is to appear in multiple places (where in the original
>source language, the corresponding expression appears just once), it
>must introduce a temporary variable to hold the result, and propagate
>the variable. Otherwise the multiple evaluations will lead to
>unpleasant surprised.
>
>   ;; poor: copies of code fragment (foo) literally proliferated
>
>  (let ((code-fragment '(foo))
>        (temp (gensym)))
>    `(cond (,code-fragment ...)
>           (,code-fragment ...) ;; not same; potentially reachable!
>             ...)))
>
>  ;; correct:
>
>  (let ((code-fragment '(foo))
>        (temp (gensym)))
>    `(let ((,temp ,code-fragment)) ;; eval code fragment to temporary
>       (cond (,temp ...) ;; proliferate temporary
>             (,temp ...) ;; unreachable!
>             ...)))
>

Also true, but irrelevant.  The compiler can't assume semantics not in
evidence.  Its job is to faithfully translate the source as written
... not the source with better hygiene, and not the source as the
compiler thinks the programmer maybe intended.

Leaving aside for the moment what is legal in a C switch, if I tell
the compiler to call foo() multiple times, I expect it to do that.


>Anyway, this was originally about translating C switch statements, where
>the conditions are pure expressions, being integer constants. "else if
>(foo()) ..." is an unlikely translation of a standard C switch case.

But like any good Usenet discussion, it (d)evolved into something more
general.

Not every language with a CASE-like construct has C's limitations on
specifying the alternatives.  Lisp certainly doesn't.  And it is
perfectly reasonable to implement the construct using an IF chain.

In fact, many (non-Lisp) compilers do implement CASE-like constructs
using chained IFs performing a binary search when the set of
alternatives is either sparse enough or large enough to make a
dispatch table unwieldy (or mostly empty).
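
Schematically, something like this (a hand-written sketch of the idea,
not any particular compiler's output):

  /* dispatch for a CASE over the sparse values 3, 17, 500 and 100000:
     a binary search over the sorted labels, emitted as nested IFs */
  void dispatch_sparse(long x)
  {
      if (x <= 17) {
          if (x == 3)           { /* alternative for 3 */ }
          else if (x == 17)     { /* alternative for 17 */ }
      } else {
          if (x == 500)         { /* alternative for 500 */ }
          else if (x == 100000) { /* alternative for 100000 */ }
      }
      /* anything else falls through to the default */
  }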

George
George
9/9/2016 4:51:55 AM
On 2016-09-09, George Neuner <gneuner2@comcast.net> wrote:
> But like any good Usenet discussion, it (d)evolved into something more
> general.
>
> Not every language with a CASE like construct has C's limitations on
> specifying the alternatives.  Lisp certainly doesn't.

Yes, it does, in fact. The labels in Lisp case are literals that are
embedded in the CASE syntax itself, and not evaluated. They are
compared to the input value using EQL equality. Duplicate cases
are a bug, which is not required to be diagnosed.

CLISP:

[1]> (case 'a (a 1) (a 2))  ;; (a 2) unreachable
1
[2]> (compile nil (lambda () (case 'a (a 1) (a 2))))
WARNING: Duplicate CASE label A : (CASE 'A (A 1) (A 2))
#<COMPILED-FUNCTION NIL> ;
1 ;
NIL

> In fact, many (non-Lisp) compilers do implement CASE like constructs
> using chained IF performing a binary search when the set of
> alternatives is either sparse enough or large enough to make using a
> dispatch table unweildy (or mostly empty).

Still, duplicates which are static literals should be diagnosed,
even if computed labels are allowed that can produce duplicates
at run time, and that isn't diagnosed.

   case value of
     computed():
       ...
     computed(): # at compile time, we trust this is different
       ...
     3:
       ...
     3: # obviously clashing, diagnose damn thing!
   endcase
Kaz
9/9/2016 6:05:44 AM
On 06/09/2016 01:26, alexfrunews@gmail.com wrote:
> On Sunday, September 4, 2016 at 3:56:56 PM UTC-7, jacobnavia wrote:
>> Why should they care? If you write duplicate switch cases its not their
>> fault, its yours.
>
> It is a constraint violation, and a trivial one, that the compiler
> must identify and report instead of silently producing code that is
> broken or appears to work by chance.

This is C. There are already plenty of things that it lets through that
are probably errors, and which are likely to crash the program.

For example, take a pointer to an array of ints, P. You'd normally
access an int by dereferencing P then applying the index: (*P)[i].

But if, by mistake, you index first and then dereference: *(P[i]), then this
still compiles, but is likely to go badly wrong.

By contrast, an inadvertent duplicate case label is a 'soft' error; the
program just won't give the expected results. It's not going to read or
write all over memory it shouldn't, or at least not as directly as will
happen if the compiler blatantly allows pointers (to single targets) and
arrays to be interchanged.
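
A small illustration (a throwaway snippet; the commented-out line is the
mistake):

  #include <stdio.h>

  int main(void)
  {
      int a[4] = { 10, 20, 30, 40 };
      int (*P)[4] = &a;            /* pointer to an array of 4 ints */

      printf("%d\n", (*P)[2]);     /* intended: a[2], prints 30 */

      /* *(P[2]) also compiles: P[2] is the (nonexistent) third array
         after a, so dereferencing it reads memory well past the object. */
      /* printf("%d\n", *(P[2])); */

      return 0;
  }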

--
Bartc
BartC
9/9/2016 11:31:17 AM
On 2016-09-09, BartC <bc@freeuk.com> wrote:
> On 06/09/2016 01:26, alexfrunews@gmail.com wrote:
>> On Sunday, September 4, 2016 at 3:56:56 PM UTC-7, jacobnavia wrote:
>>> Why should they care? If you write duplicate switch cases its not their
>>> fault, its yours.
>>
>> It is a constraint violation, and a trivial one, that the compiler
>> must identify and report instead of silently producing code that is
>> broken or appears to work by chance.
>
> This is C. There are already plenty of things that it lets through that
> are probably errors, and which are likely to crash the program.

That's true. The major real-world consequence of not producing
an ISO-C required diagnostic is that the code will fail to compile
on compilers other than the one with which it was developed.

Though you may have some dead code in a switch statement, and that could
be a bug, on the other hand, you test the code.  It's not a bug if there
is no external test case which reproduces an issue.

And that brings us to Aharon's original point about looking for
a compiler that can be used for *development*.  A compiler used for
development should have excellent diagnostics, which helps make the code
cleanly portable to other compilers.

> By contrast, an inadvertent duplicate case label is a 'soft' error; the
> program just won't give the expected results.

Or, as I argued above, there might in fact be no bug: the correct fix
might just be to remove the duplicate label and all the code
under it.

Some ISO C required diagnostics do help uncover real errors, in spite
of the C language having lots of pitfalls and holes.

Imagine if you could assign any scalar type to any other without needing
a cast, and with no diagnostic: double x = "abc", and so on.
Or call functions without any parameter checking.

The type checking in C does a lot for improving the reliability of C
programs, even though it falls far short of repairing all the holes
and pitfalls.
Kaz
9/9/2016 4:06:54 PM
On Fri, 9 Sep 2016 06:05:44 +0000 (UTC), Kaz Kylheku
<221-501-9011@kylheku.com> wrote:

>On 2016-09-09, George Neuner <gneuner2@comcast.net> wrote:
>> But like any good Usenet discussion, it (d)evolved into something more
>> general.
>>
>> Not every language with a CASE like construct has C's limitations on
>> specifying the alternatives.  Lisp certainly doesn't.
>
>Yes, it does, in fact. The labels in Lisp case are literals that are
>embedded in the CASE syntax itself, and not evaluated. They are
>compared to the input value using EQL equality. Duplicate cases
>are a bug, which is not required to be diagnosed.

Lisp does not restrict CASE "labels" to integers like C.  They can be
any of [at least] symbols, characters, integers or booleans.

And unlike C where booleans and characters are subsets of integer, in
Lisp they are discrete types.  Additionally symbols are a reference
type - a pointer to a structure.

So no!  Lisp does not have C's limitations.

George
[Lisp does restrict case labels to literals, so if two of them
are the same, that's a bug. -John]
George
9/9/2016 6:02:16 PM
On 2016-09-09, George Neuner <gneuner2@comcast.net> wrote:
> On Fri, 9 Sep 2016 06:05:44 +0000 (UTC), Kaz Kylheku
><221-501-9011@kylheku.com> wrote:
>
>>On 2016-09-09, George Neuner <gneuner2@comcast.net> wrote:
>>> But like any good Usenet discussion, it (d)evolved into something more
>>> general.
>>>
>>> Not every language with a CASE like construct has C's limitations on
>>> specifying the alternatives.  Lisp certainly doesn't.
>>
>>Yes, it does, in fact. The labels in Lisp case are literals that are
>>embedded in the CASE syntax itself, and not evaluated. They are
>>compared to the input value using EQL equality. Duplicate cases
>>are a bug, which is not required to be diagnosed.
>
> Lisp does not restrict CASE "labels" to integers like C.  They can be
> any of [at least] symbols, characters, integers or booleans.
> And unlike C where booleans and characters are subsets of integer, in
> Lisp they are discrete types.  Additionally symbols are a reference
> type - a pointer to a structure.

At the low level it's fairly similar because EQL compares most of
those types as machine words, with the exception of integers
(two distinct bignum objects denoting the same integer are EQL).

If none of the case labels are bignums (case's usual case), then EQ can
be used, and then boils down to a switch on machine words.

All the same tricks can be used: jump table, tree-search.
Kaz
9/9/2016 7:57:14 PM
BartC schrieb:

> For example, take a pointer to an array of ints, P. You'd normally
> access an int by dereferencing P then applying the index: (*P)[i].
>
> But if, by mistake, you index first then dereference: *(P[i]), then this
> still compiles. But is likely to go wrong badly.

AFAIR (according to K&R) p[i] is equivalent to *(p+i), only a different
syntax for the same semantics. In contrast to C++, the C language doesn't
treat arrays as special types; proper usage is up to the coder.

> By contrast, an inadvertent duplicate case label is a 'soft' error; the
> program just won't give the expected results. It's not going to read or
> write all over memory it shouldn't, or at least not as directly as will
> happen if the compiler blatantly allows pointers (to single targets) and
> arrays to be interchanged.

A C switch statement can be translated in various ways, depending on
the sparseness of the values and other conditions. A good compiler
chooses the most efficient model for every single statement. At least
the linker should bark on duplicate compiler-generated labels.


Some more thoughts:

I'd think that everybody in this thread should specify the C standard
he relies on. There is a broad range from K&R to C99, or whatever is
the current Ansi standard. But I guess none of the contributors ever
spent the money and time to read a standard, where many "undefined"
and "compiler specific" cases are described in detail. It's a bad idea
to rely on the (mis)behaviour of some specific compiler, in a
discussion about the semantics of the C language.


In former times it was common practice to use multiple C compilers, one
for good diagnostics, one for good debugging features, and one for most
efficient executable code. At that time the Watcom compiler was said to
be the most reliable compiler; dunno about today's version (Open Watcom).

The old Microsoft compilers were poor in diagnostics and efficiency;
every official sample program contained more than 50 bugs and flaws,
which the Borland (BC...) compiler found after the excess (mostly wrong)
type casts were removed. But it's not the compiler that is to blame;
it's the coder who prevents the compiler from performing essential checks,
and disables or ignores compiler warnings. It's bad practice to force
the compiler to accept some code, with the poor excuse "otherwise it
doesn't do what I want". Anyone who wants better diagnostics should
consider using C++ instead of C, and make use of the stricter syntax to
let the compiler perform more checks.

Even newer MS compilers tend to follow their own language definition,
not any Ansi standard. Other compilers, like gcc, allow you to specify
the standard to apply, and to enable or disable compiler-specific
extensions of the language. Any statement about compiler-specific
behaviour is meaningless without specifying the compiler and its
option settings.

DoDi
Hans
9/9/2016 10:38:31 PM
In article <16-09-001@comp.compilers>, Aharon Robbins <arnold@skeeve.com> wrote:
>Can I get recommendations for other (free) C compilers besides GCC and CLANG?
>I've been using the revived PCC for gawk development since it's faster
>than GCC, but recently it's developed a bug where it won't compile the
>current (valid) code.
>
>LCC seems to be 32 bit only and requires very manual configuration.
>
>TinyCC is blindingly fast, and can compile gawk, but is broken in that
>it won't diagnose duplicate case statements inside switch. The developers
>don't consider this a problem. So I refuse to use it.
>
>In short, I'm looking for a faster compiler that actually works.

Thanks for all the replies. Some short replies to the relevant
answers:

To Jacob Navia - I'm sorry it's rough selling C compilers. I understand
your complaints. As others pointed out, duplicate cases are a constraint
violation.  Why would I use a compiler that didn't catch missing
semicolons or allowed only two expressions in a for(;;)?  The same thing
applies to case statements.

As Kaz further pointed out, you can't just compile with it and then
recompile with gcc before committing. Big PITA, and I actually ended
up one time making a bad commit because tcc didn't catch duplicate cases.

Nemo - Intel and Oracle compilers - I'm sure they're good, but I'm
looking for FAST compilation. I often do 'make distclean' and then
configure and make many times over in one session. Fast compiles
make a big difference. I'd use GCC or clang for anything I wanted to
install, but while developing, speed is wonderful, although correctness
beats it.

Kaz suggested C++ - that's a lot of work for my code base. It might
get me more type checking but it won't get me compilation speed. C++
compilers can be slower than C compilers.

Florian - golang compilers. Worth checking out.

Privately, I was pointed at nwcc, but it fails to compile gawk.

Someone else privately suggested that I just fix tcc myself.
It turns out that I was able to add duplicate case checking in about
an hour's work and just under 100 lines of code. But I have the
background and experience for that; I suspect someone else would
have had a harder time.
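
The basic idea is simple enough to sketch (this is just the general
approach, not the actual patch):

  #include <stdio.h>
  #include <stdlib.h>

  /* Remember every case value seen in the current switch, then sort
     and report adjacent duplicates when the switch ends. */
  static int cmp_long(const void *a, const void *b)
  {
      long x = *(const long *) a, y = *(const long *) b;
      return (x > y) - (x < y);
  }

  void check_case_labels(long *vals, size_t n)
  {
      size_t i;

      qsort(vals, n, sizeof vals[0], cmp_long);
      for (i = 1; i < n; i++)
          if (vals[i] == vals[i - 1])
              fprintf(stderr, "error: duplicate case value %ld\n", vals[i]);
  }

For the handful of labels a typical switch has, a linear scan of the
labels seen so far would work just as well as sorting.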

I will see about trying to get the upstream to accept it. And I hope
that PCC will eventually get fixed too.

Thanks all,

Arnold
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com
arnold
9/12/2016 6:42:05 PM
On 12/09/2016 19:42, Aharon Robbins wrote:

> Privately by someone else, I was suggested to just fix tcc myself.
> It turns out that I was able to add duplicate case checking in about
> an hour's work and just under 100 lines of code. But I have the
> background and experience for that; I suspect someone else would
> have had a harder time.

Do you use TCC to compile itself, or something like gcc?

I'm just interested in whether it partly relies on gcc to get its speed!

--
Bartc

BartC
9/12/2016 9:37:12 PM
In article <16-09-028@comp.compilers>, BartC  <bc@freeuk.com> wrote:
>On 12/09/2016 19:42, Aharon Robbins wrote:
>> Privately by someone else, I was suggested to just fix tcc myself.
>> It turns out that I was able to add duplicate case checking in about
>> an hour's work and just under 100 lines of code. But I have the
>> background and experience for that; I suspect someone else would
>> have had a harder time.
>
>Do you use TCC to compile itself, or something like gcc?
>
>I'm just interested in whether it partly relies on gcc to get its speed!

The default configure + make compiles tcc with gcc.
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com

arnold
9/13/2016 5:35:20 PM
>BartC  <bc@freeuk.com> wrote:
>>Do you use TCC to compile itself, or something like gcc?
>>
>>I'm just interested in whether it partly relies on gcc to get its speed!

Aharon Robbins <arnold@skeeve.com> wrote:
>The default configure + make compiles tcc with gcc.

I should point out that a large part of tcc's speed comes from the
fact that it generates object code directly, instead of going
through the assembler.  It might even have its own internal 'ld',
I'm not sure.
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com
[Oh, like WATFOR and the original Dartmouth BASIC. -John]
arnold
9/14/2016 6:14:22 PM
Hi,

On Sunday, September 4, 2016 at 2:02:33 PM UTC-5, Aharon Robbins wrote:
>
> Can I get recommendations for other (free) C compilers besides
> GCC and CLANG?
>
> In short, I'm looking for a faster compiler that actually works.

I'm not sure what miracle advice you expected to get from here.

The first thing you should do is check your build system. Maybe there
isn't much leeway there, but you should definitely try something better
(simpler?) than Autotools. Try using Dash instead of Bash. Also, at
the very least, you should make your C modules as small as possible
and only use expensive optimizations on crucial files (instead of
blindly using "-O2" for everything).

If modern GCC is a bottleneck, use an older version from an old
distro (in a VM perhaps). If that isn't good advice, then try
to do rebuilds atop RAM disk (md or whatever *nix calls it).

What distro/GCC are you using anyways? Surely you've heard of
"-Og". (If you have a semi-recent Intel machine, try Clear Linux.)

Have you ever tried ccache or distcc? I haven't, but I assume that
something like that would help. I'm sure this has been attempted
before.

Honestly, I did a very naive build of Gawk myself (under old Lucid
Puppy Linux, admittedly not hosted in RAM but on hard disk). It
didn't seem that slow, but I was using "make -j4". So the obvious
question is how many cores does your cpu have? Are you using an
old laptop? Seriously, I hate advice like this, but you may honestly
get more gains from getting a newer machine with more cores. Heck,
some server machines these days have 20+ (although I admit that most
home versions don't have nearly that many).

Can't you use GNU's build server or something? Maybe I'm wrong, I
don't know the details, but I would assume they have something you
can ssh into for fast builds. Or maybe that's only for release
testing and not development?

I dunno, this kind of problem is very complex and naive at the same
time.
rugxulo
9/26/2016 10:29:09 PM
Hi. Thanks for your note.

In article <16-09-033@comp.compilers>,  <rugxulo@gmail.com> wrote:
>Hi,
>
>On Sunday, September 4, 2016 at 2:02:33 PM UTC-5, Aharon Robbins wrote:
>>
>> Can I get recommendations for other (free) C compilers besides
>> GCC and CLANG?
>>
>> In short, I'm looking for a faster compiler that actually works.
>
>I'm not sure what miracle advice you expected to get from here.

I develop locally. My system is Ubuntu 16.04 with gcc 5.4 on a Skylake
Core i5. /proc/cpuinfo claims 4 CPUs, so there are likely two hyperthreaded
physical cores and there's plenty of RAM.  The machine is quite fast,
even though it's using a conventional disk.

I often have to go through this cycle:

	make distclean
	./bootstrap.sh	# set modification times on some files
	./configure && make && make check

For example, when merging into the (too-many) different branches
and encountering a conflict.

The time difference between using tcc on the one hand and GCC + make
-j on the other is quite noticeable; tcc + make -j is even faster.
Doing many builds an hour can happen, and a faster compiler saves me time.
tcc also makes a VERY noticeable difference in the time it takes
to run configure.

When I posted, tcc wasn't an option since it didn't check for duplicate
case labels, and PCC had stopped working for me.  So I was seeking an
additional, fast compiler.

Dorking with the Makefile to only use -O on certain files isn't really
an option; autotools sets things up to compile everything the same way
and for the shipped tarball that is the right option.

Similarly, I don't really wish to switch off the autotools; I've too
much time and experience invested in them. They work, crufty as they are,
and I have not had to invest any real time in keeping things up to date
with respect to them.

Since I originally posted, someone suggested that I just fix tcc on
my own. I was able to do this with less than 2 hours work so now
I'm back to being fat, dumb and happy. :-)

Thanks,

Arnold
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com
arnold
9/27/2016 5:40:55 AM
Hi,

On Tuesday, September 27, 2016 at 1:36:38 PM UTC-5, Aharon Robbins wrote:
>
> In article <16-09-033@comp.compilers>,  <rugxulo@hates.spam> wrote:
> >
> >On Sunday, September 4, 2016 at 2:02:33 PM UTC-5, Aharon Robbins wrote:
> >>
> >> Can I get recommendations for other (free) C compilers besides
> >> GCC and CLANG?
> >>
> >> In short, I'm looking for a faster compiler that actually works.

... that runs on (and targets?) Linux/AMD64 with ELF, C99/POSIX,
widechar/Unicode, etc.

There are a lot of other compilers (and OSes), but they can't all
support literally everything.

> >I'm not sure what miracle advice you expected to get from here.
>
> I develop locally. My system is Ubuntu 16.04 with gcc 5.4 on a
> Skylake Core i5. /proc/cpuinfo claims 4 CPUs, so there are likely
> two hyperthreaded physical cores and there's plenty of RAM.
> The machine is quite fast, even though it's using a conventional
> disk.

Four cpus is still good, better than average. (But even my old
Westmere has that. So where's our "thousands of cores", Intel??
They sure do overhype SMP on every street corner! Humbug!)

And Skylake is very new. (I can't even pretend to remember all
the supported instruction sets.)

Honestly, I'd have been almost surprised if someone like you didn't
already use a "modern" (ugh) setup like this.

I'm not really familiar with tmpfs, but you can presumably use it
to speed some things up.

There was a small regression in GCC 5.x, so I think 6.x is indeed
slightly faster, but it's probably not quite the speedup you wanted.

> I often have to go through this cycle:
>
> 	make distclean
> 	./bootstrap.sh	# set modification times on some files
> 	./configure && make && make check
>
> For example, when merging into the (too-many) different branches
> and encountering a conflict.

The bottleneck could really be something simple like the makefile
itself, but I realize that rewriting or replacing that isn't usually
easy.

> The time difference between using tcc on the one hand and GCC + make
> -j on the other is quite noticeable; tcc + make -j is even faster.
> Doing many builds an hour can happen, and a faster compiler saves me time.
> tcc also makes a VERY noticeable difference in the time it takes
> to run configure.

I hate all that configure stuff (as I'm not really *nix savvy), but
can't you make a site-wide cache of certain things (that you know
will absolutely never change)?

http://www.gnu.org/software/automake/manual/html_node/config_002esite.html

> When I posted, tcc wasn't an option since it didn't check for duplicate
> case labels, and PCC had stopped working for me.  So I was seeking an
> additional, fast compiler.

Regarding PCC, you act like it just started to break. How exactly?
This is why you never upgrade anything unless forced!  :-)
So just stick with the (stable) older version there (or is that
Debian or Ubuntu's fault?).

TCC hasn't had a release in recent years either. It's fast because
it does everything in one pass, and it does include its own
assembler and linker. It doesn't optimize well, though.

I guess you know that GCC has slowed down tremendously since the
old days (2.7.x vs. 2.95.x vs. 3.x). Of course, using Skylake,
you shouldn't have any excuse (except binaries compiled for weaker
targets). It may not be reasonable, but you really should give
ClearLinux a shot.

> Dorking with the Makefile to only use -O on certain files isn't really
> an option; autotools sets things up to compile everything the same way
> and for the shipped tarball that is the right option.

If you're going to let AutoTools (or anything else) dictate everything
for you, then you're going to have to live with slow compiles.

> Similarly, I don't really wish to switch off the autotools; I've too
> much time and experience invested in them. They work, crufty as they are,
> and I have not had to invest any real time in keeping things up to date
> with respect to them.

Great, but if that's your bottleneck, then you'll have to fix it.

> Since I originally posted, someone suggested that I just fix tcc on
> my own. I was able to do this with less than 2 hours work so now
> I'm back to being fat, dumb and happy. :-)

There's always room for improvement (overall).
rugxulo
9/27/2016 11:36:29 PM
On Tue, 27 Sep 2016 05:40:55 -0000 (UTC), arnold@skeeve.com (Aharon
Robbins) wrote:

>I develop locally. My system is Ubuntu 16.04 with gcc 5.4 on a Skylake
>Core i5. /proc/cpuinfo claims 4 CPUs, so there are likely two hyperthreaded
>physical cores and there's plenty of RAM.

The i5 has 4 non-HT cores.


>The time difference between using tcc on the one hand and GCC + make
>-j on the other is quite noticeable; tcc + make -j is even faster.
>Doing many builds an hour can happen, and a faster compiler saves me time.
>tcc also makes a VERY noticeable difference in the time it takes
>to run configure.

If you really have plenty of RAM, why not create a ramdisk?   Or put
in an SSD for the tmp filesystem?


>Since I originally posted, someone suggested that I just fix tcc on
>my own. I was able to do this with less than 2 hours work so now
>I'm back to being fat, dumb and happy. :-)

As long as you don't expect the same level of optimization.  TCC is
nice for quick development turn-around, but it doesn't produce the
fastest code.

George
George
9/28/2016 5:01:10 AM
On 28/09/2016 00:36, rugxulo@gmail.com wrote:

> TCC hasn't had a release in recent years either. It's fast because
> it does everything in one pass, and it does include it's own
> assembler and linker. It doesn't optimize well, though.

Compilers can be fast anyway. I think we've just got used to very slow
ones such as gcc. I've been using my own /interpreted/ compiler and it
was still double the speed of gcc!

I was reminded of how fast they can be when I recently developed a new
byte-code compiler, and managed up to a million lines per second
compilation speed (one test was 1.5Mlps). And that was on a low-end
5-year-old PC, running on a single core.

That was two passes; a compiler to native code would use an extra pass,
and probably the code-generating is a bit fiddlier. But I would estimate
source to in-memory native-code generation at at least 500K lines per
second on the same machine, if I was to give that compiler the same
treatment.

(And because this is not for C, I can use all of that throughput instead
of wasting it repeatedly compiling the same header files.)

However it is so fast that it would be necessary to consider carefully
what to do with the output: invoking an external assembler or linker
would be like hitting a brick wall. So it would need to generate an
entire executable directly or prepare code to run in-memory.

These fast compilers (the byte-code one I've completed, and the possible
native code one) will only compile an entire project at once. (Because
of dependencies, parallelising would be trickier. But at the minute it's
fast enough: the new byte-code compiler can entirely re-compile my
current native-code compiler from scratch in 0.03 seconds.)

But with C projects then yes, the range of fast tools is limited. It
doesn't help if you also become dependent on things such as 'configure'
or 'make' (I work in Windows where there is less of that).

Maybe it needs people with experience of fast graphics or rendering, who
know how to speed things up, and apply them to C compilers! The C
language does put some obstacles in the way (the same header needs to be
processed for the 50th time in case something comes out different), but
I think there is plenty that can be done.

TCC does a good job, but it's a shame about the code generation. (My own
native code compiler has a naive non-optimising code generator but is
still much better than TCC's code.)

(On one test of the fast compiler, my own code can manage 800Klps.
Converting to C intermediate code then putting it through gcc -O3 gets
it up to 1Mlps. But compiling with TCC gets it down to 350Klps (TCC is
not good with switch statements). Not so bad, but still ...)

--
Bartc
[Back when I was publishing the Journal of C Language Translation, people did some interesting
stuff with C header files, saving a precompiled version for the usual case that subsequent
runs don't change any preprocessor stuff that would affect the code. -John]
BartC
9/28/2016 5:16:54 PM
On Tue, 27 Sep 2016 16:36:29 -0700 (PDT), rugxulo@gmail.com wrote:

>Four cpus is still good, better than average. (But even my old
>Westmere has that.

True.  And newer isn't always better.  Recently Intel has been focused
more on power consumption than on performance.  You can have more
cores, but they run slower.

>So where's our "thousands of cores", Intel??
>They sure do overhype SMP on every street corner! Humbug!)

You can have 72 weak cores in the Xeon Phi 7290.
http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html

Or you can have 24 strong cores in the E7-8890
http://www.intel.com/content/www/us/en/processors/xeon/xeon-processor-e7-family.html


Depends on how much you want to pay.
<grin>


>And Skylake is very new. (I can't even pretend to remember all
>the supported instruction sets.)

Skylake has been generally available for more than a year.  Its
successor, Kaby Lake, is already available in selected [high priced]
systems and is expected to be generally available by the holidays this
year.


>The bottleneck could really be something simple like the makefile
>itself, but I realize that rewriting or replacing that isn't usually
>easy.
>
>   :
>
>> Dorking with the Makefile to only use -O on certain files isn't really
>> an option; autotools sets things up to compile everything the same way
>> and for the shipped tarball that is the right option.
>
>If you're going to let AutoTools (or anything else) dictate everything
>for you, then you're going to have to live with slow compiles.

There likely is opportunity to improve the make file, but slow builds
with GCC generally are the result of design patterns that produce
[too] many small source files, and/or coding patterns that prevent use
of precompiled headers [there are a LOT of caveats there].
https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html

George
George
9/29/2016 1:24:16 AM
In article <16-09-036@comp.compilers>,
George Neuner  <gneuner2@comcast.net> wrote:
>The i5 has 4 non-HT cores.

Works for me. :-)

>If you really have plenty of RAM, why not create a ramdisk?   Or put
>in an SSD for the tmp filesystem?

I spent enough $$ for the laptop and it's fast enough in general. I
don't want to buy an SSD or move my work directory off the hard disk
in case of power failures or whatever.

>>Since I originally posted, someone suggested that I just fix tcc on
>>my own. I was able to do this with less than 2 hours work so now
>>I'm back to being fat, dumb and happy. :-)
>
>As long as you don't expect the same level of optimization.  TCC is
>nice for quick development turn-around, but it doesn't produce the
>fastest code.

I am interested in exactly "quick development turn-around". :-)

Thanks,

Arnold
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com
arnold
9/29/2016 2:55:20 AM
> The whole point of "make" is to avoid recompiling things that
> do not need to be recompiled. By repeatedly doing "make distclean"
> you are circumventing this system.
>
> Find out why make is not compiling everything that it needs to.

I am purposely cleaning up the directory so that builds occur from
scratch as I *** switch between branches *** in the git repo. Especially on
the 'stable' and 'master' branches I have to be sure that anyone
checking out those branches will have things work for them.

I am thinking this discussion has gotten out of hand and I apologize
for it.  All I'm looking for is a fast compiler for development
since I on purpose do a lot of configure + make from scratch.
I don't care about code quality; just that 'make check' passes.

Since posting, I have repaired tcc to the point where I can use it,
so I'm happy. The PCC issue remains to be resolved; gawk's code changed
(the dfa module from grep) and PCC stopped working. I will eventually
go to the trouble of isolating the problem but it's probably a few
hours' work and that's time I don't have right now.

Thanks to everyone for their replies.

Arnold
arnold
9/29/2016 11:30:57 AM
On 28/09/2016 18:16, BartC wrote:
> On 28/09/2016 00:36, rugxulo@gmail.com wrote:
>
>> TCC hasn't had a release in recent years either. It's fast because
>> it does everything in one pass, and it does include it's own
>> assembler and linker. It doesn't optimize well, though.

> (And because this is not for C, I can use all of that throughput instead
> of wasting it repeatedly compiling the same header files.)

> These fast compilers (the byte-code one I've completed, and the possible
> native code one) will only compile an entire project at once.

> The C
> language does put some obstacles in the way (the same header needs to
> processed for the 50th time in case something comes out different), but
> I think there is plenty that can be done.
>
> TCC does a good job, but it's a shame about the code generation. (My own
> native code compiler has a naive non-optimising code generator but is
> still much better than TCC's code.)
>
> (One one test of the fast compiler, my own code can manage 800Klps.
> Converting to C intermediate code then putting it through gcc -O3 gets
> it up to 1Mlps. But compiling with TCC gets it down to 350Klps (TCC is
> not good with switch statements). Not so bad, but still ...)

> [Back when I was publishing the Journal of C Language of Translation,
> people did some interesting
> stuff with C header files, saving a precompiled version for the usual
> case that subsequent
> runs don't change any preprocessor stuff that would affect the code. -John]

Yes, George Neuner's post has a link to how gcc's precompiled headers
can be used. Maybe, if a compiler is already sluggish, they can make a
difference.

But I don't know if they would help to get the fastest speeds: loading
and decoding a precompiled header file doesn't sound like much less work
to me than just parsing a normal header anyway.

Most C compilers seem to support multiple C source files as input. This
is similar to my scheme of compiling multiple sources one after the
other. So if the input to a compiler looks like this:

    gcc -c A.c B.c C.c

where each of A, B and C include the same header file H.h, then once H.h
has been processed for A.c, the results of that (symbol tables and so
on) could be re-used for B and C without needing to process either H.h
or H.pch again.

Of course, a compiler can be so slow, and/or a header can be so large,
that even processing H.h /once/ can be a bottleneck! Then maybe
precompiled headers might be an easier option than making the compiler
itself faster, which is never going to happen in the case of gcc.

(I can tokenise C source code at some 10M lines per second on my PC
(this excludes symbol table lookups; just raw tokenising). But gcc might
process the same source at only 10K lines per second, even excluding
dealing with headers.

That's around a thousand times slower. People like to excuse it by
pointing to its superior code generation as the reason, but I think
that's only a small part of it. For a start, you can turn off
optimisation and it's still pretty slow. I think it's just too big and
complex.)

--
Bartc
[Tokenizing 10M lines/sec is pretty impressive.  In compilers that don't do heavy optimization
the lexer is usually the slowest part since it's the only thing that has to touch each
character of the source code individually. -John]
BartC
9/29/2016 1:03:55 PM
On Thu, 29 Sep 2016 14:03:55 +0100, BartC <bc@freeuk.com> wrote:

>Most C compilers seem to support multiple C source files as input. This
>is similar to my scheme of compiling multiple sources one after the
>other. So if the input to a compiler looks like this:
>
>    gcc -c A.c B.c C.c
>
>where each of A, B and C include the same header file H.h, then once H.h
>has been processed for A.c, the results of that (symbol tables and so
>on) could be re-used for B and C without needing to process either H.h
>or H.pch again.

But it isn't done that way: the compilation of each .c file is treated
independently.  A common header file will be (re)processed for each
code file that includes it.

Also recall that the standard headers form a nesting hierarchy: some
of the more common headers indirectly pull in many others.  Nesting
also is common in wizard generated code, and in C++ template
libraries, etc.


A pre-compiled header contains already digested data that can be
plugged (more or less) directly into the compiler's internal data
structures.  Loading a .pch file very often is much faster than
(re)compiling its constituent source files, and the time savings grow
with the number of files involved.

It works best when you have many code files all of which include a
common *set* of headers.  You extract the common includes from the
code files into a single common header file.  In each code file you
then include that one common header.  Then precompile it.
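
A minimal sketch of that layout with gcc (the file names here are just for
illustration; gcc automatically uses common.h.gch, if it is present and still
valid, wherever it sees the corresponding #include):

    /* common.h -- the single common header; the .c files include
       nothing else directly */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* a.c, b.c, c.c ... each begin with just: */
    #include "common.h"

and to build:

    gcc -x c-header common.h      (writes common.h.gch)
    gcc -c a.c b.c c.c            (each #include "common.h" now loads the .gch)

-Winvalid-pch is worth adding so you notice when gcc silently falls back to
recompiling the plain header.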

For a project with just a handful of files it really doesn't matter
... but with complex projects with hundreds of source files I've
gotten ~80% reductions in build time by using precompiled headers.  I
have seen 45 minute clean builds turn into <10 minutes.

George
0
George
9/30/2016 2:57:49 AM
BartC schrieb:

> (I can tokenise C source code at some 10M lines per second on my PC
> (this excludes symbol table lookups; just raw tokenising). But gcc might
> process the same source at only 10K lines per second, even excluding
> dealing with headers.

IMO it's not the header files themselves that slow down compilation, but the
preprocessor macros, which require looking up and optionally expanding every
token. Almost every language has to allow for external references, which
are read from some shared file. Next comes the sheer number of declarations
in the standard C header files, which require a lot of memory and can cause
swapping, even if only a very small subset of all declarations is
actually used in the source code. So I don't think that it's fair or
meaningful to compare a full-blown compiler with a bare tokenizer.
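
A two-line illustration of the macro point: before the parser sees even this,
the preprocessor has to check every identifier against the macro table, since
any of them could have been #define'd earlier:

    #define N 100
    int a[N];    /* N expands; 'int' and 'a' still had to be looked up
                    (and rejected) in the macro table first */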

DoDi
0
Hans
9/30/2016 3:03:53 AM
In article <16-09-041@comp.compilers>,  <arnold@skeeve.com> wrote:
>I am thinking this discussion has gotten out of hand and I apologize
>for it.  All I'm looking for is a fast compiler for development
>since I on purpose do a lot of configure + make from scratch.
>I don't care about code quality; just that 'make check' passes.

Just to round off, here are some concrete timings:

configure with GCC
real	0m9.203s
user	0m3.752s
sys	0m0.972s


make - serial - GCC
real	0m17.806s
user	0m15.984s
sys	0m0.528s


make -j with GCC
real	0m8.148s
user	0m27.996s
sys	0m0.808s


configure with tcc
real	0m5.915s
user	0m0.832s
sys	0m0.424s


serial make with tcc
real	0m2.580s
user	0m1.540s
sys	0m0.180s

make -j with tcc
real	0m1.107s
user	0m2.544s
sys	0m0.236s

Very real compile time differences for clean builds. GCC from scratch
is about 18 seconds, tcc from scratch is around 7: roughly 2.5 times faster!
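
For anyone wanting to reproduce this kind of comparison, the general shape is
to configure and build once per compiler, timing each step (CC= is the usual
autoconf way to pick the compiler; the exact steps here are only a sketch):

    time ./configure CC=gcc       # "configure with GCC"
    time make                     # serial make
    make clean && time make -j    # parallel make

    make distclean
    time ./configure CC=tcc       # then repeat the same steps with tcc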

Thanks again to everyone.

Arnold
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com
0
arnold
9/30/2016 10:13:55 AM
On 30/09/2016 04:03, Hans-Peter Diettrich wrote:
> BartC schrieb:
>
>> (I can tokenise C source code at some 10M lines per second on my PC ...
>
> IMO it's not the header files themselves that slow down compilation, but the
> preprocessor macros, which require looking up and optionally expanding every
> token. ...

> So I don't think that it's fair or
> meaningful to compare a full-blown compiler with a bare tokenizer.

I seem to remember some comment in this group that tokenising accounts
for a big chunk of a compiler's runtime (50% or something).

While it is true that doing a full compile will take longer than just
raw tokenising, should that factor be of the order of 1000 times longer,
or three magnitudes?

I was investigating whether a reasonable working compiler could be
developed working at between 1 and 2 magnitudes slow-down from the raw
tokenising speed.

Probably one magnitude is a little optimistic, for C source anyway (with
preprocessing and stuff also needed), but two magnitudes is easily on
the cards. I think Tiny C is within that range.

--
Bartc
0
BartC
9/30/2016 11:00:09 AM
On 29/09/2016 14:03, BartC wrote:

> [Tokenizing 10M lines/sec is pretty impressive.  In compilers that don't do
> heavy optimization the lexer is usually the slowest part since it's the only
> thing that has to touch each character of the source code individually. -John]

This is what I was putting to the test. Actually I struggled to recreate
that benchmark, but in the end managed to process actual C source (a
monolithic file containing all CPython sources) at some 9.7Mlps. Figures
do depend on the source style.

On another desktop (also cheap, but with Intel rather than AMD), it
managed 11.5Mlps. With non-C sources, which have less 'busy' syntax, up
to nearly 13Mlps.

Figures exclude file-loading (although that made little difference on
the second machine).

Probably some more throughput can be squeezed by applying some ASM (or
by writing in native C and using more options rather than using
atrocious-looking intermediate code), but at the moment this isn't a
bottleneck.

(This test has some bits missing, for example it parses floating point
numbers but doesn't convert the character sequences to actual values,
but in the input I used, floating point barely figures at all.

As for the 1Mlps I quoted for an actual working compiler (although for
in-memory byte-code), that includes file-loading but expects the OS to
have cached the file as will normally be the case.

This compiler is case-insensitive, which slows down the tokeniser a tiny
bit. That's one tiny advantage of compiling C!)

--
Bartc
0
BartC
9/30/2016 11:28:50 AM
BartC schrieb:
> On 30/09/2016 04:03, Hans-Peter Diettrich wrote:
>> BartC schrieb:
>>> (I can tokenise C source code at some 10M lines per second on my PC ...
>>
>> IMO it's not the header files themselves that slow down compilation, but the
>> preprocessor macros, which require looking up and optionally expanding every
>> token. ...
>
>> So I don't think that it's fair or
>> meaningful to compare a full-blown compiler with a bare tokenizer.
>
> I seem to remember some comment in this group that tokenising accounts
> for a big chunk of a compiler's runtime (50% or something).

This seems to be a reasonable figure for C, including all those nasty
tasks which have to be done before a token can be passed on to the parser.

> While it is true that doing a full compile will take longer than just
> raw tokenising, should that factor be of the order of 1000 times longer,
> or three magnitudes?

Find out yourself. Replace the grammar of your tokenizer with the C
grammar, and test again. I'd be surprised if it didn't reach the speed of
your tokenizer. Then add the preprocessor with file inclusion, macro
definition, recognition and expansion, conditional compilation, and test
again. Then add ANSI and Unicode string literals and symbol tables, and
test again.

DoDi
0
Hans
9/30/2016 6:15:30 PM
On 30/09/2016 19:15, Hans-Peter Diettrich wrote:
> BartC schrieb:
>> I seem to remember some comment in this group that tokenising accounts
>> for a big chunk of a compiler's runtime (50% or something).
>
> This seems to be a reasonable figure for C, including all those nasty
> tasks which have to be done before a token can be passed on to the parser.
>
>> While it is true that doing a full compile will take longer than just
>> raw tokenising, should that factor be of the order of 1000 times longer,
>> or three magnitudes?
>
> Find out yourself.

I already know the answer. It /doesn't/ take that much longer.

> Replace the grammar of your tokenizer with the C
> grammar, and test again. I'd be surprised if it didn't reach the speed of
> your tokenizer. Then add the preprocessor with file inclusion, macro
> definition, recognition and expansion, conditional compilation, and test
> again. Then add ANSI and Unicode string literals and symbol tables, and
> test again.

I don't need to do this work because it's already been done by Tiny C.

Using a monolithic, working test program of 22K lines (25K lines
including standard headers for tcc; 27K using gcc), Tiny C compiled it in
no more than 0.07 seconds (at least 360K lps).

gcc took from 2.2 seconds (unoptimised) to over 8 seconds (optimised):
3-12K lps. (Probably having a large single module strained the global
optimiser, as the difference between -O0 and -O3 is usually smaller.)

Tiny C must be parsing the same C grammar and expanding the same macros
as gcc. So whatever it takes to do that can presumably be done at some
speed faster than 350Klps.
0
BartC
10/1/2016 9:17:10 PM
On 30/09/2016 12:28, BartC wrote:
> On 29/09/2016 14:03, BartC wrote:
>
>> [Tokenizing 10M lines/sec is pretty impressive.  In compilers that don't do
>> heavy optimization the lexer is usually the slowest part since it's the only
>> thing that has to touch each character of the source code individually. -John]
>
> This is what I was putting to the test. Actually I struggled to recreate
> that benchmark, but in the end managed to process actual C source (a
> monolithic file containing all CPython sources) at some 9.7Mlps. Figures
> do depend on the source style.

I've put a version of that test program here as a C file:

https://github.com/bartg/langs/blob/master/clex.c

When I ran this on an original Raspberry Pi that struggled to do
anything fast (gcc ran at 500 lines per second, unoptimised), it managed
1.3M lines per second.


--
Bartc

0
BartC
10/17/2016 9:49:40 PM