I think there might be a problem with code for indirect calls:
int *pcptr; /* global */ /* Details from much larger code */
stopped=0;
do {
((void(*)(void))*pcptr)();
} while (!stopped);
pcptr usually (99.9%) points to a location containing the address of one of
two functions, both identical to this:
void pushm(void) {
    pcptr += 2;
}
Lccwin generates something like this for the indirect call (in NASM syntax):
mov eax,[pcptr]
call [eax]
But gcc generates longer code something like:
mov eax,[pcptr]
mov eax,[eax]
call eax
The problem is, the lccwin code takes about 3 times longer! For 50m
iterations on my slow machine, about 2400ms for lccwin and about 00ms for
gcc.
I've tried putting in the longer code as inline asm() instructions, and on a
real test where lccwin32 had been 60% slower than gcc, with this new call,
it was about the same speed as gcc!
--
Bart
bc (2211)
4/25/2008 1:00:44 AM
Bartc wrote:
> I think there might be a problem with code for indirect calls:
>
> int *pcptr; /* global */ /* Details from much larger code */
> stopped=0;
>
> do {
> ((void(*)(void))*pcptr)();
> } while (!stopped);
>
> pcptr usually (99.9%) points to a location containing the address of one of
> two functions, both identical to this:
>
> void pushm {
> pcptr += 2;
> }
>
> Lccwin generates something like this for the indirect call (in NASM syntax):
>
> mov eax,[pcptr]
> call [eax]
>
> But gcc generates longer code something like:
>
> mov eax,[pcptr]
> mov eax,[eax]
> call eax
>
> The problem is, the lccwin code take about 3 times longer! For 50m
> iterations on my slow machine, about 2400ms for lccwin and about 00ms for
> gcc.
>
> I've tried putting in the longer code as inline asm() instructions, and on a
> real test where lccwin32 had been 60% slower than gcc, with this new call,
> it was about the same speed as gcc!
>
The code is not equivalent.
In lccwin I read a function pointer, then call that value.
In the gcc code shown, you load a pointer value, then dereference that
and use THAT value as the call value.
Please send me a compilable snippet and I will check whether there is a
problem with it.
Thanks
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
jacob
4/25/2008 5:58:08 AM
"jacob navia" <jacob@nospam.com> wrote in message
news:furrtk$p40$1@aioe.org...
> Bartc wrote:
>> Lccwin generates something like this for the indirect call (in NASM
>> syntax):
>>
>> mov eax,[pcptr]
>> call [eax]
>>
>> But gcc generates longer code something like:
>>
>> mov eax,[pcptr]
>> mov eax,[eax]
>> call eax
>>
>> The problem is, the lccwin code take about 3 times longer! For 50m
> The code is not equivalent
>
> In lccwin I read a function pointer then call that value
>
> In the gcc code shown, you load a pointer value, then dereference that
> and use THAT value as the call value
pcptr is effectively a pointer to a pointer to a function. Look at the code
more carefully: lccwin ends with CALL [EAX]; gcc ends with CALL EAX.
So they both do the same thing. Anyway, in the following, commenting out the
C code and inserting the asm made this code fragment much faster when
compiled with lccwin:
do {
    //((void(*)(void))*pcptr)();
    _asm ("mov _pcptr,%ebx");
    _asm ("mov (%ebx),%ebx");
    _asm ("call %ebx");
} while (!stopped);
Your code generator produces, with -O,
; 74 ((void(*)(void))*pcptr)();
.line 74
movl _pcptr,%ebx
call *(,%ebx)
On an actual test (not just calling empty functions), this change reduced
the lccwin runtime from 3000ms to 1900ms (gcc was 1700ms). On another test,
reduced lccwin runtime from 10,000ms to 6700ms (gcc was 7100ms).
BUT: I haven't been able to reproduce these differences in a smaller test
program. Just unrolling the loop a little lost any advantage of call reg
over call [reg]. So leave this alone for now; I will just use the asm() as
needed, although there is clearly something odd going on in the CPU.
--
Bart
Bartc
4/25/2008 9:48:17 AM
Bartc wrote:
OK
1: I changed the code generator to emit the code as you want
2: I wrote this program:
typedef void (*fnptr)(void);
fnptr pfnptr;
fnptr *ppfnptr;

void n(void)
{
}

int main(void)
{
    int i;

    pfnptr = n;
    ppfnptr = &pfnptr;
    for (i = 0; i < 100000000; i++)
        (*ppfnptr)();
}
Then I compiled using the new code generator. Elapsed time 1.558 seconds
Then I compiled using the old code generator. Elapsed time 1.502 seconds
The difference is not significant.
Code generated by the old code generator:
[0000027] 8b1d00000000 mov 0x0,%ebx (_ppfnptr)
[0000033] ff141d00000000 call *0x0(,%ebx,1)
Code generated by the new code generator:
[0000027] 8b1d00000000 mov 0x0,%ebx (_ppfnptr)
[0000033] 8b1b mov (%ebx),%ebx
[0000035] ffd3 call *%ebx
I would like to improve the code generated, but I just do not understand
why you see such a difference.
Which CPU are you using?
Which OS?
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
jacob
4/25/2008 10:31:47 AM
"jacob navia" <jacob@nospam.com> wrote in message
news:fusbuo$nbq$1@aioe.org...
> Bartc wrote:
> OK
>
> 1: I changed the code generator to emit the code as you want
> 2: I wrote this program:
Like I said I couldn't reproduce the difference outside the larger program.
> I would like to improve the code generated, but I just do not understand
> why you see such a difference.
>
> Which CPU are you using?
> Which OS?
The OS is WinXP. The CPU with the big timing difference is Pentium M 1.1GHz
(a laptop).
But when I tried it on a Pentium 4 2.93GHz machine, the differences were
minimal, although the new CALL EBX form seemed to be 5% faster than the old
CALL (EBX) form. And on this machine, on my second test, lccwin was faster
than gcc anyway!
So I wouldn't worry about it too much; I will discover other things I'm sure
pretty soon. That code on the Pentium M must have been a strange combination
of different factors.
Maybe best to keep the new code though, if only because that's what gcc
seems to use.
--
Thanks,
Bart
Bartc
4/25/2008 11:32:17 AM