Code generator detail

I think there might be a problem with code for indirect calls:

int *pcptr; /* global */        /* Details from much larger code */
stopped=0;

do {
        ((void(*)(void))*pcptr)();
} while (!stopped);

pcptr usually (99.9%) points to a location containing the address of one of
two functions, both identical to this:

void pushm(void) {
        pcptr += 2;
}

Lccwin generates something like this for the indirect call (in NASM syntax):

mov eax,[pcptr]
call [eax]

But gcc generates longer code something like:

mov eax,[pcptr]
mov eax,[eax]
call eax

The problem is, the lccwin code takes about 3 times longer! For 50m
iterations on my slow machine, about 2400ms for lccwin and about 00ms for
gcc.

I've tried putting in the longer code as inline asm() instructions, and on a
real test where lccwin32 had been 60% slower than gcc, with this new call,
it was about the same speed as gcc!

-- 
Bart





Reply bc (2211) 4/25/2008 1:00:44 AM

Bartc wrote:
> I think there might be a problem with code for indirect calls:
> 
> int *pcptr; /* global */        /* Details from much larger code */
> stopped=0;
> 
> do {
>         ((void(*)(void))*pcptr)();
> } while (!stopped);
> 
> pcptr usually (99.9%) points to a location containing the address of one of
> two functions, both identical to this:
> 
> void pushm {
> pcptr += 2;
> }
> 
> Lccwin generates something like this for the indirect call (in NASM syntax):
> 
> mov eax,[pcptr]
> call [eax]
> 
> But gcc generates longer code something like:
> 
> mov eax,[pcptr]
> mov eax,[eax]
> call eax
> 
> The problem is, the lccwin code take about 3 times longer! For 50m
> iterations on my slow machine, about 2400ms for lccwin and about 00ms for
> gcc.
> 
> I've tried putting in the longer code as inline asm() instructions, and on a
> real test where lccwin32 had been 60% slower than gcc, with this new call,
> it was about the same speed as gcc!
> 

The code is not equivalent

In lccwin I read a function pointer then call that value

In the gcc code shown, you load a pointer value, then dereference that
and use THAT value as the call value


Please send me a compilable snippet and I will look if there is a
problem with it

thanks


-- 
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Reply jacob 4/25/2008 5:58:08 AM


"jacob navia" <jacob@nospam.com> wrote in message 
news:furrtk$p40$1@aioe.org...
> Bartc wrote:

>> Lccwin generates something like this for the indirect call (in NASM 
>> syntax):
>>
>> mov eax,[pcptr]
>> call [eax]
>>
>> But gcc generates longer code something like:
>>
>> mov eax,[pcptr]
>> mov eax,[eax]
>> call eax
>>
>> The problem is, the lccwin code take about 3 times longer! For 50m

> The code is not equivalent
>
> In lccwin I read a function pointer then call that value
>
> In the gcc code shown, you load a pointer value, then dereference that
> and use THAT value as the call value

pcptr is effectively a pointer to a pointer to a function. Look at the code 
more carefully: lccwin ends with CALL [EAX]; gcc ends with CALL EAX;

So they both do the same; anyway in the following, commenting out the C code 
and inserting the asm made this code fragment much faster when compiled with 
lccwin:

do {
    //((void(*)(void))*pcptr)();
    _asm ("mov _pcptr,%ebx");
    _asm ("mov (%ebx),%ebx");
    _asm ("call %ebx");
} while (!stopped);

Your code generator produces, with -O,

;   74 ((void(*)(void))*pcptr)();
        .line   74
        movl    _pcptr,%ebx
        call    *(,%ebx)

On an actual test (not just calling empty functions), this change reduced 
the lccwin runtime from 3000ms to 1900ms (gcc was 1700ms). On another test, 
reduced lccwin runtime from 10,000ms to 6700ms (gcc was 7100ms).

BUT: I haven't been able to reproduce these differences in a smaller test 
program. Just unrolling the loop a little lost any advantage of call reg 
over call [reg]. So leave this alone for now; I will just use the asm() as 
needed. Although there is clearly something odd going on in the CPU.

-- 
Bart 


Reply Bartc 4/25/2008 9:48:17 AM

Bartc wrote:
OK

1: I changed the code generator to emit the code as you want
2: I wrote this program:

typedef void (*fnptr)(void);
fnptr pfnptr;
fnptr *ppfnptr;
void n(void)
{
}

int main(void)
{
	int i;

	pfnptr=n;
	ppfnptr = &pfnptr;
	for (i=0; i<100000000; i++)
		(*ppfnptr)();
}

Then I compiled using the new code generator. Elapsed time 1.558 seconds
Then I compiled using the old code generator. Elapsed time 1.502 seconds

The difference is not significant

Code generated by the old code generator:
[0000027] 8b1d00000000     mov       0x0,%ebx   (_ppfnptr)
[0000033] ff141d00000000   call      *0x0(,%ebx,1)

Code generated by the new code generator:
[0000027] 8b1d00000000     mov       0x0,%ebx   (_ppfnptr)
[0000033] 8b1b             mov       (%ebx),%ebx
[0000035] ffd3             call      *%ebx

I would like to improve the code generated, but I just do not understand
why you see such a difference.

Which CPU are you using?
Which OS?


-- 
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Reply jacob 4/25/2008 10:31:47 AM

"jacob navia" <jacob@nospam.com> wrote in message 
news:fusbuo$nbq$1@aioe.org...
> Bartc wrote:
> OK
>
> 1: I changed the code generator to emit the code as you want
> 2: I wrote this program:

Like I said I couldn't reproduce the difference outside the larger program.

> I would like to improve the code generated, but I just do not understand
> why you see such a difference.
>
> Which CPU are you using?
> Which OS?

The OS is WinXP. The CPU with the big timing difference is Pentium M 1.1GHz 
(a laptop).

But when I tried it on a Pentium 4 2.93GHz machine, the differences were 
minimal, although the new CALL EBX form seemed to be 5% faster than the old 
CALL (EBX) form. And on this machine, on my second test, lccwin was faster 
than gcc anyway!

So I wouldn't worry about it too much; I will discover other things I'm sure 
pretty soon. That code on the Pentium M must have been a strange combination 
of different factors.

Maybe best to keep the new code though, if only because that's what gcc 
seems to use.

-- 
Thanks,

Bart



Reply Bartc 4/25/2008 11:32:17 AM
