Hi all,
While doing some asm coding recently I noticed that
when I called _malloc, my ECX register was getting trashed.
I did some digging and was surprised to learn that this is
normal, that callers to C library routines are expected to
save EAX, ECX, and EDX.
Seems to me that if one is doing a lot of such calls,
it is best to not use ECX and EDX for local variables at all,
lest one has to push/pop over and over which is more expensive
than storing data on the stack.
My question is, what is the historical basis for this
practice? It seems ironic that given a CPU architecture like
x86 that has so few registers, a practice like this
is in place that discourages use of registers.
Thanks.
Posted by questioner_x3, 12/30/2009 4:00:00 PM
Qyz <questioner_x@munged.microcosmotalk.com> wrote in part:
> While doing some asm coding recently I noticed that when I called
> _malloc, my ECX register was getting trashed. I did some digging
> and was surprised to learn that this is normal, that callers to
> C library routines are expected to save EAX, ECX, and EDX.
You ask a good, old question for which there is no solidly
settled answer. Caller- vs callee-save is contentious with
many side-effects, the biggest being function return flexibility.
As an ASMer, I chafe at restrictions.
The caller knows what regs need saving (maybe none),
the callee knows which regs it wants to use (at least one).
So I tend towards caller-saves, and call "big" fns.
I suspect EAX is caller-saved in C because the return value has to be
returned there. EDX for some "extended" (64-bit) return. ECX may then
be for some "succeeded" length of read/write/malloc/... .
> Seems to me that if one is doing a lot of such calls, it is best
> to not use ECX and EDX for local variables at all, lest one has
> to push/pop over and over which is more expensive than storing
> data on the stack.
Not much difference -- use SUB ESP,4 / MOV [ESP],EAX if you
_really_ think PUSH EAX is too slow. Better for multiple regs.
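To make the SUB/MOV idea concrete, here is a rough sketch of saving ECX and EDX around a call with a single stack adjustment. It assumes NASM syntax, 32-bit cdecl, and the underscore-prefixed _malloc from the original post; the label and the size of 64 bytes are made up for illustration.

```asm
; Sketch only: 32-bit cdecl, NASM syntax. The label alloc_keep_regs and the
; allocation size are hypothetical; _malloc is the C library routine.
extern  _malloc

section .text
alloc_keep_regs:
        sub     esp, 12             ; one adjustment: two save slots + one argument
        mov     [esp+8], ecx        ; save caller-saved ECX
        mov     [esp+4], edx        ; save caller-saved EDX
        mov     dword [esp], 64     ; argument: size in bytes
        call    _malloc             ; pointer comes back in EAX; ECX/EDX may be trashed
        mov     edx, [esp+4]        ; restore EDX
        mov     ecx, [esp+8]        ; restore ECX
        add     esp, 12             ; drop the argument and both save slots
        ret
```

With several registers to save, this replaces a string of PUSH/POP pairs and the separate ADD ESP after the call with one SUB and one ADD.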
-- Robert
Posted by Robert, 12/30/2009 8:04:26 PM
"Qyz" <questioner_x@MUNGED.microcosmotalk.com> wrote in message
news:4b3b7900$0$5107$9a6e19ea@unlimited.newshosting.com...
>
> Hi all,
>
> While doing some asm coding recently I noticed that
> when I called _malloc, my ECX register was getting trashed.
> I did some digging and was surprised to learn that this is
> normal, that callers to C library routines are expected to
> save EAX, ECX, and EDX.
>
> Seems to me that if one is doing a lot of such calls,
> it is best to not use ECX and EDX for local variables at all,
> lest one has to push/pop over and over which is more expensive
> than storing data on the stack.
>
> My question is, what is the historical basis for this
> practice? It seems ironic that given a CPU architecture like
> x86 that has so few registers, a practice like this
> is in place that discourages use of registers.
>
typically, one does not use these registers for variables, rather, they are
used for holding temporary variables used during calculations.
as for historical basis:
it can be noted that a good number of historical instructions are hard-coded
to use only certain registers, and EAX, ECX, and EDX were used by a lot of
commonly used instructions. hence, it was almost inevitable that people
would have to be shuffling these ones around a lot.
OTOH: BX, BP, SI, and DI were traditionally the only registers which could
be used for memory addressing, and hence were "valuable" in
this regard. BP especially so, since it was typically used closely with SP
for purposes of accessing the stack (since SP could not be used for memory
addressing apart from push and pop...).
so:
AX: trashed by lots of instructions;
CX: not used by so many instructions, but a few common ones (such as
'loop');
DX: also trashed by lots of instructions;
BX: usable as a memory base, not much used otherwise;
SP: used for stack, only real valid use;
BP: traditionally needed to access stack variables;
SI: usable as memory base or index, used by a few instructions (movsb,
movsw, ...);
DI: usable as memory base or index, used by a few instructions.
given only BX, SI, and DI were usable as bases, and only SI and DI were
usable as indices, this led to each being "valuable" in this regard.
BX was then typically used as a base for, for example, arrays and structs,
and SI or DI could be used as an array index.
meanwhile, AX, CX, and DX were really needed as scratch pad registers, and
it was almost impossible to do much without trashing them anyways (making
them not particularly useful for holding variables or similar).
mov ax, [bx+si]
....
so, it is not hard to see where most of these usage conventions come from.
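as a rough illustration of the 16-bit addressing restrictions described above (NASM syntax assumed; the displacement of 2 is arbitrary), these are the kinds of operands an 8086 could and could not encode:

```asm
; Sketch only: 16-bit effective addresses, NASM syntax.
bits 16
        mov     ax, [bx+si]     ; base + index
        mov     ax, [bx+di]
        mov     ax, [bp+si]     ; BP-based forms default to the SS segment
        mov     ax, [bp+di]
        mov     ax, [si]        ; SI or DI on their own
        mov     ax, [di]
        mov     ax, [bx]        ; BX on its own
        mov     ax, [bp+2]      ; BP on its own always carries a displacement
        ; The following are not encodable with 16-bit addressing, so they are
        ; left commented out:
        ; mov   ax, [cx]        ; CX is not an address register
        ; mov   ax, [ax+dx]     ; neither are AX and DX
        ; mov   ax, [si+di]     ; two index registers cannot be combined
        ; mov   ax, [sp]        ; SP cannot be used for addressing
```

so a value kept in AX, CX, or DX could never serve directly as a pointer, which fits with those three ending up as the scratch registers.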
with the 386 and 32-bit code, many of these restrictions were lifted:
one could use pretty much any register they wanted as an index or base
(apart from ESP and EBP, which are still special), and the instruction set
was generally far more orthogonal by this point (most instructions would
accept any register one wanted to give them, and many of the fixed-register
legacy instructions fell into disuse, ...).
this was due in large part to a notable redesign of the ModRM byte, and the
addition of a SIB byte, to facilitate a notably different approach to
accessing memory (AKA: "generic" memory addressing, vs fixed-form register
magic...).
however, the 32-bit x86 cdecl convention was a fairly straightforward
adaptation of the 16-bit one (unlike, say, 32 -> 64 bit, where the various
calling convention designers saw fit to damn near completely redesign them,
leading to the current mess...).
or such...
> Thanks.
Posted by BGB, 12/30/2009 9:36:28 PM
"Robert Redelmeier" <redelm@ev1.net.invalid> wrote in message
news:4b3bb24a$0$4848$9a6e19ea@unlimited.newshosting.com...
>
> Qyz <questioner_x@munged.microcosmotalk.com> wrote in part:
>> While doing some asm coding recently I noticed that when I called
>> _malloc, my ECX register was getting trashed. I did some digging
>> and was surprised to learn that this is normal, that callers to
>> C library routines are expected to save EAX, ECX, and EDX.
>
>
> You ask a good, old question for which there is no solidly
> settled answer. Caller- vs callee-save is contentious with
> many side-effects, the biggest being function return flexibility.
> As an ASMer, I chafe at restrictions.
>
> The caller knows what regs need saving (maybe none),
> the callee knows which regs it wants to use (at least one).
> So I tend towards caller-saves, and call "big" fns.
>
> I suspect EAX is caller-saved in C because the return value has to be
> returned there. EDX for some "extended" (64-bit) return. ECX may then
> be for some "succeeded" length of read/write/malloc/... .
>
my answer had been based partly on how things were in 8086/80286 days, where
I suspect the calling convention was a fairly direct outgrowth of the CPU's
behavior and memory-addressing abilities...
after all, there was no [eax+ecx*4].
you had [bx+si] and you had to like it...
>> Seems to me that if one is doing a lot of such calls, it is best
>> to not use ECX and EDX for local variables at all, lest one has
>> to push/pop over and over which is more expensive than storing
>> data on the stack.
>
> Not much difference -- use SUB ESP,4 / MOV [ESP],EAX if you
> _really_ think PUSH EAX is too slow. Better for multiple regs.
>
or, you can reserve enough space for all your locals, and treat this
reserved stack space as a fixed-size region for locals and as a scratch pad
for passing arguments.
this is, in fact, what many compilers do.
foo:
    push ebp
    mov ebp, esp
    sub esp, 56
    ....
    mov esp, ebp
    pop ebp
    ret
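a slightly fuller sketch of the same frame, showing how the reserved region might be used both for a local and for an outgoing argument. NASM syntax and 32-bit cdecl are assumed; the slot at [ebp-4], the size 128, and the name _foo are arbitrary choices for illustration, not anything a particular compiler is known to emit.

```asm
; Sketch only: fixed-size frame used for locals and as argument scratch.
extern  _malloc

section .text
global  _foo
_foo:
        push    ebp
        mov     ebp, esp
        sub     esp, 56             ; fixed region: locals plus argument scratch
        mov     [ebp-4], ecx        ; park a live value in a local slot
        mov     dword [esp], 128    ; outgoing argument sits at the bottom of the region
        call    _malloc             ; nothing was pushed, so no 'add esp' afterwards
        mov     ecx, [ebp-4]        ; reload the value after the call
        mov     esp, ebp
        pop     ebp
        ret                         ; EAX still holds the pointer from _malloc
```

since ESP never moves between the prologue and the epilogue, every slot keeps the same offset for the whole function.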
something vaguely similar to this is "canonical" in the Win64 calling
convention (although callee-save registers are generally saved before
setting up the base pointer, and restored afterwards).
foo:
    push rsi
    push rdi
    push r12
    ....
    push rbp
    lea rbp, [rsp-16]
    ;the reason for the offset I don't really understand, but oh well, it just
    ;needs to be present,
    ;and a multiple of 8, from what I remember...
    ....
    lea rsp, [rbp+16]
    pop rbp
    ....
    pop r12
    pop rdi
    pop rsi
    ret
in my case, I made a slight adaptation which I have used with 32-bit cdecl as
well, mostly because it allows a similar kind of "restoring variables on
unwind" trick. it is also used by an exception handling mechanism I had
specified but, as of yet, have not made much use of (32-bit code still uses
hacked-over Win32 SEH...).
or such...
>
> -- Robert
>
Posted by BGB, 12/30/2009 9:37:19 PM
Qyz <questioner_x@MUNGED.microcosmotalk.com> wrote:
>
>Seems to me that if one is doing a lot of such calls,
>it is best to not use ECX and EDX for local variables at all,
>lest one has to push/pop over and over which is more expensive
>than storing data on the stack.
Because the stack top is almost always in the cache, pushing and popping
registers only takes 1 cycle. That wasn't true in the past, of course.
>My question is, what is the historical basis for this
>practice? It seems ironic that given a CPU architecture like
>x86 that has so few registers, a practice like this
>is in place that discourages use of registers.
BGB really has the right answer here. EAX/ECX/EDX are commonly used for
intermediate values in expressions that will generally be completed before
a function gets called. EBX/EDI/ESI tend to be used for address pointers,
which tend to have longer lives.
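As a rough sketch of that split (NASM syntax and 32-bit cdecl assumed; the routine name, the count of 3, and the size of 32 are invented for illustration), a routine that makes repeated calls can keep its long-lived values in EBX and ESI, saving them once on entry instead of around every call:

```asm
; Sketch only: long-lived values in callee-saved registers across calls.
; The allocations are deliberately never freed; this only shows register use.
extern  _malloc

section .text
global  _make_three
_make_three:
        push    ebx                 ; callee-saved: preserve once for our caller
        push    esi
        mov     ebx, 3              ; loop counter survives every call
        xor     esi, esi            ; ESI = number of successful allocations
.next:
        push    dword 32            ; argument: size in bytes
        call    _malloc             ; may trash EAX/ECX/EDX, must keep EBX/ESI
        add     esp, 4              ; cdecl: caller removes the argument
        test    eax, eax
        jz      .skip               ; NULL means the allocation failed
        inc     esi
.skip:
        dec     ebx
        jnz     .next
        mov     eax, esi            ; return the success count in EAX
        pop     esi
        pop     ebx
        ret
```

The two PUSH/POP pairs are paid once per invocation, however many calls the loop makes.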
Further, there's another reason. In the 16-bit days, far pointers required
an instruction that loaded both segment and offset, like:
les di, [xxxx]
Loading a segment register was a VERY expensive operation. It was
preferable to let the caller assume that DI survived, and if the callee
needed another pointer, it could pay the price.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Posted by Tim, 12/31/2009 5:49:35 AM
On 30 Dec 2009 21:36:28 GMT, "BGB / cr88192"
<cr88192@MUNGED.microcosmotalk.com> wrote:
<snip>
>with the 386 and 32-bit code, many of these restrictions were lifted:
>one could use pretty much any register they wanted as an index or base
>(apart from ESP and EBP, which are still special), and the instruction set
>was generally far more orthogonal by this point (most instructions would
>accept any register one wanted to give them, and many of the fixed-register
>legacy instructions fell into disuse, ...).
As an aside, note that in Windows code all
segments use the same base address, so EBP can be
used as base or index just like any other
register.
Best regards,
Bob Masta
DAQARTA v5.00
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, FREE Signal Generator
Pitch Track, Pitch-to-MIDI
DaqMusic - FREE MUSIC, Forever!
(Some assembly required)
Science (and fun!) with your sound card!
Posted by N0Spam, 12/31/2009 2:56:29 PM
"Qyz" <questioner_x@MUNGED.microcosmotalk.com> wrote in message
news:4b3b7900$0$5107$9a6e19ea@unlimited.newshosting.com...
>
> My question is, what is the historical basis for
> [EAX, ECX, EDX not preserved for C functions]?
Probably MS, if not, then perhaps the Intel ABI, if not, then ... Well, you
may need to locate the answer to that yourself. It seems these are the
standard registers for most of the x86 calling conventions.
Larry Osterman and Raymond Chen, both of MS I think, write blogs with
much historical programming info. Those may be a good start. Some of their
links are below. It's been a while since I've read them. So,
unfortunately, you'll have to search them to see if your answer is there.
> It seems ironic that given a CPU architecture like
> x86 that has so few registers,
Ah, the RISC propagated myth: "too few registers", that just never dies...
Liberal CS prof's must be teaching this historically baseless crud in
school.
Large register sets are required with RISC instruction sets because the RISC
instructions are weak. By "weak", I mean that they do very little work per
cpu clock. (I'm using "cpu" instead of the technically correct term:
micro-processor, for brevity.) The "theory" behind RISC was that CISC cpu's
wasted time. RISC theorists claimed that there was much unused time hidden
in CISC's slower, more complicated, and more powerful instruction sequences.
So, they theorized that if instructions could be made simpler, shorter,
faster, such a cpu wouldn't waste time. I.e., it could do more work.
Unfortunately, designing cpu's to be RISC had far more negative consequences
than positive. The simple instructions required that RISC cpu's needed to
maintain a larger set of active working data per computation, i.e.,
typically more cpu registers. Since the simple RISC instructions don't do
as much work per clock on average as CISC does, RISC cpu's typically needed
far more instructions to complete the same computation. This requires
higher memory bandwidth and more memory for data storage. More memory and
faster memory, at one point in time, meant you were going to pay out much,
much more money. This higher memory cost of a RISC cpu killed the market
for RISC cpu's in the 1990's. Today, memory costs are much lower, memory sizes
are far greater, and RISC cpu's use less silicon (i.e., cheap). So, a
technically obsolete RISC design, like the ARM - which was virtually dead -
has recovered and is now used heavily for embedded or mobile devices.
__links__
x86 calling conventions (Wikipedia)
http://en.wikipedia.org/wiki/Stdcall
callee registers, safe C functions, etc. (Chris Giese)
http://redir.no-ip.org/mirrors/my.execpc.com/~geezer/osd/libc/index.htm
cdecl - caller must clean up stack (Larry Osterman)
http://blogs.msdn.com/larryosterman/archive/2004/07/20/188906.aspx#189231
frame pointer omission (esp-ebp vs. esp) (Larry Osterman)
http://blogs.msdn.com/larryosterman/archive/2007/03/12/fpo.aspx
history of calling conventions (Raymond Chen)
http://blogs.msdn.com/oldnewthing/archive/2004/01/02/47184.aspx
http://blogs.msdn.com/oldnewthing/archive/2004/01/07/48303.aspx
http://blogs.msdn.com/oldnewthing/archive/2004/01/08/48616.aspx
http://blogs.msdn.com/oldnewthing/archive/2004/01/13/58199.aspx
http://blogs.msdn.com/oldnewthing/archive/2004/01/14/58579.aspx
why does the x86 have so few registers (Raymond Chen)
http://blogs.msdn.com/oldnewthing/archive/2004/01/05/47685.aspx
HTH,
Rod Pemberton
Posted by Rod, 12/31/2009 2:56:55 PM
On 31 Dec 2009 05:49:35 GMT, Tim Roberts
<timr@MUNGED.microcosmotalk.com> wrote:
<snip>
>Further, there's another reason. In the 16-bit days, far pointers required
>an instruction that loaded both segment and offset, like:
> les di, [xxxx]
>
>Loading a segment register was a VERY expensive operation. It was
>preferable to let the caller assume that DI survived, and if the callee
>needed another pointer, it could pay the price.
It's not all THAT bad. I looked up some timings:
mov ds,ax on a 386 is only 2 clocks in real mode, 18 in PM.
mov ds,ax on a 486 is only 3 clocks in real mode, 9 in PM.
mov ds,mem on a 386 is only 5 clocks in real mode, 19 in PM.
mov ds,mem on a 486 is only 3 clocks in real mode, 9 in PM.
lds reg,mem on a 386 is only 7 clocks in real mode, 22 in PM.
lds reg,mem on a 486 is only 6 clocks in real mode, 12 in PM.
I am sure the numbers got worse on later processors, but I have no
details available to me on that subject.
--
ArarghMail912 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html
To reply by email, remove the extra stuff from the reply address.
Posted by ArarghMail912NOSPAM, 12/31/2009 2:57:10 PM
"Qyz" wrote:
> Hi all,
>
> While doing some asm coding recently I noticed that
> when I called _malloc, my ECX register was getting trashed.
> I did some digging and was surprised to learn that this is
> normal, that callers to C library routines are expected to
> save EAX, ECX, and EDX.
> Seems to me that if one is doing a lot of such calls,
> it is best to not use ECX and EDX for local variables at all,
> lest one has to push/pop over and over which is more expensive
> than storing data on the stack.
> My question is, what is the historical basis for this
> practice?
perhaps it came from the original naming of registers
and their dedicated use in most complex instructions:
0 Accumulator
1 Count
2 Data
3 Base
4 Stack Pointer
5 Block Pointer
6 Source Index
7 Destination Index
> It seems ironic that given a CPU architecture like
> x86 that has so few registers, a practice like this
> is in place that discourages use of registers.
I don't use any C stuff nor the common calling conventions,
so I have seven registers free for anything I like, though even
then the registers are mostly used the way the instructions were designed.
btw: pushad/popad timing isn't too bad (~ like three push/pop-reg).
__
wolfgang
Posted by wolfgang, 12/31/2009 2:57:22 PM
On 31 Dec, 14:57, ArarghMail912NOS...@NOT.AT.Arargh.com wrote:
....
> >Loading a segment register was a VERY expensive operation.  It was
> >preferable to let the caller assume that DI survived, and if the callee
> >needed another pointer, it could pay the price.
>
> It's not all THAT bad.  I looked up some timings:
>
> mov ds,ax on a 386 is only 2 clocks in real mode, 18 in PM.
> mov ds,ax on a 486 is only 3 clocks in real mode, 9 in PM.
>
> mov ds,mem on a 386 is only 5 clocks in real mode, 19 in PM.
> mov ds,mem on a 486 is only 3 clocks in real mode, 9 in PM.
>
> lds reg,mem on a 386 is only 7 clocks in real mode, 22 in PM.
> lds reg,mem on a 486 is only 6 clocks in real mode, 12 in PM.
>
> I am sure the numbers got worse on later processors, but I have no
> details available to me on that subject.
For later CPUs individual instruction timings depend more on the
instruction mix so are not directly comparable but we should be able
to get an idea of expense. Here are some figures I made a while ago.
(For the record the timings were made by running variable numbers of
segment loads one after another without any other types of
instructions mixed in. The segment load was executed from zero to ten
times - i.e. eleven times altogether - and the cycle counts recorded.
Then the ten differences were used to give a cycle count for each
extra segment load.)
The figures are for only two different CPUs but one was AMD and the
other Intel. The AMD CPU was an Athlon 64 X2 Dual Core (Socket AM2).
The Intel CPU was a Pentium M 90nm.
To load a segment register in protected mode I had timings of
AMD: 8 cycles
Intel: varied 9 to 15 but most were 12 cycles
Opinions? I thought the timings were not too bad. Certainly they don't
accord with the scare stories that seem to circulate. But they are an
expense. The times would probably be reduced further if they could be
overlapped with other instructions - i.e. loaded ahead of time.
The flip side of using segment registers is using them as overrides so
I tried similar timings for adding a segment override to a memory
access instruction. In the tests I ran the results were that using a
segment override cost nothing at all. On these two CPUs it seems that
using segment overrides is free. The only time taken is in the
segment register loads.
James
Posted by James, 12/31/2009 7:38:46 PM
On 31 Dec, 14:56, "Rod Pemberton" <do_not_h...@nohavenot.cmm> wrote:
....
> > It seems ironic that given a CPU architecture like
> > x86 that has so few registers,
>
> Ah, the RISC propagated myth: "too few registers", that just never dies...
> Liberal CS prof's must be teaching this historically baseless crud in
> school.
Good point. It is a myth that RISC is always best. Like most purist
arguments it has its good points if not taken to extreme. For example,
a pure RISC design might cut down the transistor count which may be an
overriding factor but it might mean there's no multiply instruction,
or that you have to write thirty two multiply instructions one after
another to multiply two 32-bit numbers.... Not good.
In many cases balance is more appropriate. Then, something which is
overemphasised results in a lack of balance.
However, having said that I must say that I think eight registers is
just a little bit on the small side....
> Large register sets are required with RISC instruction sets
When I see a machine advertising a large register file I think of the
slow context switch times.
Also, large register files require more bits in instruction encodings.
> because the RISC
> instructions are weak.  By "weak", I mean that they do very little work per
> cpu clock.
This is normally a good thing, though. I think the problem is CISC
instructions which do two or more *different* things in one
instruction. For example, multiply is good but an instruction which
does decrement, shift and branch is bad.
James
Posted by James, 12/31/2009 11:08:17 PM
"Bob Masta" <N0Spam@daqarta.com> wrote in message
news:4b3cbb9c$0$5089$9a6e19ea@unlimited.newshosting.com...
>
> On 30 Dec 2009 21:36:28 GMT, "BGB / cr88192"
> <cr88192@MUNGED.microcosmotalk.com> wrote:
>
> <snip>
>
>>with the 386 and 32-bit code, many of these restrictions were lifted:
>>one could use pretty much any register they wanted as an index or base
>>(apart from ESP and EBP, which are still special), and the instruction set
>>was generally far more orthogonal by this point (most instructions would
>>accept any register one wanted to give them, and many of the
>>fixed-register
>>legacy instructions fell into disuse, ...).
>
> As an aside, note that in Windows code all
> segments use the same base address, so EBP can be
> used as base or index just like any other
> register.
>
granted, however, it is worth noting that ESP and EBP are special in the x86
ModRM and SIB coding.
this results in special edge cases when using them:
ESP can't be used as an index (on x86-64 this applies to RSP but, AFAIK, not
to R12, since the REX prefix tells the two apart);
EBP, if used as a base register, needs an offset (actually, there were 2
redundant encodings here, and on x86-64, one was reused for RIP-relative
addressing); R13 inherits this quirk;
ESP as a base also involves special behavior (AKA: needing a SIB byte), as
does R12.
....
for example:
mov eax, [ebp]
is actually assembled as:
mov eax, [ebp+0]
and:
mov eax, [ebp+esp*4]
is non-encodable...
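for illustration, these are the byte sequences I would expect for a few such operands in 32-bit mode. the bytes are worked out by hand from the ModRM/SIB layout (NASM syntax assumed), so treat them as a sketch; an assembler is free to pick another equivalent encoding.

```asm
; Sketch only: hand-derived encodings of MOV r32, r/m32 (opcode 8B).
bits 32
        mov     eax, [ecx]          ; 8B 01        plain ModRM, no special case
        mov     eax, [ebp]          ; 8B 45 00     EBP base forces a disp8 of 0
        mov     eax, [esp]          ; 8B 04 24     ESP base forces a SIB byte
        mov     eax, [ebp+ecx*4]    ; 8B 44 8D 00  EBP as base, ECX as index
        mov     eax, [esp+ebp*4]    ; 8B 04 AC     ESP as base, EBP as index
        ; mov   eax, [ebp+esp*4]    ; left commented out: ESP cannot be an index
```

the index value that would have meant ESP in the SIB byte is the one used for "no index", which is why the last form has no encoding.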
granted, I noted that my wording was ambiguous, and made it seem as if ESP
and EBP couldn't be used as base registers. that is not what I meant...
after all, the whole original point of BP/EBP was to be a base pointer.
granted, it is only with 32-bit x86 that ESP could be used as a base
pointer.
....
Posted by BGB, 12/31/2009 11:09:02 PM
"Rod Pemberton" <do_not_have@nohavenot.cmm> wrote:
>
>Large register sets are required with RISC instruction sets because the RISC
>instructions are weak. By "weak", I mean that they do very little work per
>cpu clock.
That's not accurate. You can say "they do less work per instruction"
compared to CISC, but not "per CPU clock". The typical RISC set runs one
instruction per clock, and it's difficult to do much better than that.
Most x86 compilers use the CPUs as if they were RISC anyway, by eschewing
the more complicated instructions. Look at the CPU code generated by an
x86 compiler for a moderately interesting function, and compare that to the
CPU code generated by an ARM compiler. The number of instructions is not
significantly different.
>Unfortunately, designing cpu's to be RISC had far more negative consequences
>than positive.
That's just an inaccurate overgeneralization. Academically, this battle
has not been won. It has been won commercially largely because IBM chose
the x86 for the PC, not because of technical superiority.
>So, a technically obsolete RISC design, like the ARM - which was virtually
>dead - has recovered and is now used heavily for embedded or mobile devices.
You cannot argue that RISC is "technically obsolete". If you want to argue
"commercially obsolete", then you could make an excellent argument, but on
a technical basis, the RISC vs CISC debate is far from over.
And I would disagree that the ARM was ever "virtually dead". The ARM
architecture has been highly successful throughout its entire life. It has
had a long and productive career in the embedded markets all along.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Posted by Tim, 1/4/2010 2:33:44 AM
"wolfgang kern" <nowhere@never.at> wrote:
>
>"Qyz" wrote:
>
>> My question is, what is the historical basis for this
>> practice?
>
>perhaps it came from the original naming of registers
>and their dedicated use in most complex instructions:
>
>0 Accumulator
>1 Count
>2 Data
>3 Base
>4 Stack Pointer
>5 Block Pointer
>6 Source Index
>7 Destination Index
Although I agree with your naming, this seems like a non sequitur. Why do
you think this supports the choice of volatile registers in the calling
convention?
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Posted by Tim, 1/4/2010 2:34:22 AM
Tim Roberts wrote:
> "wolfgang kern" <nowhere@never.at> wrote:
>> "Qyz" wrote:
>>
>>> My question is, what is the historical basis for this
>>> practice?
>> perhaps it came from the original naming of registers
>> and their dedicated use in most complex instructions:
>>
>> 0 Accumulator
>> 1 Count
>> 2 Data
>> 3 Base
>> 4 Stack Pointer
>> 5 Block Pointer
>> 6 Source Index
>> 7 Destination Index
>
> Although I agree with your naming, this seems like a non sequitur. Why do
> you think this supports the choice of volatile registers in the calling
> convention?
The fact that the "callee preserves" registers are the same ones that
can be used in an effective address (16-bit) may be relevant... or not...
Best,
Frank
Posted by Frank, 1/4/2010 11:17:12 AM
"Tim Roberts" asked:
>>"Qyz" wrote:
>>> My question is, what is the historical basis for this
>>> practice?
>>perhaps it came from the original naming of registers
>>and their dedicated use in most complex instructions:
>>0 Accumulator
>>1 Count
>>2 Data
>>3 Base
>>4 Stack Pointer
>>5 Block Pointer
>>6 Source Index
>>7 Destination Index
> Although I agree with your naming, this seems like a non sequitur. Why do
> you think this supports the choice of volatile registers in the calling
> convention?
I just looked at the CPU-internal Register-numbers, and as the first three
might also be the most used/altered/instruction-inherent since 8000, this
could be the reason for not saving their contents in Lib-functions of HLLs.
__
wolfgang
Posted by wolfgang, 1/4/2010 11:17:21 AM
"James Harris" <james.harris.1@MUNGED.microcosmotalk.com> wrote in message
news:4b3d2ee1$0$5104$9a6e19ea@unlimited.newshosting.com...
>
> I think the problem is CISC
> instructions which do two or more *different* things in one
> instruction. For example, multiply is good but an instruction which
> does decrement, shift and branch is bad.
>
Bad, why?
By use of the word "This", below, I mean the combination of unrelated
operations into a single operation.
1) This is a key technique they use to speed up interpreters. It's called
"super instructions" (Anton Ertl) or "superoperators" (Todd Proebsting).
2) This is also a key technique used in recent x86 processors. It's called
"micro-ops fusion" and "macro-fusion". "micro-ops fusion" merges micro-ops
into other micro-ops. "macro-fusion" merges entire x86 instructions
together.
The ability to combine small pieces of functionality into larger pieces can
be quite useful: procedures, grouping, register transfer languages, etc.
If the "break it down, build it back up" paradigm could be made to work well
for assembly, one could - in theory - break all assembly up into a single
instruction, i.e., SUBLEQ, optimize, and rebuild into CISC, RISC, VLIW, etc.
The FORTH and BrainFuck languages are also interesting here. BrainFuck for
its simplicity and FORTH for its complexity on top of primitives. But,
this is more suited to comp.lang.misc or alt.os.development...
Rod Pemberton
Posted by Rod, 1/4/2010 11:17:32 AM
"Tim Roberts" <timr@MUNGED.microcosmotalk.com> wrote in message
news:4b415388$0$4966$9a6e19ea@unlimited.newshosting.com...
> "Rod Pemberton" <do_not_have@nohavenot.cmm> wrote:
> >
> >Large register sets are required with RISC instruction sets because the RISC
> >instructions are weak. By "weak", I mean that they do very little work per
> >cpu clock.
>
> That's not accurate.
It is accurate.
> You can say "they do less work per instruction"
> compared to CISC,
True too, but that wasn't my point.
> but not "per CPU clock".
"per CPU clock" too was my point.
> The typical RISC set runs one
> instruction per clock, and it's difficult to do much better than that.
If one averages CISC work per instruction to a per clock basis so that they
can be compared with RISC, CISC instructions do more work per clock than
RISC. That was part of the reason why RISC failed. The conjecture that
time was wasted by CISC in large cycle instructions didn't bear out as truth
from RISC designs. If they had, RISC designs would've been "screamers"
compared to CISC. Most of the early RISC designs underperformed CISC
designs, even though RISC with a single clock cycle per instruction
supposedly had an advantage according to RISC theorists. Given that CISC
designs at one time had "idle" time within instructions, how is this
possible? It's possible because they also executed many of each
instruction's micro-operations within a fraction of a clock.
> Most x86 compilers use the CPUs as if they were RISC anyway, by eschewing
> the more complicated instructions.
Most or just GCC? Even very old versions of x86 compilers, like Watcom,
generate superior code to current versions of GCC. So, what do you mean by
"most"?
> ... by eschewing
> the more complicated instructions.
There are well documented reasons, i.e., design changes, to avoid the more
complicated instructions on post 386 x86.
> Look at the CPU code generated by an
> x86 compiler for a moderately interesting function, and compare that to the
> CPU code generated by an ARM compiler. The number of instructions is not
> significantly different.
I don't have access to an ARM platform.
> >Unfortunately, designing cpu's to be RISC had far more negative consequences
> >than positive.
>
> That's just an inaccurate overgeneralization.
What happened in the market place is "just an inaccurate
overgeneralization"?
Years ago, there were a number of ARM-based PC computers ("Acorn Risc Machine")
and motherboards available to the consumer. They weren't bought. When PC
based DEC Alpha motherboards came out, those weren't bought either.
> Academically, this battle
> has not been won.
ARM's are designed for low power consumption and low cost. Due to their low
cost, ARM's might have a place in netbooks. If the link below is up to
date, then their computational performance is more than an order of
magnitude less than current x86's. Rough guessing from the chart is that
ARM's best cpu in the chart is roughly equivalent to 600MHz x86. A 500MHz
x86 is barely adequate for PC applications, web browsing, online video, but
not for gaming, etc.
http://en.wikipedia.org/wiki/Million_instructions_per_second#Million_instructions_per_second
> It has been won commercially largely because IBM chose
> the x86 for the PC, not because of technical superiority.
Really?...
So, you're claiming that *none* of the other cpu's that competed against
8086 (prior to being called CISC) in the PC wars (Commodore's, Amiga's,
Apple's, Macintoshes, Timex Sinclair's, etc.) were technically superior to
8086 in the exact same ways you're claiming ARM is now technically superior?
6502? You do realize x86 was 1978 and ARM was 1985, yes? (80386 was 1985)
What about these RISC processors: PA-RISC? DEC Alpha? SPARC? Except for
SPARC, all of those have failed too. Z8000? 68000? Less RISC, more CISC,
but they died too. Clearly, this implies x86 can't be defeated by RISC or
CISC. That's untrue, right? Are there no RISC or CISC designs technically
superior to x86? x86 won *solely* because IBM survived? Natural monopoly?
If you look at graphs of CISC and RISC complexity vs. cpu instruction size,
there is a consistent trend over time from RISC designs towards CISC - even
within specific microprocessor lines. This strongly implies that RISC
processors have some disadvantage, be it market or technical. IMO, it's
both.
> You cannot argue that RISC "technically obsolete". If you want to argue
> "commercially obsolete", then you could make an excellent argument, but on
> a technical basis, the RISC vs CISC debate is far from over.
I can also argue it's "technically obsolete". Even Intel and HP came to
that conclusion. They concluded that VLIW will kill both...
> And I would disagree that the ARM was ever "virtually dead". The ARM
> architecture has been highly successful throughout its entire life. It has
> had a long and productive career in the embedded markets all along.
That's not true, AFAIR. They were first put into PCs: Acorn RISC Machine
(?). IIRC, they nearly went bankrupt, which forced them to sell off the
processor design unit.
Rod Pemberton
Rod | 1/4/2010 11:17:55 AM

On 4 Jan, 02:33, Tim Roberts <t...@MUNGED.microcosmotalk.com> wrote:
> You cannot argue that RISC "technically obsolete". If you want to argue
> "commercially obsolete", then you could make an excellent argument, but on
> a technical basis, the RISC vs CISC debate is far from over.
I'm not even sure you can say "commercially obsolete". The ARM is the
highest-volume 32-bit (or more) processor in the world by a very large
margin, with approximately 1,500,000,000 per year being shipped
currently (yes, that's one-and-a-half billion!), mostly going into
cellphones.
Richard.
http://www.rtrussell.co.uk/
To reply by email change 'news' to my forename.
Richard | 1/4/2010 11:18:38 AM