just what is micro-code anyway?

Part of the discussion (in a different thread) about micro- vs 
macro-code entails definitions. Macro-code appears unambiguous: it's 
what's in the load module. And micro-code must be any representation of 
the program different from the macro-code that appears between 
macro-code and FU. Right?

Well...

Mill load modules need not contain program code in executable form. 
Instead they contain genAsm, an abstract member-independent 
representation that is roughly similar to compiler intermediate form. 
They may, but need not, contain the cached result of specializing that 
code to the conAsm of one or more particular target members. Is genAsm 
macro-code? Is conAsm? Should we say that macro-code is what appears in 
memory, after any load-time specialization or other manipulation? But 
then what about trace caches?

Surely micro-code is easier to define: it is the final representation of 
code before execution, if that is different from macro-code. But take 
the Mill again: our code is very wide and so we see a ton of 
one-instruction loops. The decoder turns our macro-code into signal sets in 
the usual way, but the hardware notices that it is in a one-instruction 
loop and saves the sigsets and reissues them as a power-saving measure, 
rather than re-cracking the instruction every cycle. Is a saved sigset 
microcode? The content *is* rather similar to horizontal micro- ... And 
there is nothing to stop us handling two- or three-instruction loops 
similarly.
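The reissue trick can be sketched in a few lines (a schematic toy only, 
not Mill internals; the "sigset" encoding and the crack function are 
stand-ins):

```python
# Schematic sketch (not Mill internals): a decoder that saves the
# signal set ("sigset") it cracked for an instruction and reissues
# the saved copy when the same address comes around again, as in a
# one-instruction loop, instead of re-cracking every cycle.

def issue(fetched_pcs, crack):
    saved = {}                      # pc -> previously cracked sigset
    cracks = 0
    issued = []
    for pc in fetched_pcs:
        if pc not in saved:
            saved[pc] = crack(pc)   # full decode, done once
            cracks += 1
        issued.append(saved[pc])    # reissue the saved sigset
    return issued, cracks

# A one-instruction loop body at pc 0x40 executed five times is
# cracked once and reissued four times:
sigs, n = issue([0x40] * 5, lambda pc: ("sigset-for", pc))
```

The power saving comes from `cracks` staying at 1 no matter how many 
times the loop body issues.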

Justice Potter Stewart's rule applies :-)
Ivan
12/10/2016 5:57:00 PM
comp.arch


On Saturday, December 10, 2016 at 10:56:58 AM UTC-7, Ivan Godard wrote:
> And micro-code must be any representation of 
> the program different from the macro-code that appears between 
> macro-code and FU. Right?

Wrong.

Some modern CPUs convert code into micro-ops before execution. But micro-ops aren't microcode.

And then there's nanocode and millicode.

Nanocode is a form of microcode.

Here we go...

A CPU is considered to be "microcoded" when that CPU is not capable of directly 
executing programs in the form described to the assembler-language programmer.

Instead, the CPU is designed to execute programs - typically located in a 
special internal high-speed memory, although they _can_ be placed in regular 
main memory as well - in a different language called "microcode". The 
microprogram includes the loop which carries out fetching and decoding normal 
program instructions; it is an *interpreter* which executes the regular program.

Hence, microcode is not a _representation_ of the program.

Nanocode refers to the case where a very simple CPU runs a microprogram which 
interprets a second-level microprogram (this time in microcode) which executes 
the nominal program. The CPU in the IBM 5100 is an example of a processor which 
had both nanocode and microcode.

Millicode is a language, simpler than the machine's official instruction set, 
that has the same basic instruction formats as that instruction set. Millicode 
is not microcode, because here no interpreter is running, but it isn't a 
_representation_ of the program either.

Here, what happens is that millicode is the computer's "real" native 
instruction set, but a flag can be turned on or off...

if the flag is on, the lower-level millicode instructions (if any) are not 
available, but higher-level machine code instructions are trapped to interrupt 
routines. Those routines, written in millicode, simulate the missing 
instructions.
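The trap-and-emulate scheme can be sketched as follows; the opcodes, 
the flag, and the handler here are all invented for illustration, and 
real millicode (e.g. on IBM mainframes) is considerably more involved:

```python
# Sketch of flag-controlled trap-and-emulate: a higher-level opcode
# the hardware lacks traps to a routine written in the simpler base
# set. Every name below is hypothetical.

BASE_OPS = {"add", "load", "store"}          # the "real" native set

def isqrt_routine(regs, r):
    """Emulation routine standing in for one written in millicode."""
    regs[r] = int(regs[r] ** 0.5)

TRAPS = {"isqrt": isqrt_routine}             # trapped macro opcodes

def execute(op, regs, r, flag_on=True):
    if op in BASE_OPS:
        return                               # executed directly (elided)
    if flag_on and op in TRAPS:
        TRAPS[op](regs, r)                   # trap to the routine
        return
    raise ValueError("illegal instruction")
```

With the flag on, `isqrt` traps to the emulation routine; with it off, 
the opcode is simply illegal.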

And then there are micro-ops.

In that case, the program _still_ doesn't really get represented as a whole in 
micro-ops. For example, the micro-op for a "jump" instruction still contains 
the address of a regular instruction, not the address of a micro-op.

So an instruction like ADD 2,X gets turned into two micro-ops: LOAD 101,X; ADD 
2,101... and then these micro-ops get fed to the internal pipeline _on the 
fly_. There may be a micro-op cache, though.
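The cracking step can be sketched like this (the tuple encoding and 
the temporary register number 101 are illustrative, not any real 
micro-op format):

```python
# Sketch of cracking a memory-operand macro instruction into two
# micro-ops, per the ADD 2,X example above.

def crack(insn):
    op, imm, mem = insn                  # e.g. ("ADD", 2, "X")
    if mem is not None:
        return [("LOAD", 101, mem),      # memory operand into temp reg
                (op, imm, 101)]          # then a register-only op
    return [insn]                        # already register-only

# Note that a cracked jump would still carry the address of a regular
# instruction, not of a micro-op: the program as a whole is never
# re-represented in micro-ops.
```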

So machines that *do* transform a program into a program in another, simpler, 
machine language - such as the Transmeta Crusoe - are *unconventional* 
architectures, and the intermediate language they use is not termed 
"microcode", "millicode", "nanocode", or "micro-ops".

Thus, although you've pointed out, for no doubt very good reason, that the Mill 
is really not all that similar to the Transmeta Crusoe, in at least this 
respect, the Mill and the Transmeta Crusoe both belong to the same special 
category of machines which uses something very different from what is generally 
regarded as microcode.

John Savard
Quadibloc
12/10/2016 6:15:43 PM
On 12/10/2016 10:15 AM, Quadibloc wrote:
> On Saturday, December 10, 2016 at 10:56:58 AM UTC-7, Ivan Godard wrote:
>> And micro-code must be any representation of
>> the program different from the macro-code that appears between
>> macro-code and FU. Right?
>
> Wrong.
>
> [snip]


You have successfully converted an ambiguity in the term "microcoded" 
into an ambiguity in the term "directly executed" :-)

BTW, some micro machines are interpreters as you describe, but not all. 
The macro-loop can be a separate FSM, and usually is when macro parse is 
complicated.

And w/r/t micro vs macro, the Mill is not unconventional at all: it has 
only one level of execution, and directly executes macrocode without 
further translation. Quite reactionary, really: I guess that the Z-80, 
6502, and us are the only ones left :-)
Ivan
12/10/2016 7:55:16 PM
On Saturday, December 10, 2016 at 2:55:16 PM UTC-5, Ivan Godard wrote:
> And w/r/t micro vs macro, the Mill is not unconventional at all: it has 
> only one level of execution, and directly executes macrocode without 
> further translation. Quite reactionary, really: I guess that the Z-80, 
> 6502, and us are the only ones left :-)

My plan for Arxoda is for it to be macrocode-only (at least in version 1.0).

Best regards,
Rick C. Hodgin
Rick
12/10/2016 8:06:22 PM
>Mill load modules need not contain program code in executable form. 
>Instead they contain genAsm, an abstract member-independent 
>representation that is roughly similar to compiler intermediate form. 
>They may, but need not, contain the cached result of specializing that 
>code to the conAsm of one or more particular target members.

OK, sounds a lot like S/38 which is not a bad thing.

I agree with people who don't find the distinction between microcode and
other kinds of code very useful these days.  There was a time when
logic and RAM were expensive and ROMs were a lot faster than RAM, so a
design that executed several microinstructions to interpret each
regular instruction could keep the RAM running at full speed while
decreasing the amount of logic required.  But that was then.

Perhaps you could say there's the published instruction set, the one
that you promise will still work on the next version of the system so
you don't have to recompile, and anything below that is microcode if
that's what you want to call it.

R's,
John
John
12/10/2016 8:13:54 PM
On Saturday, December 10, 2016 at 10:55:16 PM UTC+3, Ivan Godard wrote:
> [snip]
>
> And w/r/t micro vs macro, the Mill is not unconventional at all: it has
> only one level of execution, and directly executes macrocode without
> further translation. Quite reactionary, really: I guess that the Z-80,
> 6502, and us are the only ones left :-)

Well...

I'm not sure how the Z80 works. I know it takes a heck of a lot of
clock cycles to do simple things (as do the 8080 and 8086).

How the 6502 works is well documented.

It has a 130 x 21 bit PLA. That is, 21 inputs, and 130 outputs.

The 21 input bits are:

- the top 6 bits of the instruction opcode
- the top 6 bits of the instruction opcode, inverted
- instruction bit 1
- instruction bit 0
- !instruction bit 1 & !instruction bit 0
- one of six bits set to 1 to indicate the current clock cycle in the
  instruction from 0 to 5, the others zero

Each of the 130 lines of the array contains a hardwired 0 or 1 per
input. Each bit is OR'd with the corresponding input. If all bits in a
line are 1 (i.e. AND'd) then the line has a match and the output
signal is a 1. Each one controls things such as dumping the contents
of a register onto the bus, loading a register from the bus,
incrementing the PC, resetting the cycle count shift register
(instruction finished), etc.

If the ROM content for an opcode bit and its inverse are both 1 then
both ORs will produce 1 and that bit position will be ignored for this
control line. If one ROM bit is 1 and the other is 0 then that control
line can match only if that opcode bit is set, or only if that opcode
bit is cleared. The ROM can never have both those bits be zero, or
that control line could never be triggered.

The ROM contents for all clock cycle inputs should be 1, except for
the input of the clock cycle you want to match.

For any given opcode and clock cycle, several control lines can fire.


I'm not sure how to describe this. Is it not microcode? It's got a
micro-PC, which can hold microcode addresses from 0 to 5, and can be
reset to 0 at any point. So each instruction consists of a
micro-program with from 1 to 6 micro-instructions (2 to 6, actually --
even instructions consisting only of the opcode byte and that don't
touch memory, such as NOP, CLC, or transfers between registers, take 2
clock cycles).

It's just that the potentially 1.5k different micro-instructions have
been rather cleverly compressed into 130 patterns. (In fact fewer than
that, since there are a few duplicated lines --- perhaps for signal
routing or fan-out reasons.)
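The matching rule described above fits in a couple of lines; the bit 
assignments below are invented for illustration (the real 6502 layout 
differs in detail):

```python
# Sketch of the PLA rule: each line stores a 21-bit pattern; the line
# fires when every position has either the stored bit or the input
# bit set (stored-1 positions are don't-cares).

WIDTH = 21
ALL = (1 << WIDTH) - 1

def line_fires(stored, inputs):
    return (stored | inputs) & ALL == ALL

# Take bits 0-5 as the one-hot cycle inputs. A line meant to fire on
# cycle 2 (don't-care about everything else) stores 1 everywhere
# except bit 2, so only a cycle-2 input can complete the match:
cycle2_line = ALL ^ (1 << 2)
```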
Bruce
12/10/2016 8:54:12 PM
On 12/10/2016 12:54 PM, Bruce Hoult wrote:
> On Saturday, December 10, 2016 at 10:55:16 PM UTC+3, Ivan Godard
> wrote:
>> [snip]
>
> How the 6502 works is well documented.
>
> It has a 130 x 21 bit PLA. That is, 21 inputs, and 130 outputs.
>
> [snip]
>
> It's just that the potentially 1.5k different micro-instructions have
> been rather cleverly compressed into 130 patterns.

Thank you! I've coded for a 6502, but never knew it was this way inside. 
I withdraw my example :-)
Ivan
12/10/2016 9:24:39 PM
Ivan Godard <ivan@millcomputing.com> writes:
>> Nanocode refers to the case where a very simple CPU runs a microprogram which
>> interprets a second-level microprogram (this time in microcode) which executes
>> the nominal program. The CPU in the IBM 5100 is an example of a processor which
>> had both nanocode and microcode.

The following is an account of how the 3081 was actually the 370
simulator from the Future System project (i.e. 1st half of the 70s; FS
was completely different from 370 and was going to completely replace
it). During the FS period, 370 projects were being killed off
(credited with giving clone processor makers a market foothold). When
FS imploded (w/o even being announced), there was a mad rush to get
stuff back into the 370 product pipelines ... kicking off the Q&D 303x
and 3081 efforts in parallel:
http://www.jfsowa.com/computer/memo125.htm

note that the 370/158 used horizontal microcode, split between the 370
emulator microcode and "integrated channel" microcode.  For the 303x,
an external "channel director" was created with a 370/158 engine
running the "integrated channel" microcode (w/o the 370 emulation). A
3031 is then a 370/158 engine (with 370 emulation) and a 2nd 370/158
engine with the integrated channel microcode (and no 370 emulation). A
cpu-intensive benchmark ... doing no I/O, was still about 1/4th-1/3rd
faster on a 3031 than a 370/158 ... since the integrated channel
microcode was still using cycles even when no I/O was going on.

circa 1980, there was an effort to replace the large number of
internal microprocessors (low-end/mid-range 370s, controllers, the
as/400 followon to s/38, etc) with the 801/risc Iliad. For various
reasons the efforts floundered or were canceled ... resulting in some
number of risc engineers leaving for other vendors.

The followons to the 4331 & 4341 were going to use the 801/risc Iliad
with 370 emulation "microcode" implemented in 801. Contributing to
killing that effort was a white paper showing that for the 4341
followon, the majority of 370 could be directly implemented in CISC
silicon, with much better price/performance than Iliad (disclaimer: I
contributed to that white paper).

As an aside, the 370 "microcode" implementation on 801/risc Iliad
.... looked somewhat like Hercules
http://www.hercules-390.org/

however, there was an effort to look at doing JIT translation of 370
code snippets to native 801/risc Iliad. In the 90s, some of the
commercial 370 emulators (running on sparc & intel platforms) did
implement JIT translation of 370 code snippets to native code.

-- 
virtualization experience starting Jan1968, online at home since Mar1970
Anne
12/10/2016 10:20:48 PM
On Saturday, December 10, 2016 at 1:13:50 PM UTC-7, John Levine wrote:

> I agree with people who don't find the distinction among microcode and
> other kinds of code very useful these days.  There was a time when
> logic and RAM were expensive and ROMs were a lot faster than RAM, so a
> design that executed several microinstructions to interpret each
> regular instruction could keep the RAM running at full speed while
> decreasing the amount of logic required.  But that was then.

You have a good point, which I totally ignored in my post. The fact
that microcode is merely a historical curiosity these days, not
something likely to be used in implementing real systems (at least,
not quite in the same way as they did in those days) was something I
wasn't concerned with... I just wanted to make it clear what the term
"microcode" meant, at least in its original historical context.

So that Ivan Godard wouldn't confuse people, or make it sound like the
Mill would be horrendously inefficient, being built on a slow, old,
and obsolete technology.

I'm not saying the Mill has such faults, but rather that this is what
the word "microcode" brings to mind. I wish I remembered offhand the
right word for what the Transmeta Crusoe and the System/38 did...
"dynamic code translation" is a buzz phrase that might be used.

Given that VLIW is potentially efficient, but not very good for
statically generating programs in, what about dynamic code translation
to VLIW? I suspect that doing so - at least to conventional VLIW (so
to the extent the Mill is like this, it's not an applicable criticism)
- is a bad idea, because to make the VLIW efficient, you have to mix
together conventional instructions from different threads to make your
VLIW utilize your processor fully...

which is a security nightmare.

And so one of the most tempting avenues for exploring dynamic code
generation discourages people - one reason why this hasn't been done
much. The inherent overhead of the translation and so on also means
that there have to be real benefits instead of tiny ones... and most
people who've looked at the idea hadn't found ways to get sufficient
benefits, I suspect. So the Mill may well be a wonderful thing, having
succeeded where others failed in a little-explored area of design
space.

John Savard
Quadibloc
12/10/2016 11:37:08 PM
On Saturday, December 10, 2016 at 11:56:58 AM UTC-6, Ivan Godard wrote:
> Part of the discussion (in a different thread) about micro- vs 
> macro-code entails definitions. Macro-code appears unambiguous: it's 
> what's in the load module. And micro-code must be any representation of 
> the program different from the macro-code that appears between 
> macro-code and FU. Right?

A macroprogrammed computer has an instruction pointer (or program counter).
This instruction pointer is used to fetch instructions from the memory 
hierarchy.

A microprogrammed computer has an instruction pointer for instructions
and a second instruction pointer for microcode. This second instruction
pointer is used to read out microcode from micro store and then use this
to control calculations and to sequence operations.
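The two-pointer distinction can be illustrated with a toy interpreter 
(the two-op macro ISA and its microroutines below are entirely made 
up):

```python
# Toy microprogrammed machine: a macro instruction pointer (pc) walks
# main memory, and a second pointer (upc) reads out each opcode's
# microroutine from micro store to sequence the operations.

MICRO_STORE = {
    "INC": ["acc_plus_1"],           # one microroutine per macro opcode
    "DBL": ["acc_plus_acc"],
}

def run(program):
    acc, pc = 0, 0                   # macro instruction pointer
    while pc < len(program):
        routine = MICRO_STORE[program[pc]]
        upc = 0                      # micro instruction pointer
        while upc < len(routine):
            if routine[upc] == "acc_plus_1":
                acc += 1
            elif routine[upc] == "acc_plus_acc":
                acc += acc
            upc += 1                 # sequence through micro store
        pc += 1                      # fetch the next macro instruction
    return acc
```

A macroprogrammed machine would have only the outer loop; the inner 
loop and `upc` are what the microprogrammed case adds.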

Mitch
MitchAlsup
12/11/2016 12:39:12 AM
On Saturday, December 10, 2016 at 7:39:15 PM UTC-5, MitchAlsup wrote:
> On Saturday, December 10, 2016 at 11:56:58 AM UTC-6, Ivan Godard wrote:
> > Part of the discussion (in a different thread) about micro- vs 
> > macro-code entails definitions. Macro-code appears unambiguous: it's 
> > what's in the load module. And micro-code must be any representation of 
> > the program different from the macro-code that appears between 
> > macro-code and FU. Right?
> 
> A macroprogrammed computer has an instruction pointer (or program counter.)
> This instruction pointer is used to fetch instructions from the memory 
> hierarchy.
> 
> A microprogrammed computer has an instruction pointer for instructions
> and a second instruction pointer for microcode. This second instruction
> pointer is used to read out microcode from micro store and then use this
> to control calculations and to sequence operations.

How are pipeline stages considered?  If the instruction is partially
decoded in stage 1, and further decoded in stage 2, then from stage 3
on to the final stage that operation goes along with its data so that
there is a consistent "production line" of processing so the final
form arrives at the end... are those pipeline stage operations
considered microcode?  Or are they just the mechanisms required to
conduct the operation specified in the original instruction?

-----
Arxoda uses a six-stage pipeline:

    https://github.com/RickCHodgin/libsf/blob/master/arxoda/oppie/oppie-6.png

    (1) i-fetch / partial decode
    (2) i-decode
    (3) d-fetch
    (4) scheduler
    (5) operation
    (6) d-write

My design intends to decode the instruction partially in (1) and mostly
in (2), but from that point forward there is a "package of operation"
that is sent through the core to each stage.
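The "package of operation" idea can be sketched abstractly (the stage 
names follow the list above; the work each stage does here is a 
stand-in only, not the Arxoda design):

```python
# Rough sketch: each stage does its part and forwards the package, so
# the operation arrives at the final stage carrying everything
# accumulated along the way.

STAGES = ["i-fetch", "i-decode", "d-fetch",
          "scheduler", "operation", "d-write"]

def run_pipeline(insn):
    package = {"insn": insn}
    for stage in STAGES:
        package[stage] = "done"      # stand-in for the stage's work
    return package
```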

Best regards,
Rick C. Hodgin
Rick
12/11/2016 1:43:11 AM
On 12/10/2016 4:39 PM, MitchAlsup wrote:
> On Saturday, December 10, 2016 at 11:56:58 AM UTC-6, Ivan Godard wrote:
>> Part of the discussion (in a different thread) about micro- vs
>> macro-code entails definitions. Macro-code appears unambiguous: it's
>> what's in the load module. And micro-code must be any representation of
>> the program different from the macro-code that appears between
>> macro-code and FU. Right?
>
> A macroprogrammed computer has an instruction pointer (or program counter.)
> This instruction pointer is used to fetch instructions from the memory
> hierarchy.
>
> A microprogrammed computer has an instruction pointer for instructions
> and a second instruction pointer for microcode. This second instruction
> pointer is used to read out microcode from micro store and then use this
> to control calculations and to sequence operations.
>
> Mitch
>

Some macroprogrammed computers have two instruction pointers :-)

I think some microprogrammed machines have more than two instruction 
pointers, too.
Ivan
12/11/2016 2:29:50 AM
>Given that VLIW is potentially efficient, but not very good for statically 
>generating programs in, what about dynamic code translation to VLIW? I suspect 
>that doing so - at least to conventional VLIW (so to the extent the Mill is 
>like this, it's not an applicable criticism) - is a bad idea, because to make 
>the VLIW efficient, you have to mix together conventional instructions from 
>different threads to make your VLIW utilize your processor fully...

Ugh.  The plan with VLIW was that the brilliant compiler would figure
out all of the data hazards and schedule the code appropriately.  But
it turned out that in most environments too many of the hazards depend
on the data so you need to do it dynamically.  VLIW compilers have
been pretty aggressive, and I'd be surprised if there were enough
extra data about the hazards to make VLIW work well.  

R's,
John


0
John
12/11/2016 3:57:44 AM
On Sat, 10 Dec 2016 09:57:00 -0800, Ivan Godard
<ivan@millcomputing.com> wrote:

>Part of the discussion (in a different thread) about micro- vs 
>macro-code entails definitions. Macro-code appears unambiguous: it's 
>what's in the load module. And micro-code must be any representation of 
>the program different from the macro-code that appears between 
>macro-code and FU. Right?
>
>Well...
>
>Mill load modules need not contain program code in executable form. 
>Instead they contain genAsm, an abstract member-independent 
>representation that is roughly similar to compiler intermediate form. 
>They may, but need not, contain the cached result of specializing that 
>code to the conAsm of one or more particular target members. Is genAsm 
>macro-code? Is conAsm? Should we say that macro-code is what appears in 
>memory, after any load-time specialization or other manipulation? But 
>then what about trace caches?
>
>Surely micro-code is easier to define: it is the final representation of 
>code before execution, if that is different from macro-code. But take 
>the Mill again: our code is very wide and so we see a ton of one 
>instruction loops. The decoder turns our macro-code into signal sets in 
>the usual way, but the hardware notices that it is in a one-instruction 
>loop and saves the sigsets and reissues them as a power-saving measure, 
>rather than re-cracking the instruction every cycle. Is a saved sigset 
>microcode? The content *is* rather similar to horizontal micro- ... And 
>there is nothing to stop us handling two- or three- instruction loops 
>similarly.
>
>Justice Potter Stewart's rule applies :-)


Didn't we decide during the interminable "is PALcode microcode?"
debates a couple of decades ago, that it's basically in the eye of the
beholder?

There are very few implementations these days of what we'd have
considered traditional 60s/70s microcode, with a microcode sequencer
bouncing through micro-ops.

And what people have called microcode in the last few decades is just
all over the map.

At the end of the day, there's the architected ISA, which compilers and
assembler programmers will get to see, and then there's everything
under that (which is where I'd place the Mill's specializer), and
whatever you want to call that is of interest only to people with
access to the innards of the machines, and the academic papers you
write on the subject.  Just tell us what you meant somewhere at the
top.  ;-)

And yes, that distinction would apply to something like JBC (ISA), and
the VM/JIT/whatever the microcode/millicode/PALcode/LIC/PFM, or
whatever you want to call it this week.  And I don't have a problem
with that.
0
Robert
12/11/2016 8:39:14 AM
On 12/10/2016 7:57 PM, John Levine wrote:
>> Given that VLIW is potentially efficient, but not very good for statically
>> generating programs in, what about dynamic code translation to VLIW? I suspect
>> that doing so - at least to conventional VLIW (so to the extent the Mill is
>> like this, it's not an applicable criticism) - is a bad idea, because to make
>> the VLIW efficient, you have to mix together conventional instructions from
>> different threads to make your VLIW utilize your processor fully...
>
> Ugh.  The plan with VLIW was that the brilliant compiler would figure
> out all of the data hazards and schedule the code appropriately.  But
> it turned out that in most environments too many of the hazards depend
> on the data so you need to do it dynamically.  VLIW compilers have
> been pretty aggressive, and I'd be surprised if there were enough
> extra data about the hazards to make VLIW work well.
>
> R's,
> John
>
>

That's a problem with all wide-issue designs, not just VLIW. Even if the 
hardware interlocks the hazards, the compiler must be hazard-aware if you 
are to get decent performance. The same issue applies to in-order 
superscalars.

Of course, there is a simple solution: have no hazards. You could design 
a hazard-free VLIW just as we have designed a hazard-free Mill.
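
To make "hazard-aware" concrete, here is a minimal Python sketch (latencies and encoding entirely invented, nothing Mill-specific) of the check a static scheduler for an exposed-latency machine has to make: no consumer may issue before its producer's fixed latency has elapsed.

```python
# Compile-time hazard check for a statically scheduled machine with
# fixed, exposed latencies: the scheduler, not interlock hardware,
# guarantees a result is ready before any consumer issues.

LATENCY = {"add": 1, "mul": 3, "load": 4}   # invented latencies

def schedule_ok(schedule):
    """schedule: list of (cycle, op, dest, srcs). True if hazard-free."""
    ready = {}                                  # reg -> cycle value is ready
    for cycle, op, dest, srcs in sorted(schedule):
        for s in srcs:
            # regs never written are treated as live-in and always ready
            if s in ready and cycle < ready[s]:
                return False                    # consumed before produced
        ready[dest] = cycle + LATENCY[op]
    return True

good = [(0, "mul", "r1", []), (3, "add", "r2", ["r1"])]
bad  = [(0, "mul", "r1", []), (1, "add", "r2", ["r1"])]
print(schedule_ok(good), schedule_ok(bad))   # the second consumes too early
```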


0
Ivan
12/11/2016 1:18:45 PM
On Saturday, December 10, 2016 at 8:29:48 PM UTC-6, Ivan Godard wrote:
> On 12/10/2016 4:39 PM, MitchAlsup wrote:
> > On Saturday, December 10, 2016 at 11:56:58 AM UTC-6, Ivan Godard wrote:
> >> Part of the discussion (in a different thread) about micro- vs
> >> macro-code entails definitions. Macro-code appears unambiguous: it's
> >> what's in the load module. And micro-code must be any representation of
> >> the program different from the macro-code that appears between
> >> macro-code and FU. Right?
> >
> > A macroprogrammed computer has an instruction pointer (or program counter.)
> > This instruction pointer is used to fetch instructions from the memory
> > hierarchy.
> >
> > A microprogrammed computer has an instruction pointer for instructions
> > and a second instruction pointer for microcode. This second instruction
> > pointer is used to read out microcode from micro store and then use this
> > to control calculations and to sequence operations.
> >
> > Mitch
> >
>
> Some macroprogrammed computers have two instruction pointers :-)
>
> I think some microprogrammed machines have more than two instruction
> pointers, too.

The 68020 microPC was "decoded" into the microcode ROM using a PLA.
The PLA would "fire" multiple word-lines, sense amps would OR bit lines.
This allowed a single bit pattern to contain 3 different µPCs driving
the Addr-side, the Data-side, and the Flow-sides simultaneously.

Mitch
0
MitchAlsup
12/11/2016 5:18:21 PM
On 12/11/2016 9:18 AM, MitchAlsup wrote:
> On Saturday, December 10, 2016 at 8:29:48 PM UTC-6, Ivan Godard wrote:


>> I think some microprogrammed machines have more than two instruction
>> pointers, too.
>
> The 68020 microPC was "decoded" into the microcode ROM using a PLA.
> The PLA would "fire" multiple word-lines, sense amps would OR bit lines.
> This allowed a single bit pattern to contain 3 different µPCs driving
> the Addr-side, the Data-side, and the Flow-sides simultaneously.
>
> Mitch
>

But that's for ops with linear microcode that don't really need a 
micro-PC. Some ops require non-linear microcode that contain micro-loops 
or micro-calls that must retain a micro-PC distinct from the macro-PC. 
Have there ever been machines that could execute two or more such 
micro-routines in parallel, and hence needed two or more micro-PCs to 
track the state of the micro-loops?
0
Ivan
12/11/2016 7:02:36 PM
On 12/11/16 2:02 PM, Ivan Godard wrote:
> On 12/11/2016 9:18 AM, MitchAlsup wrote:
>> On Saturday, December 10, 2016 at 8:29:48 PM UTC-6, Ivan Godard wrote:
>
>
>>> I think some microprogrammed machines have more than two instruction
>>> pointers, too.
>>
>> The 68020 microPC was "decoded" into the microcode ROM using a PLA.
>> The PLA would "fire" multiple word-lines, sense amps would OR bit lines.
>> This allowed a single bit pattern to contain 3 different µPCs driving
>> the Addr-side, the Data-side, and the Flow-sides simultaneously.

> But that's for ops with linear microcode that don't really need a
> micro-PC. Some ops require non-linear microcode that contain micro-loops
> or micro-calls that must retain a micro-PC distinct from the macro-PC.
> Have there ever been machines that could execute two or more such
> micro-routines in parallel, and hence needed two or more micro-PCs to
> track the state of the micro-loops?

I don't know whether the Alto would count, since it didn't exactly have 
a uniquely defined non-microcode ISA (you could load the control store 
with emulators for the language/system of your choice). IIRC it had some 
double-digit number of microprogram (ahem) counters that would execute 
according to whose priority was highest. Microcode handled a bunch of 
non-traditionally-cpu stuff as well, including RAM refresh.

paul

0
paul
12/11/2016 7:30:05 PM
On Sunday, December 11, 2016 at 1:39:16 AM UTC-7, robert...@yahoo.com wrote:

> Didn't we decide during the interminable "is PALcode microcode?"
> debates a couple of decades ago, that it's basically in the eye of the
> beholder?

Looking up what PALcode was on the DEC Alpha, it's most closely analogous to 
millicode, but perhaps it's better to think of it as regular machine code in a 
special privileged mode.

It is true that debates over terminology are unproductive. But it's also true 
that when terms have standard meanings, not using them tends to be confusing.

Since "microcode" at least at one point had a very well-defined meaning, using 
the same term for things which share only some of its attributes can be 
confusing. Why not try to use a more accurate term, if one is available?

John Savard
0
Quadibloc
12/11/2016 9:19:05 PM
On Sunday, December 11, 2016 at 1:02:39 PM UTC-6, Ivan Godard wrote:
> On 12/11/2016 9:18 AM, MitchAlsup wrote:
> > On Saturday, December 10, 2016 at 8:29:48 PM UTC-6, Ivan Godard wrote:
>
>
> >> I think some microprogrammed machines have more than two instruction
> >> pointers, too.
> >
> > The 68020 microPC was "decoded" into the microcode ROM using a PLA.
> > The PLA would "fire" multiple word-lines, sense amps would OR bit lines.
> > This allowed a single bit pattern to contain 3 different µPCs driving
> > the Addr-side, the Data-side, and the Flow-sides simultaneously.
> >
> > Mitch
> >
>
> But that's for ops with linear microcode that don't really need a
> micro-PC. Some ops require non-linear microcode that contain micro-loops
> or micro-calls that must retain a micro-PC distinct from the macro-PC.
> Have there ever been machines that could execute two or more such
> micro-routines in parallel, and hence needed two or more micro-PCs to
> track the state of the micro-loops?

This is what I was talking about:

1/3rd of the PC was processing the operand µstream choosing between
{immediates, registers, memory data}

Another 1/3rd was processing the data calculation (assuming it took
more than 1 cycle of execution) say IDIV.

And the last 1/3rd of the PC was sniffing out exceptions, overflows,
and other special cases left out of HW.
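
A toy Python rendering of that arrangement (bit patterns entirely invented): three concurrent micro-sequences advance in lockstep, and their control bits are OR-ed together each cycle, just as the sense amps OR the fired bit lines.

```python
# Sketch of one microword pattern really being three concurrent
# micro-sequences -- operand fetch, data calculation, and exception
# sniffing -- whose control bits are merged by OR each cycle.

OPERAND_SEQ   = [0b001, 0b001, 0b000]   # choose imm/reg/mem data
CALC_SEQ      = [0b010, 0b010, 0b010]   # multi-cycle op, e.g. an IDIV step
EXCEPTION_SEQ = [0b000, 0b000, 0b100]   # watch for overflow etc.

def merged_control_words():
    words = []
    for a, b, c in zip(OPERAND_SEQ, CALC_SEQ, EXCEPTION_SEQ):
        words.append(a | b | c)          # sense amps OR the bit lines
    return words

print([bin(w) for w in merged_control_words()])
```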
0
MitchAlsup
12/11/2016 11:27:10 PM
On Sunday, December 11, 2016 at 2:19:07 PM UTC-7, Quadibloc wrote:
> Why not try to use a more accurate term, if one is available?

I've just thought of a reason.

A possible more accurate term would be "model-native machine code", with the 
language presented to the programmer described as "hardware-compiled P-code".

But if you use language like *that*, instead of calling the generated code 
"microcode", then you're reinforcing the temptation to bypass the intended 
usage model of the processor, and code directly in the internal machine 
language that's intended to be hidden.

John Savard
0
Quadibloc
12/12/2016 2:48:33 AM
On Sun, 11 Dec 2016 13:19:05 -0800 (PST), Quadibloc
<jsavard@ecn.ab.ca> wrote:

>On Sunday, December 11, 2016 at 1:39:16 AM UTC-7, robert...@yahoo.com wrote:
>
>> Didn't we decide during the interminable "is PALcode microcode?"
>> debates a couple of decades ago, that it's basically in the eye of the
>> beholder?
>
>Looking up what PALcode was on the DEC Alpha, it's most closely analogous to 
>millicode, but perhaps it's better to think of it as regular machine code in a 
>special privileged mode.


IOW, one side of the interminable "is PALcode microcode" debate.


>It is true that debates over terminology are unproductive. But it's also true 
>that when terms have standard meanings, not using them tends to be confusing.
>
>Since "microcode" at least at one point had a very well-defined meaning, using 
>the same term for things which share only some of its attributes can be 
>confusing. Why not try to use a more accurate term, if one is available?


I'd argue that "microcode" hasn't really had a "well defined" meaning
in three decades.  Perhaps that's a loss to the world.  C'est la vie.
0
Robert
12/12/2016 6:18:13 AM
MitchAlsup wrote:
> 
> The 68020 microPC was "decoded" into the microcode ROM using a PLA.
> The PLA would "fire" multiple word-lines, sense amps would OR bit lines.
> This allowed a single bit pattern to contain 3 different µPCs driving
> the Addr-side, the Data-side, and the Flow-sides simultaneously.
> 
> Mitch

It's a bit hard to find non-paywalled details on micros from the 1980s.

A blog on internals for 68020 (circa 1984) & 68000 (circa 1979):

   http://blog.ehliar.se/post/80934423744/the-motorola-mc68020

These appear to be the 68020 microsequencer patents:

   Data processor microsequencer having multiple microaddress
   sources and next microaddress source selection, Motorola 1993
   https://www.google.com/patents/US5241637

   Microprogrammed data processor which includes a microsequencer
   in which a next microaddress output of a microROM is connected
   to the or-plane of an entry PLA, Motorola 1993
   https://www.google.com/patents/US5412785

   Data processor having a multi-stage instruction pipe and selection
   logic responsive to an instruction decoder for selecting one stage
   of the instruction pipe, Motorola 1993
   https://www.google.com/patents/US5276824

   Microcode testing of PLA's in a data processor, Motorola 1987
   https://www.google.com/patents/US4745574

This page references a PDF containing part of a 1986
Motorola Technical document discussing the 68020 microsequencer.
Unfortunately it is missing all the details.

   A FAST BRANCH CONTROL SCHEME
   http://priorart.ip.com/IPCOM/000005598

These appear to be the 68000 microsequencer patents:

   Microprogrammed control apparatus having a two-level control
   store for data processor, Motorola 1978
   https://www.google.ca/patents/US4307445

   Two-level control store for microprogrammed data processor, Motorola 1979
   https://www.google.ca/patents/US4325121

   Instruction register sequence decoder for microprogrammed data
   processor and method, Motorola 1979
   https://www.google.ca/patents/US4342078

   Conditional branch unit for microprogrammed data processor, Motorola 1979
   https://www.google.ca/patents/US4338661

Eric



0
EricP
12/12/2016 10:26:55 PM
John Levine wrote:
>> Mill load modules need not contain program code in executable form.
>> Instead they contain genAsm, an abstract member-independent
>> representation that is roughly similar to compiler intermediate form.
>> They may, but need not, contain the cached result of specializing that
>> code to the conAsm of one or more particular target members.
>
> OK, sounds a lot like S/38 which is not a bad thing.
>
> I agree with people who don't find the distinction among microcode and
> other kinds of code very useful these days.  There was a time when
> logic and RAM were expensive and ROMs were a lot faster than RAM, so a
> design that executed several microinstructions to interpret each
> regular instruction could keep the RAM running at full speed while
> decreasing the amount of logic required.  But that was then.
>
> Perhaps you could say there's the published instruction set, the one
> that you promise will still work on the next version of the system so
> you don't have to recompile, and anything below that is microcode if
> that's what you want to call it.

It is a _lot_ closer to modern Android, which has an abstract register 
machine VM and does AOT compilation upon first load, including first 
load after an OS update so that any OS update can also contain 
improvements to the AOT compiler.

That is my mental model of the Mill anyway - Ivan might disagree. :-)

Terje


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Terje
12/13/2016 7:13:42 AM
> Mill load modules need not contain program code in executable form. Instead
> they contain genAsm, an abstract member-independent representation that is
> roughly similar to compiler intermediate form. They may, but need not,

AFAIK from the silicon hardware's point of view, genAsm doesn't exist.

It's all managed by software (IOW, in the OS) on top of the CPU, right?


        Stefan
0
Stefan
12/13/2016 3:32:24 PM
On 12/13/2016 7:32 AM, Stefan Monnier wrote:
>> Mill load modules need not contain program code in executable form. Instead
>> they contain genAsm, an abstract member-independent representation that is
>> roughly similar to compiler intermediate form. They may, but need not,
>
> AFAIK from the silicon hardware's point of view, genAsm doesn't exist.
>
> It's all managed by software (IOW, in the OS) on top of the CPU, right?
>
>
>         Stefan
>

Yes, hardware doesn't know about genAsm.

GenAsm is a text form of genForm, the file format between the clang/LLVM 
compiler and the specializer. Ordinary users will not be concerned about 
genForm/genAsm, although one could write in it if desired. Compiler 
writers, and compiler and specializer maintainers, must be aware of it. 
In this it is similar to other internal representations in tool chains. 
Much the same is true of conAsm, which is a text form of executable 
binary. There really isn't a distinct conForm; it's ELF at that point.
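
For intuition only, the flow described above can be sketched as a pair of Python functions (names and data shapes are illustrative stand-ins, not the real Mill tools): the compiler emits genForm, the specializer turns it into member-specific binary, and genAsm/conAsm are just text renderings at either end.

```python
# Toy model of the tool-chain flow: compiler -> genForm -> specializer
# -> member-specific binary. All encodings invented.

def compile_to_genform(source):
    """clang/LLVM stand-in: produce member-independent genForm."""
    return {"ir": source.split()}

def specialize(genform, member):
    """Specializer stand-in: bind genForm to one target member."""
    return [f"{member}:{op}" for op in genform["ir"]]

genform = compile_to_genform("load add store")
binary  = specialize(genform, "Silver")   # pick a hypothetical member
print(binary)
```

The same genForm can of course be specialized again for a different member, which is the point of shipping the member-independent form.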


0
Ivan
12/13/2016 5:24:41 PM
Ivan Godard wrote:
> On 12/11/2016 9:18 AM, MitchAlsup wrote:
>> On Saturday, December 10, 2016 at 8:29:48 PM UTC-6, Ivan Godard wrote:
> 
> 
>>> I think some microprogrammed machines have more than two instruction
>>> pointers, too.
>>
>> The 68020 microPC was "decoded" into the microcode ROM using a PLA.
>> The PLA would "fire" multiple word-lines, sense amps would OR bit lines.
>> This allowed a single bit pattern to contain 3 different µPCs driving
>> the Addr-side, the Data-side, and the Flow-sides simultaneously.
>>
>> Mitch
>>
> 
> But that's for ops with linear microcode that don't really need a 
> micro-PC. Some ops require non-linear microcode that contain micro-loops 
> or micro-calls that must retain a micro-PC distinct from the macro-PC. 
> Have there ever been machines that could execute two or more such 
> micro-routines in parallel, and hence needed two or more micro-PCs to 
> track the state of the micro-loops?

I have never looked at the innards of the 8087 FPU coprocessor
but I would guess that it is internally microsequenced rather
than random control logic, given its date of design and
particularly since it supports transcendentals, such that
it operates as a separate parallel hardware "thread".
I vaguely recall that one or more PDP-11 models had a microsequenced FPU.

But a microsequencer+ROM/PLA is just an implementation detail as
any counter + glue logic can be a parallel hardware state machine.
For example, the MMU virtual address translator and page table walker,
or the instruction prefetch sequencer.

Eric

0
EricP
12/13/2016 5:30:02 PM
EricP wrote:
> 
> I have never looked at the innards of the 8087 FPU coprocessor
> but I would guess that it is internally microsequenced rather
> than random control logic, given its date of design and
> particularly since it supports transcendentals, such that
> it operates as a separate parallel hardware "thread".

This appears to be the 8087 innards patent. Figure 2 at the
bottom shows a microsequencer with a µ call-return stack.

Numeric data processor, Intel 1980
https://www.google.com/patents/US4338675

Eric



0
EricP
12/13/2016 6:03:08 PM
>> Perhaps you could say there's the published instruction set, the one
>> that you promise will still work on the next version of the system so
>> you don't have to recompile, and anything below that is microcode if
>> that's what you want to call it.
>
>It is a _lot_ closer to modern Android which has an abstract register 
>machine VM and does AOT compilation upon first load, including first 
>load after an OS update so that any OS update can also contain 
>improvements to the AOT compiler.

That's pretty much the same model that Java and .net use -- ship a
virtual binary which you then translate to native code either all at
once or incrementally as code paths are executed.  I know that .net
can do either batch or incremental compilation, haven't kept track
of all the JVMs enough to know which does what.
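
The incremental half of that can be caricatured in a few lines of Python (a toy, not any real JIT's mechanism): each function is translated the first time a code path reaches it, and the cached result is reused thereafter.

```python
# Toy "compile on first execution" stub: translation happens lazily,
# once per function, and later calls hit the cache.

compiled_cache = {}
compile_count = 0

def call(name, body):
    global compile_count
    if name not in compiled_cache:       # first execution: translate
        compile_count += 1
        compiled_cache[name] = f"native<{body}>"
    return compiled_cache[name]          # later calls reuse the result

call("f", "x+1"); call("f", "x+1"); call("g", "x*2")
print(compile_count)   # only two translations for three calls
```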

R's,
John
0
John
12/13/2016 6:39:35 PM
>>> Mill load modules need not contain program code in executable form. Instead
>>> they contain genAsm, an abstract member-independent representation that is
>>> roughly similar to compiler intermediate form. They may, but need not,
>> AFAIK from the silicon hardware's point of view, genAsm doesn't exist.
>> It's all managed by software (IOW, in the OS) on top of the CPU, right?
> Yes, hardware doesn't know about genAsm.
> GenAsm is a text form of genForm, the file format between the clang/LLVM

Sorry, got the names wrong, then.  What I meant is that the hardware
doesn't know about genForm.  All it knows is "conForm" (the machine
language), so from the hardware point of view, conForm is not a kind of
microcode for genForm: it's just the normal machine language.

IIUC, you make no serious attempt to *hide* conForm.  Instead, you make
it sufficiently convenient and efficient to use genForm that people will
likely not be tempted to use conForm directly too much.  But you don't
rely on conForm being "private" either: if someone wants to use conForm,
more power to him; most likely the only thing he'll get from
that is the need to deal with incompatibilities between the different
conForms used in different Mill machines.


        Stefan
0
Stefan
12/13/2016 6:54:36 PM
On Tue, 13 Dec 2016 18:39:35 +0000 (UTC), John Levine <johnl@iecc.com>
wrote:

>>> Perhaps you could say there's the published instruction set, the one
>>> that you promise will still work on the next version of the system so
>>> you don't have to recompile, and anything below that is microcode if
>>> that's what you want to call it.
>>
>>It is a _lot_ closer to modern Android which has an abstract register 
>>machine VM and does AOT compilation upon first load, including first 
>>load after an OS update so that any OS update can also contain 
>>improvements to the AOT compiler.
>
>That's pretty much the same model that Java and .net use -- ship a
>virtual binary which you then translate to native code either all at
>once or incrementally as code paths are executed.  I know that .net
>can do either batch or incremental compilation, haven't kept track
>of all the JVMs enough to know which does what.


FWIW, many JVMs can also do direct interpretation (as opposed to any
sort of compilation), although the idea would be to do that only in
limited circumstances.  That's not really an option for .net.
0
Robert
12/14/2016 1:01:01 AM
On Saturday, December 10, 2016 at 11:56:58 AM UTC-6, Ivan Godard wrote:
> Part of the discussion (in a different thread) about micro- vs
> macro-code entails definitions. Macro-code appears unambiguous: it's
> what's in the load module. And micro-code must be any representation of
> the program different from the macro-code that appears between
> macro-code and FU. Right?
>
> Well...
>
> Mill load modules need not contain program code in executable form.
> Instead they contain genAsm, an abstract member-independent
> representation that is roughly similar to compiler intermediate form.
> They may, but need not, contain the cached result of specializing that
> code to the conAsm of one or more particular target members. Is genAsm
> macro-code? Is conAsm? Should we say that macro-code is what appears in
> memory, after any load-time specialization or other manipulation? But
> then what about trace caches?
>
> Surely micro-code is easier to define: it is the final representation of
> code before execution, if that is different from macro-code. But take
> the Mill again: our code is very wide and so we see a ton of one
> instruction loops. The decoder turns our macro-code into signal sets in
> the usual way, but the hardware notices that it is in a one-instruction
> loop and saves the sigsets and reissues them as a power-saving measure,
> rather than re-cracking the instruction every cycle. Is a saved sigset
> microcode? The content *is* rather similar to horizontal micro- ... And
> there is nothing to stop us handling two- or three- instruction loops
> similarly.
>
> Justice Potter Stewart's rule applies :-)

MHO:
Given that the term "microcode" arose before pipelining and instruction
caches, a more appropriate term is needed, such as "dynamic microcode".
The microcode term does not need to imply a microcode store, so
logic-generated microcode is compatible in meaning.  Given that the binary
in the instruction cache and the binary being sent to each pipe are all
different, "dynamic" has the right connotation.
0
jim
12/14/2016 4:05:30 AM
On Wednesday, December 14, 2016 at 8:55:04 PM UTC+2, robert...@yahoo.com wrote:
> On Tue, 13 Dec 2016 18:39:35 +0000 (UTC), John Levine <johnl@iecc.com>
> wrote:
> 
> >>> Perhaps you could say there's the published instruction set, the one
> >>> that you promise will still work on the next version of the system so
> >>> you don't have to recompile, and anything below that is microcode if
> >>> that's what you want to call it.
> >>
> >>It is a _lot_ closer to modern Android which has an abstract register 
> >>machine VM and does AOT compilation upon first load, including first 
> >>load after an OS update so that any OS update can also contain 
> >>improvements to the AOT compiler.
> >
> >That's pretty much the same model that Java and .net use -- ship a
> >virtual binary which you then translate to native code either all at
> >once or incrementally as code paths are executed.  I know that .net
> >can do either batch or incremental compilation, haven't kept track
> >of all the JVMs enough to know which does what.
> 
> 
> FWIW, many JVMs can also do direct interpretation (as opposed to any
> sort of compilation), although the idea would be to do that only in
> limited circumstances.  That's not really an option for .net.

https://en.wikipedia.org/wiki/.NET_Micro_Framework
0
already5chosen
12/14/2016 7:41:21 PM
jim.brakefield@ieee.org writes:
>The microcode term does not need to imply a microcode store, so logic
>generated microcode is compatible in meaning.

On first thought, I tended to agree with this.

On second thought: This waters the term down so much that it has no
meaning at all.  Every CPU decodes the architectural instruction into
some kinds of signals that go to the functional unit, so every CPU
would be microcoded, even a classical RISC, or the 6502.  And the more
I think about it, the more it seems to me that the microcode store is
the criterion for differentiating between a microcoded processor and a
hard-coded one.

However, the term micro-instruction is appropriate even when the
signals are generated by logic rather than a microcode store.  E.g.,
for the ARM load-multiple instruction typically some hard-coded logic
generates the micro-instruction versions of the individual loads; on
the 21264 some hard-coded logic generates two microinstructions for
CMOV.
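
That kind of cracking can be sketched in Python (encoding invented; I'm assuming the simple ascending-address, word-per-register expansion for illustration): hard-wired logic expands one load-multiple into a sequence of single-load micro-instructions, with no micro store involved.

```python
# Sketch of decode logic cracking an ARM-style load-multiple into one
# load micro-op per register. The micro-op tuple format is invented.

def crack_ldm(base_reg, reg_list):
    """Expand LDM base,{regs} into (op, dest, base, offset) micro-ops."""
    micro_ops = []
    offset = 0
    for reg in reg_list:
        micro_ops.append(("load", reg, base_reg, offset))
        offset += 4                      # next word in memory
    return micro_ops

print(crack_ldm("r13", ["r0", "r1", "pc"]))
```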

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
0
anton
12/15/2016 1:32:34 PM
On Thursday, December 15, 2016 at 7:57:14 AM UTC-6, Anton Ertl wrote:
> jim.brakefield@ieee.org writes:
> >The microcode term does not need to imply a microcode store, so logic
> >generated microcode is compatible in meaning.
> 
> On first thought, I tended to agree with this.
> 
> On second thought: This waters the term down so much that it has no
> meaning at all.  Every CPU decodes the architectural instruction into
> some kinds of signals that go to the functional unit, so every CPU
> would be microcoded, even a classical RISC, or the 6502.  And the more
> I think about it, the more it seems to me that the microcode store is
> the criterion for differentiating between a microcoded processor and a
> hard-coded one.
> 
> However, the term micro-instruction is appropriate even when the
> signals are generated by logic rather than a microcode store.  E.g.,
> for the ARM load-multiple instruction typically some hard-coded logic
> generates the micro-instruction versions of the individual loads; on
> the 21264 some hard-coded logic generates two microinstructions for
> CMOV.
> 
> - anton
> -- 
> M. Anton Ertl                    Some things have to be seen to be believed
> anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
> http://www.complang.tuwien.ac.at/anton/home.html

]>> The microcode term does not need to imply a microcode store, so logic
]>> generated microcode is compatible in meaning.

]> On first thought, I tended to agree with this.
]> On second thought: This waters the term down so much that it has no
]> meaning at all.

I'm currently experimenting with VHDL & FPGAs.  I used to build a formal data path with control signals; with a formal data path one has microcode, IMHO.
Now I infer the data path and let the tools generate the control signals and data muxes.  The results are just as good.

So, without formal data paths, there is no formal microcode.
BTW in some cases the tools will infer ROM(s) for the control signals.
0
jim
12/15/2016 4:51:02 PM
Ivan Godard wrote:
> 
> Some macroprogrammed computers have two instruction pointers :-)

Just continuing on the multiple instruction pointer topic...

About 10+ years ago there was a flurry of research on hardware threads
with various names: nano-threads, run-ahead threads, helper threads,
scout threads. Threads had multiple instruction pointers but also
each had a private set of registers.

The term "strand" was coined to refer to multiple IP's when
they operate on a common set of registers, though some designs
have both private and common registers.

A Superstrand Architecture, 1997
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.4234

In that terminology, the Mill is a 2-strand processor,
the Belt being like a common register set.

Multiple IP's also show up in "Decoupled Architectures"
(which also uses the term "strand") which also tries to
avoid memory latency without OoO complexity.
"decoupled architectures divide the memory access and memory consuming
instructions into separate instruction streams called strands"
"this parallelism is extracted by the compiler rather than
in hardware, leading to significantly simpler hardware"

https://courses.cs.washington.edu/courses/cse590g/04sp/Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf
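
The access/execute split can be sketched in Python (a caricature of the Smith-style decoupling, with all details invented): the access strand runs ahead issuing loads into a queue, and the execute strand consumes values from it in program order, hiding memory latency without out-of-order hardware.

```python
# Toy decoupled access/execute pair communicating through a FIFO queue.
from collections import deque

MEMORY = {0: 10, 4: 20, 8: 30}           # invented memory contents

def access_strand(addresses, queue):
    """Memory-accessing strand: runs ahead of the consumer."""
    for addr in addresses:
        queue.append(MEMORY[addr])

def execute_strand(queue, n):
    """Memory-consuming strand: takes values in program order."""
    total = 0
    for _ in range(n):
        total += queue.popleft()
    return total

q = deque()
access_strand([0, 4, 8], q)
print(execute_strand(q, 3))   # sum of the three loaded values
```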

There are 2 recent [paywalled] papers on this, but I have only read the first.

[paywalled]
Decoupled Architectures as a Low-Complexity Alternative
to Out-of-order Execution, 2011
http://dl.acm.org/citation.cfm?id=2121423

"OUTRIDER enables a single thread of execution to be presented
to the architecture as multiple decoupled instruction streams
that separate memory-accessing and memory-consuming instructions."

Outrider: efficient memory latency tolerance with
decoupled strands, 2011.
http://dl.acm.org/citation.cfm?id=2000079

Now the reason those Decoupled Architecture papers are interesting
is that in looking for the second paper I stumbled upon a reference
to it in the following Intel patent:

Methods and apparatus to compile instructions for a vector
of instruction pointers processor architecture, Intel 2013
https://www.google.com/patents/US9086873
http://patft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=9086873.PN.&OS=PN/9086873&RS=PN/9086873

Google doesn't show the drawings but the USPO does have them.
This link may get the full PDF with images:
http://pimg-fpiw.uspto.gov/fdd/73/868/090/0.pdf
or click on "images" at the top and then the button "Full Pages"
and you get a PDF with the complete patent.

The patent says in part:
"In contrast, strands (which are sometimes referred to as micro-threads)
are not implemented at the OS level. However, strands have a common register
file, and communicate with each other via the common register file.
Strands are created quickly (e.g., a single processor cycle), and typically
last for a short period of time (e.g., ten processor cycles, one hundred
processor cycles, etc.). Examples disclosed herein apply to strands and,
more particularly, to how the strands are compiled by the compiler
for use with the VIP processor architecture."
....
"The target processor includes a number of strand processing units
that each can hold a strand."

Eric

0
EricP
12/18/2016 6:43:54 PM
On 12/18/2016 10:43 AM, EricP wrote:
> Ivan Godard wrote:
>>
>> Some macroprogrammed computers have two instruction pointers :-)
>
> Just continuing on the multiple instruction pointer topic...
>
> The term "strand" was coined to refer to multiple IP's when they
> operate on a common set of registers, though some designs have both
> private and common registers.
>
> Multiple IP's also show up in "Decoupled Architectures" (which also
> uses the term "strand") which also tries to avoid memory latency
> without OoO complexity.
>
> [snip]

The original decoupled access-execute design was the Astronautics ZS-1 
(http://web.eecs.umich.edu/~jringenb/Astro_ZS-1.pdf) designed by Jim 
Smith. Only one was built. DAE works well for stream tasks, but a stream 
processor works as well and is simpler. For GP code it proved impossible 
to get the required coordination between the strands while also getting 
far enough ahead that the queues were kept busy. In current machines a 
good data fetch predictor does as well as a DAE, and DAEs are now of 
historical interest only.
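The coordination failure shows up most clearly in pointer chasing. A toy
sketch (invented names, not from any of the designs discussed): each
load address comes from the previous load's result, so the access strand
can never run ahead and the decoupling queue stays near-empty.

```python
from collections import deque

def chase(mem, head, n):
    # Linked-list traversal: every address is the result of the prior
    # load, so an access strand can be at most one load ahead of the
    # execute strand -- the queue never builds up the depth that
    # decoupling needs to hide memory latency.
    q = deque()
    p = head
    for _ in range(n):
        p = mem[p]    # "access": address depends on the prior result
        q.append(p)   # queue depth never exceeds 1 before consumption
    return list(q)

mem = {0: 2, 2: 5, 5: 1}
print(chase(mem, 0, 3))  # [2, 5, 1]
```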

The strand processors are different from the Mill in that the program 
counters in a strander advance independently, and communication and 
synchronization between the strands are via queues (in the ZS-1) or 
synchronization primitives (in others). No strander (that I know of) had 
an exposed pipeline, whereas the Mill exposes the pipeline. The two Mill 
PCs, while tracking different addresses, are synchronous with each other. 
In a very real sense the Mill has only one strand, which is encoded as 
two half strands.
0
Ivan
12/18/2016 10:35:37 PM
On 12/18/2016 10:43 AM, EricP wrote:

> There are 2 recent [paywalled] papers on this, but I have only read the
> first.
>
> [paywalled]
> Decoupled Architectures as a Low-Complexity Alternative
> to Out-of-order Execution, 2011
> http://dl.acm.org/citation.cfm?id=2121423
>
> "OUTRIDER enables a single thread of execution to be presented
> to the architecture as multiple decoupled instruction streams
> that separate memory-accessing and memory-consuming instructions."
>
> Outrider: efficient memory latency tolerance with
> decoupled strands, 2011.
> http://dl.acm.org/citation.cfm?id=2000079

The original thesis is at: 
https://www.ideals.illinois.edu/bitstream/handle/2142/24372/Crago_Neal.pdf?sequence=1


0
Ivan
12/18/2016 11:00:45 PM
Ivan Godard wrote:
> On 12/18/2016 10:43 AM, EricP wrote:
>
> [snip]
>
> The original thesis is at: 
> https://www.ideals.illinois.edu/bitstream/handle/2142/24372/Crago_Neal.pdf?sequence=1 

Thanks.
It's interesting and appears to work. I like that it handles
pointer-chasing code, and he resolves some of the other
issues that occur with multiple IP's, like exceptions,
coordinating control flow, and maintaining load-store order.

His benchmarks were hand-compiled, though, which really
limits the amount of analysis that can be done.
For example, he hard-codes the max number of strands at 4.
It would have been interesting to see how many strands
could be extracted with that limit removed.

But like so much other research, this does not
appear to go any further.

Eric


0
EricP
12/19/2016 11:21:57 PM
On 12/19/2016 3:21 PM, EricP wrote:
> Ivan Godard wrote:
>> The original thesis is at:
>> https://www.ideals.illinois.edu/bitstream/handle/2142/24372/Crago_Neal.pdf?sequence=1
>
> [snip]
>
> But like so much other research, this does not appear to go any
> further.
>
> Eric

Because it doesn't go further.

Microthreads are just dataflow machines that fire closely-dependent 
instruction sequences rather than individual instructions. DF machines 
expose a *ton* of parallelism, but only at matchbox sizes on the scale 
of 10K-1M waiting microthreads. The problem is that you can't build a 
matchbox bigger than 0.3k or so, so matching must be done in software in 
main memory, and no one knows how to parallelize that. If you could 
build bigger then Intel would build a bigger issue engine in its OOO 
iron; the issue engines are really just a dataflow matchbox.
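A minimal model of the matchbox idea (all names invented here; nothing
from a real design): instructions sit in a waiting store until both of
their source tokens have arrived, then fire. The size of that waiting
store is the hardware constraint in question.

```python
import operator

def dataflow_run(instrs, inputs):
    # instrs maps dest -> (op, src1, src2). An instruction "fires" once
    # both of its source tokens exist; the dict of still-waiting
    # instructions plays the role of the matchbox, and its maximum size
    # over the run is what the hardware would have to hold.
    tokens = dict(inputs)
    waiting = dict(instrs)
    while waiting:
        ready = [d for d, (op, s1, s2) in waiting.items()
                 if s1 in tokens and s2 in tokens]
        if not ready:
            raise RuntimeError("deadlock: no instruction can fire")
        for d in ready:            # all ready instructions fire "in parallel"
            op, s1, s2 = waiting.pop(d)
            tokens[d] = op(tokens[s1], tokens[s2])
    return tokens

prog = {"t1": (operator.add, "a", "b"),
        "t2": (operator.mul, "t1", "c")}
out = dataflow_run(prog, {"a": 2, "b": 3, "c": 4})
print(out["t2"])  # 20
```

Replace individual operations with short dependent sequences and you
have the microthread picture.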

Certain "well shaped" application fragments, the kinds we call "data 
parallel", expose lots of parallelism at sizes small enough to fit in a 
matchbox, or with visibility ranges small enough for a compiler to 
recognize. His examples are of that sort; the domain is important, but 
small. However, the domain has already been addressed in other ways: 
prefetch engines, software pipelining, control and data flow prediction, 
stream types, co-routines. There doesn't seem any advantage to his 
approach.

Indeed, most of the comparisons he makes claim absence of speculation as 
an advantage, because he doesn't queue a microthread until it is known 
to be taken. When the compiler can prove (in selected fragments) the 
loop bounds then this works fine - but so does normal vectorization. 
Once a dependent predicate is in the control flow - think strcpy - then 
you must either speculate or defer microthread issue, leaving you no 
better off than a single engine.
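The strcpy point in miniature (a sketch with an invented name, not code
from the thesis): the loop's continuation test is decided by the byte
just loaded, so whether iteration i happens cannot be known - and a
later microthread cannot be queued non-speculatively - until iteration
i-1 has completed its load.

```python
def strcpy_trip_count(src):
    # The copy loop's trip count depends on the data itself: the
    # terminating zero is only discovered by loading it. An engine that
    # refuses to speculate must serialize on each load.
    n = 0
    for b in src:
        if b == 0:      # dependent predicate: decided by loaded data
            break
        n += 1
    return n

print(strcpy_trip_count(b"hi\x00junk"))  # 2
```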
0
Ivan
12/20/2016 1:08:35 AM
On Sunday, December 18, 2016 at 4:00:46 PM UTC-7, Ivan Godard wrote:

> The original thesis is at: 
> https://www.ideals.illinois.edu/bitstream/handle/2142/24372/Crago_Neal.pdf?sequence=1

Thank you. Taking a look at it, though, it was specifically aimed at 
simplifying the design of a CPU that would still perform well in a 
throughput-oriented environment, such as database processing, and *not* 
at making something as effective as an out-of-order design under 
latency-sensitive conditions, even though the design was meant to reduce 
the impact of the latency of external DRAM.

Thus, it seemed to me that this was about further solving what the industry 
views as the "easy" problem, as opposed to the "hard" problem everyone wishes 
they could solve.

John Savard
0
Quadibloc
12/20/2016 5:50:11 AM