More: ladder inheritance

while I'm bitching and moaning about bad languages: no ladder 
inheritance nor any way to build it. C-coders can stop reading; this 
complaint is not for you.

Notation: "->" means "is derived from".

Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. 
Except when you are all done you want the effect of A'->B'->C'.

This is trivial in SmallTalk and Mary, and doable in languages with 
delegation rather than inheritance. Users of C-family languages don't 
get it.

Example from our codebase in the Mill project: class abstractCache 
models the behavior of a cache-like component, with fields for a line 
size, a replacement behavior, and so on. It is specialized to 
abstractInstructionCache and abstractDataCache (among others) in the 
obvious way.

In the sim we have to instantiate and simulate a bunch of these 
components, and attach fields with information which are required by the 
sim to do its job but are not part of the abstraction, such as the 
simulator queues that hold lines that are in flight going to/from the 
banks, or the statistics gatherers. That gives us the classes 
concreteCache, concreteInstructionCache, and concreteDataCache (among 
others).

Other code in the sim sometimes needs a pointer to a cache object. A 
concrete cache object. So the pointer is a concreteCache*, and (say) I 
want to point it at the I$1 of some (simulated) core. So my code should say:
    concreteInstructionCache I1 = ...;
    concreteCache* p = &I1;
except you can't say that in C++, even though you could say:
    abstractCache* q = &(someAbstractInstructionCache);

My complaint is really about inheritance in general: inheritance is a 
mistake. Oh well :-(
Ivan
12/5/2016 7:11:55 PM

On Monday, December 5, 2016 at 9:12:00 PM UTC+2, Ivan Godard wrote:
>
> My complaint is really about inheritance in general: inheritance is a 
> mistake. Oh well :-(
>

Isn't it almost a consensus nowadays?
Googling "inheritance as antipattern" brought 55,000 results. And that's just one possible phrasing.

Anti-inheritance sentiment appears to be one of the major driving forces behind Google's Go language (yes, I know, not as big a driving force as lightweight threading).

Fortunately, you sound like a C++ user.
I don't remember a paragraph in the C++ standard that says programmers have to use inheritance :-)
Aggregation works in C++ just as well as it works in most other languages.
Delegation also sort of works, even if it is 10 times uglier, syntactically, than it should be.
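For example (names invented), aggregation plus the hand-written forwarding:

    class cacheStats {
    public:
        void record(bool hit);
    };
    class simCache {
        cacheStats stats;                             // aggregation
    public:
        void record(bool hit) { stats.record(hit); }  // delegation, by hand
    };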

Interfaces are absent, yes, and it sucks.
But, then again, I don't remember a paragraph in the C++ standard that says programmers have to program in C++ :-)






already5chosen
12/6/2016 12:06:15 PM
already5chosen@yahoo.com writes:
[C++]
>Interfaces are absent, yes, and it sucks.

Are they not called "abstract classes"?  Yes, abstract classes in C++
are not as restricted as Java interfaces, but they can do everything
that Java interfaces can do.
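E.g., a class with nothing but pure virtual functions (and a virtual
destructor) plays the Java-interface role; the names here are made up:

    struct Flushable {
        virtual void flush() = 0;
        virtual ~Flushable() {}
    };
    class SimCache : public Flushable {
        void flush() override { /* write back dirty lines */ }
    };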

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
anton
12/6/2016 5:00:55 PM
On Monday, December 5, 2016 at 1:12:00 PM UTC-6, Ivan Godard wrote:
> while I'm bitching and moaning about bad languages: no ladder 
> inheritance nor any way to build it. C-coders can stop reading; this 
> complaint is not for you.
<snip> 
> My complaint is really about inheritance in general: inheritance is a 
> mistake. Oh well :-(

The last simulator I did in C++ had a single module that "performed" all 
of the caches in the machine {L1code, L1codeTLB, L1data, L1dataTLB, L2TLB,
L2combined}. During Power-On, each of the instantiated caches filled in 
data and new memory to hold the actualized stored content of each cache.
Then one single C++-routine performed the cache lookup, another a replace-
ment,...

While the code and data caches were set associative (changeable per run)
the L1...TLBs were fully associative. All of this is easy enough to wrap 
up in a single module consisting of but a handful of functions and another 
handful of structures.
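Roughly, the flattened shape (field and function names invented here):

    #include <cstdint>

    struct Cache {                        // one shape covers caches and TLBs
        unsigned sets, ways, lineBytes;   // sets == 1 gives fully associative
        std::uint64_t* tags;              // tag, or virtual-address/PTE bits
        std::uint8_t*  data;              // the actualized stored content
    };
    bool lookup(Cache& c, std::uint64_t addr, void* lineOut);
    void replace(Cache& c, std::uint64_t addr, const void* line);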

Thus, in the case of caches, inheritance is absolutely the wrong model.

Mitch
MitchAlsup
12/6/2016 5:54:13 PM
On 12/6/2016 9:54 AM, MitchAlsup wrote:
> On Monday, December 5, 2016 at 1:12:00 PM UTC-6, Ivan Godard wrote:
>> while I'm bitching and moaning about bad languages: no ladder
>> inheritance nor any way to build it. C-coders can stop reading; this
>> complaint is not for you.
> <snip>
>> My complaint is really about inheritance in general: inheritance is a
>> mistake. Oh well :-(
>
> The last simulator I did in C++ had a single module that "performed" all
> of the caches in the machine {L1code, L1codeTLB, L1data, L1dataTLB, L2TLB,
> L2combined}. During Power-On, each of the instantiated caches filled in
> data and new memory to hold the actualized stored content of each cache.
> Then one single C++-routine performed the cache lookup, another a replace-
> ment,...
>
> While the code and data caches were set associative (changeable per run)
> the L1...TLBs were fully associative. All of this is easy enough to wrap
> up in a single module consisting of but a handful of functions and another
> handful of structures.
>
> Thus, in the case of caches, inheritance is absolutely the wrong model.
>
> Mitch
>

You aren't addressing the problem I was describing.

The Mill family is specification driven. We define a parts-box of 
components, often several instances of the same kind of component, and a 
spec glues them together. Thus the "xtal" component may be instantiated 
several times with different frequencies. The person doing the 
configuring picks one and sticks it in the spec. Software then grinds 
the spec, checking for inconsistencies and generating other software 
such as the assembler. Dropping a different instance of xtal in the spec 
may get you diagnostics such as "desired frequency unavailable at PLL 
output".

I'm just using caches as a familiar example here. It is convenient to 
factor out those things that are common in all caches into a base class, 
and then specialize to subclasses for those things that are not. Thus 
all classes have a replacement policy, but only data and not instruction 
caches have write ports. The factoring is not essential - you can 
flatten everything into the union of all kinds of caches, as you appear 
to have done - but factoring is just good programming practice.
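In outline (invented members):

    enum class replacementPolicy { lru, random };
    struct abstractCache {
        unsigned lineBytes;
        replacementPolicy policy;        // every cache has one
    };
    struct abstractInstructionCache : abstractCache { };
    struct abstractDataCache : abstractCache {
        unsigned writePorts;             // data caches only
    };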

However, some of the uses of the spec need use-specific additional data 
on each component. The sim is an example of this. Nearly all the 
components, when modeled in the sim, require sim-specific data and/or 
functionality that is not present in the hardware, nor in the spec of 
that hardware. For example, nearly all can be probed, and changed, from 
the sim's UI. The classes that the sim uses to model caches need to 
support this UI access. Yet the UI access info is irrelevant to the 
other uses of the spec.

When the components are themselves part of some class hierarchy the 
added fields and functions will appear at all levels of that hierarchy. 
My complaint is that subclassing to add the sim-specific stuff loses the 
inheritance hierarchy of the original.

You may feel that all the class arm-waving is pointless, and perhaps it 
is when modelling a single and quite fixed piece of hardware. That's not 
our problem space though: we need to be able to compose arbitrary specs, 
both for our own configuration and experimentation needs and also for 
our eventual customers for full- or semi-custom Mills.
Ivan
12/6/2016 10:48:59 PM
On Tuesday, December 6, 2016 at 4:49:01 PM UTC-6, Ivan Godard wrote:
> On 12/6/2016 9:54 AM, MitchAlsup wrote:
> > On Monday, December 5, 2016 at 1:12:00 PM UTC-6, Ivan Godard wrote:
> >> while I'm bitching and moaning about bad languages: no ladder
> >> inheritance nor any way to build it. C-coders can stop reading; this
> >> complaint is not for you.
> > <snip>
> >> My complaint is really about inheritance in general: inheritance is a
> >> mistake. Oh well :-(
> >
> > The last simulator I did in C++ had a single module that "performed" all
> > of the caches in the machine {L1code, L1codeTLB, L1data, L1dataTLB, L2TLB,
> > L2combined}. During Power-On, each of the instantiated caches filled in
> > data and new memory to hold the actualized stored content of each cache.
> > Then one single C++-routine performed the cache lookup, another a replace-
> > ment,...
> >
> > While the code and data caches were set associative (changeable per run)
> > the L1...TLBs were fully associative. All of this is easy enough to wrap
> > up in a single module consisting of but a handful of functions and another
> > handful of structures.
> >
> > Thus, in the case of caches, inheritance is absolutely the wrong model.
> >
> > Mitch
> >
> 
> You aren't addressing the problem I was describing.

No, I was addressing the problem you should have been.

> The Mill family is specification driven. We define a parts-box of 
> components, often several instances of the same kind of component, and a 
> spec glues them together. Thus the "xtal" component may be instantiated 
> several times with different frequencies. The person doing the 
> configuring picks one and sticks it in the spec. Software then grinds 
> the spec, checking for inconsistencies and generating other software 
> such as the assembler. Dropping a different instance of xtal in the spec 
> may get you diagnostics such as "desired frequency unavailable at PLL 
> output".
> 
> I'm just using caches as a familiar example here. It is convenient to 
> factor out those things that are common in all caches into a base class, 
> and then specialize to subclasses for those things that are not. Thus 
> all classes have a replacement policy, but only data and not instruction 
> caches have write ports.

Then HOW (pray tell) do those caches ever get data put into them? The 
replacement logic has exclusive use of the write port for instruction 
caches, while both the store port and the replacement port have access 
to the write port of the data caches.

But in all cases, there has to be a write port! otherwise the cache 
can't store anything! I think you are looking at the problem space
from a slightly different PoV as a hardware centric person such as 
myself. From my PoV, there is greater similarity than differences,
and the differences should be expressed outside the abstraction 
level of the cache itself--it should be expressed at the pipeline
leading towards the write port.

>                          The factoring is not essential - you can 
> flatten everything into the union of all kinds of caches, as you appear 
> to have done - but factoring is just good programming practice.

You only have to write the code once.

> However, some of the uses of the spec need use-specific additional data 
> on each component. The sim is an example of this. Nearly all the 
> components, when modeled in the sim, require sim-specific data and/or 
> functionality that is not present in the hardware, nor in the spec of 
> that hardware. For example, nearly all can be probed, and changed, from 
> the sim's UI. The classes that the sim uses to model caches need to 
> support this UI access. Yet the UI access info is irrelevant to the 
> other uses of the spec.

This is no different than the use of functional versus synthesizable
RTL. Functional has stuff that synthesizable does not (but would if it
did not have such high area costs.)

> When the components are themselves part of some class hierarchy the 
> added fields and functions will appear at all levels of that hierarchy. 
> My complaint is that subclassing to add the sim-specific stuff loses the 
> inheritance hierarchy of the original.

This almost sounds like you are trying to do with inheritance that which 
can be more easily done with # IFDEFs.

> You may feel that all the class arm-waving is pointless, and perhaps it 
> is when modelling a single and quite fixed piece of hardware. That's not 
> our problem space though: we need to be able to compose arbitrary specs, 
> both for our own configuration and experimentation needs and also for 
> our eventual customers for full- or semi-custom Mills.

MitchAlsup
12/7/2016 4:35:05 AM
On 12/6/2016 8:35 PM, MitchAlsup wrote:
> On Tuesday, December 6, 2016 at 4:49:01 PM UTC-6, Ivan Godard wrote:
>> On 12/6/2016 9:54 AM, MitchAlsup wrote:
>>> On Monday, December 5, 2016 at 1:12:00 PM UTC-6, Ivan Godard wrote:
>>>> while I'm bitching and moaning about bad languages: no ladder
>>>> inheritance nor any way to build it. C-coders can stop reading; this
>>>> complaint is not for you.
>>> <snip>
>>>> My complaint is really about inheritance in general: inheritance is a
>>>> mistake. Oh well :-(
>>>
>>> The last simulator I did in C++ had a single module that "performed" all
>>> of the caches in the machine {L1code, L1codeTLB, L1data, L1dataTLB, L2TLB,
>>> L2combined}. During Power-On, each of the instantiated caches filled in
>>> data and new memory to hold the actualized stored content of each cache.
>>> Then one single C++-routine performed the cache lookup, another a replace-
>>> ment,...
>>>
>>> While the code and data caches were set associative (changeable per run)
>>> the L1...TLBs were fully associative. All of this is easy enough to wrap
>>> up in a single module consisting of but a handful of functions and another
>>> handful of structures.
>>>
>>> Thus, in the case of caches, inheritance is absolutely the wrong model.
>>>
>>> Mitch
>>>
>>
>> You aren't addressing the problem I was describing.
>
> No, I was addressing the problem you should have been.
>
>> The Mill family is specification driven. We define a parts-box of
>> components, often several instances of the same kind of component, and a
>> spec glues them together. Thus the "xtal" component may be instantiated
>> several times with different frequencies. The person doing the
>> configuring picks one and sticks it in the spec. Software then grinds
>> the spec, checking for inconsistencies and generating other software
>> such as the assembler. Dropping a different instance of xtal in the spec
>> may get you diagnostics such as "desired frequency unavailable at PLL
>> output".
>>
>> I'm just using caches as a familiar example here. It is convenient to
>> factor out those things that are common in all caches into a base class,
>> and then specialize to subclasses for those things that are not. Thus
>> all classes have a replacement policy, but only data and not instruction
>> caches have write ports.
>
> Then HOW (pray tell) do those caches ever get data put into them? The
> replacement logic has exclusive use of the write port for instruction
> caches, while both the store port and the replacement port have access
> to the write port of the data caches.

Our caches have a quite different structure than I think you have in 
mind. Recall that we do not have write buffers, and do have per-byte 
valid bits. Both data and instruction sides have whole-line-segment 
hoist paths, used for fetch in the icache and hoisting in the data side. 
Both may have a set of victim buffers, but only the data VBs have the 
shift-and-merge function that is our equivalent of a write-consolidation 
buffer. Stores in a Mill are operand-sized and go direct from the L/S 
unit (that does the addressing) to the TLDC.

As a result a store commits immediately, and does not need to wait for 
the line if it's not already in cache. That operand-width data path to 
the D$1 doesn't exist on the I$ side, nor do all the associated spec 
arguments necessary to configure it. Nor does the operand-wide data path 
from the VBs to the retire stations, with its own shifter to isolate 
data loads. Conversely, the I$1 has a line-wide fetch path to the i$0 
microcaches and the decoder instruction shifters that the data side 
doesn't have.

All these differences are reflected in the subclassing in the spec 
classes, and the subclassing in the sim implementation classes. Whence 
my complaint.

> But in all cases, there has to be a write port! otherwise the cache
> can't store anything! I think you are looking at the problem space
> from a slightly different PoV as a hardware centric person such as
> myself. From my PoV, there is greater similarity than differences,
> and the differences should be expressed outside the abstraction
> level of the cache itself--it should be expressed at the pipeline
> leading towards the write port.

There are major pieces of hardware that exist in only one or the other. 
If your POV calls only the actual data byte flops the "cache" then what 
is your word for the whole subsystem and its API interfaces to other 
subsystems, which is what I call a "cache"? For just the storage part of 
the subsystem we use the word "bank", of which a cache subsystem can be 
configured with several.

>>                          The factoring is not essential - you can
>> flatten everything into the union of all kinds of caches, as you appear
>> to have done - but factoring is just good programming practice.
>
> You only have to write the code once.


>> However, some of the uses of the spec need use-specific additional data
>> on each component. The sim is an example of this. Nearly all the
>> components, when modeled in the sim, require sim-specific data and/or
>> functionality that is not present in the hardware, nor in the spec of
>> that hardware. For example, nearly all can be probed, and changed, from
>> the sim's UI. The classes that the sim uses to model caches need to
>> support this UI access. Yet the UI access info is irrelevant to the
>> other uses of the spec.
>
> This is no different than the use of functional versus synthesizable
> RTL. Functional has stuff that synthesizable does not (but would if it
> did not have such high area costs.)
>

I'm not a hardware guy. The Mill is designed, and defined, as a software 
system. The sim behavior, not the RTL, is the definition. The RTL is 
(largely) generated from the spec.

>> When the components are themselves part of some class hierarchy the
>> added fields and functions will appear at all levels of that hierarchy.
>> My complaint is that subclassing to add the sim-specific stuff loses the
>> inheritance hierarchy of the original.
>
> This almost sounds like you are trying to do with inheritance that which
> can be more easily done with # IFDEFs.

I suppose; that's sort of what you would have to do in C. Fortunately 
someone has already put inheritance into languages, so I don't have to.

>> You may feel that all the class arm-waving is pointless, and perhaps it
>> is when modelling a single and quite fixed piece of hardware. That's not
>> our problem space though: we need to be able to compose arbitrary specs,
>> both for our own configuration and experimentation needs and also for
>> our eventual customers for full- or semi-custom Mills.
>

Ivan
12/7/2016 5:58:21 AM
Ivan Godard <ivan@millcomputing.com> writes:

> while I'm bitching and moaning about bad languages: no ladder inheritance
> nor any way to build it. C-coders can stop reading; this complaint is not
> for you.
>
> Notation: "->" means "is derived from".
>
> Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. Except
> when you are all done you want the effect of A'->B'->C'.
>
> This is trivial in SmallTalk and Mary, and doable in languages with
> delegation rather than inheritance. Users of C-family languages don't get
> it.
>
> Example from our codebase in the Mill project: class abstractCache models
> the behavior of a cache-like component, with fields for a line size, a
> replacement behavior, and so on. It is specialized to
> abstractInstructionCache and abstractDataCache (among others) in the
> obvious way.
>
> In the sim we have to instantiate and simulate a bunch of these components,
> and attach fields with information which are required by the sim to do its
> job but are not part of the abstraction, such as the simulator queues that
> hold lines that are in flight going to/from the banks, or the statistics
> gatherers. That gives us the classes concreteCache, concreteInstructionCache,
> and concreteDataCache (among others).
>
> Other code in the sim sometimes needs a pointer to a cache object. A
> concrete cache object. So the pointer is a concreteCache*, and (say) I want
> to point it at the I$1 of some (simulated) core. So my code should say:
>    concreteInstructionCache I1 = ...;
>    concreteCache* p = &I1;
> except you can't say that in C++, even though you could say:
>    abstractCache* q = &(someAbstractInstructionCache);

I'm not sure why you need to cast to concreteCache and not abstractCache.

But you can have

class A {};
class B: public virtual A {};
class C: public virtual B {};

class Ap: public virtual A {};
class Bp: public Ap, public virtual B {};
class Cp: public Bp, public virtual C {};

It seems that it gives you what you want.  There are some situations where
you can avoid multiple virtual inheritance with templates like:

class A {};
class B: public virtual A {};
class C: public virtual B {};

template <typename Base> class Apt: public Base {};
template <typename Base> class Bpt: public Apt<Base> {};
template <typename Base> class Cpt: public Bpt<Base> {};

typedef Apt<A> Ap;
typedef Bpt<B> Bp;
typedef Cpt<C> Cp;

but here your concreteCache will be something like
concreteCache<abstractInstructionCache>; I'm not sure it will provide what
you want without templatizing more code than needed.
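With the first, virtual-inheritance form, the conversions Ivan asked for
do compile; a quick check, given the declarations above:

    Cp c;
    Bp* pb = &c;   // the ladder conversion that failed before
    C*  pc = &c;   // a Cp is still a C
    A*  pa = &c;   // one shared A subobject, thanks to the virtual bases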

Yours,

-- 
Jean-Marc
Jean
12/7/2016 7:10:12 PM
On 12/7/2016 11:10 AM, Jean-Marc Bourguet wrote:
> Ivan Godard <ivan@millcomputing.com> writes:
>
>> while I'm bitching and moaning about bad languages: no ladder inheritance
>> nor any way to build it. C-coders can stop reading; this complaint is not
>> for you.
>>
>> Notation: "->" means "is derived from".
>>
>> Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. Except
>> when you are all done you want the effect of A'->B'->C'.
>>
>> This is trivial in SmallTalk and Mary, and doable in languages with
>> delegation rather than inheritance. Users of C-family languages don't get
>> it.
>>
>> Example from our codebase in the Mill project: class abstractCache models
>> the behavior of a cache-like component, with fields for a line size, a
>> replacement behavior, and so on. It is specialized to
>> abstractInstructionCache and abstractDataCache (among others) in the
>> obvious way.
>>
>> In the sim we have to instantiate and simulate a bunch of these components,
>> and attach fields with information which are required by the sim to do its
>> job but are not part of the abstraction, such as the simulator queues that
>> hold lines that are in flight going to/from the banks, or the statistics
>> gatherers. That gives us the classes concreteCache, concreteInstructionCache,
>> and concreteDataCache (among others).
>>
>> Other code in the sim sometimes needs a pointer to a cache object. A
>> concrete cache object. So the pointer is a concreteCache*, and (say) I want
>> to point it at the I$1 of some (simulated) core. So my code should say:
>>    concreteInstructionCache I1 = ...;
>>    concreteCache* p = &I1;
>> except you can't say that in C++, even though you could say:
>>    abstractCache* q = &(someAbstractInstructionCache);
>
> I'm not sure why you need to cast to concreteCache and not abstractCache.
>
> But you can have
>
> class A {};
> class B: public virtual A {};
> class C: public virtual B {};
>
> class Ap: public virtual A {};
> class Bp: public Ap, public virtual B {};
> class Cp: public Bp, public virtual C {};
>
> It seems that it gives you what you want.

Yes, virtual base classes would mostly work. There are some issues because 
the virtual base is treated as a direct base rather than the indirect 
base of normal inheritance, which can lead to surprises in name 
collisions. Still, it's a good solution, albeit requiring much C++-foo. 
I wonder how many of our readers have ever written a virtual base in 
their code. I have, but then my C++ has a rep for baffling newcomers.
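One such surprise, sketched in the same shape as Jean-Marc's ladder:

    struct A  { void f(); };
    struct B  : virtual A { void f(); };   // dominates A::f -- fine
    struct Ap : virtual A { void f(); };   // also dominates A::f
    struct Bp : Ap, virtual B { };
    // Bp().f() is ambiguous: neither Ap::f nor B::f dominates the other,
    // though in the plain chain A <- B the same call resolves fine.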

> There are some situations where
> you can avoid multiple virtual inheritance with templates like:
>
> class A {};
> class B: public virtual A {};
> class C: public virtual B {};
>
> template <typename Base> class Apt: public Base {};
> template <typename Base> class Bpt: public Apt<Base> {};
> template <typename Base> class Cpt: public Bpt<Base> {};
>
> typedef Apt<A> Ap;
> typedef Bpt<B> Bp;
> typedef Cpt<C> Cp;
>
> but here you concreteCache will be something like
> concreteCache<abstractInstructionCache>, I'm not sure it will provide what
> you want without templatizing more code than needed.

The "curiously recursive template" model might be applicable, but I'm 
too busy to look :-)
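For the curious, the skeleton of that model (invented names):

    template <typename Derived>
    struct cacheBase {
        // static dispatch: the base calls down into the derived class
        void lookup() { static_cast<Derived*>(this)->doLookup(); }
    };
    struct simCache : cacheBase<simCache> {
        void doLookup() { /* sim-specific lookup */ }
    };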

Ivan
12/7/2016 7:37:48 PM
On Wednesday, December 7, 2016 at 10:37:49 PM UTC+3, Ivan Godard wrote:
> On 12/7/2016 11:10 AM, Jean-Marc Bourguet wrote:
> > Ivan Godard <ivan@millcomputing.com> writes:
> >
> >> while I'm bitching and moaning about bad languages: no ladder inheritance
> >> nor any way to build it. C-coders can stop reading; this complaint is not
> >> for you.
> >>
> >> Notation: "->" means "is derived from".
> >>
> >> Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. Except
> >> when you are all done you want the effect of A'->B'->C'.
> >>
> >> This is trivial in SmallTalk and Mary, and doable in languages with
> >> delegation rather than inheritance. Users of C-family languages don't get
> >> it.
> >>
> >> Example from our codebase in the Mill project: class abstractCache models
> >> the behavior of a cache-like component, with fields for a line size, a
> >> replacement behavior, and so on. It is specialized to
> >> abstractInstructionCache and abstractDataCache (among others) in the
> >> obvious way.
> >>
> >> In the sim we have to instantiate and simulate a bunch of these components,
> >> and attach fields with information which are required by the sim to do its
> >> job but are not part of the abstraction, such as the simulator queues that
> >> hold lines that are in flight going to/from the banks, or the statistics
> >> gatherers. That gives us the classes concreteCache, concreteInstructionCache,
> >> and concreteDataCache (among others).
> >>
> >> Other code in the sim sometimes needs a pointer to a cache object. A
> >> concrete cache object. So the pointer is a concreteCache*, and (say) I want
> >> to point it at the I$1 of some (simulated) core. So my code should say:
> >>    concreteInstructionCache I1 = ...;
> >>    concreteCache* p = &I1;
> >> except you can't say that in C++, even though you could say:
> >>    abstractCache* q = &(someAbstractInstructionCache);
> >
> > I'm not sure why you need to cast to concreteCache and not abstractCache.
> >
> > But you can have
> >
> > class A {};
> > class B: public virtual A {};
> > class C: public virtual B {};
> >
> > class Ap: public virtual A {};
> > class Bp: public Ap, public virtual B {};
> > class Cp: public Bp, public virtual C {};
> >
> > It seems that it gives you what you want.
> 
> Yes, virtual base classes would mostly work. There are some issues because
> the virtual base is treated as a direct base rather than the indirect
> base of normal inheritance, which can lead to surprises in name
> collisions. Still, it's a good solution, albeit requiring much C++-foo.
> I wonder how many of our readers have ever written a virtual base in
> their code. I have, but then my C++ has a rep for baffling newcomers.

I have, and I believe in 99% of cases virtual base classes should be the
default style. They are appropriate any time you're using inheritance for
interface (which is what it should be used for), not for implementation
(for which inclusion as a member is usually more appropriate).

Inheritance in Dylan and CLOS is effectively virtual inheritance.

In the "AlcheMo" Java-to-native compiler (via C) I used C++ virtual
inheritance to implement Java interfaces.

http://www.innaworks.com/alchemo-java-me-j2me-to-brew-android-iphone-flash-windows-mobile-cross-compiler.html
Bruce
12/7/2016 7:44:29 PM
Ivan Godard <ivan@millcomputing.com> writes:

> On 12/7/2016 11:10 AM, Jean-Marc Bourguet wrote:
>> Ivan Godard <ivan@millcomputing.com> writes:
>>
>>> while I'm bitching and moaning about bad languages: no ladder inheritance
>>> nor any way to build it. C-coders can stop reading; this complaint is not
>>> for you.
>>>
>>> Notation: "->" means "is derived from".
>>>
>>> Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. Except
>>> when you are all done you want the effect of A'->B'->C'.
>>>
>>> This is trivial in SmallTalk and Mary, and doable in languages with
>>> delegation rather than inheritance. Users of C-family languages don't get
>>> it.
>>>
>>> Example from our codebase in the Mill project: class abstractCache models
>>> the behavior of a cache-like component, with fields for a line size, a
>>> replacement behavior, and so on. It is specialized to
>>> abstractInstructionCache and abstractDataCache (among others) in the
>>> obvious way.
>>>
>>> In the sim we have to instantiate and simulate a bunch of these components,
>>> and attach fields with information which are required by the sim to do its
>>> job but are not part of the abstraction, such as the simulator queues that
>>> hold lines that are in flight going to/from the banks, or the statistics
>>> gathers. That gives us the classes concreteCache, concreteInstructionCache,
>>> and concreteDataCache (among others).
>>>
>>> Other code in the sim sometimes needs a pointer to a cache object. A
>>> concrete cache object. So the pointer is a concreteCache*, and (say) I want
>>> to point it at the I$1 of some (simulated) core. So my code should say:
>>>    concreteInstructionCache I1 = ...;
>>>    concreteCache* p = &I1;
>>> except you can't say that in C++, even though you could say:
>>>    abstractCache* q = &(someAbstractInstructionCache);
>>
>> I'm not sure why you need to cast to concreteCache and not abstractCache.
>>
>> But you can have
>>
>> class A {};
>> class B: public virtual A {};
>> class C: public virtual B {};
>>
>> class Ap: public virtual A {};
>> class Bp: public Ap, public virtual B {};
>> class Cp: public Bp, public virtual C {};
>>
>> It seems that it gives you what you want.
>
> Yes, virtual base classes would mostly work. There are some issues because
> the virtual base is treated as a direct base rather than the indirect base
> of normal inheritance, which can lead to surprises in name
> collisions. Still, it's a good solution, albeit requiring much C++-foo. I
> wonder how many of our readers have ever written a virtual base in their
> code. I have, but then my C++ has a rep for baffling newcomers.

virtual bases without data members are the closest thing I know in C++ to
Java interfaces (some, like the Ada 2005 designers IIRC, could see them as
Java interfaces providing most of the useful cases of multiple and virtual
inheritance).

I'm not sure what surprises in name collision you are thinking about.  If
I'm not mistaken, virtual inheritance introduces less ambiguity than
non-virtual inheritance.

Yours,

-- 
Jean-Marc
Jean
12/7/2016 7:54:44 PM
On 12/7/2016 11:44 AM, Bruce Hoult wrote:
> On Wednesday, December 7, 2016 at 10:37:49 PM UTC+3, Ivan Godard wrote:
>> On 12/7/2016 11:10 AM, Jean-Marc Bourguet wrote:
>>> Ivan Godard <ivan@millcomputing.com> writes:
>>>
>>>> while I'm bitching and moaning about bad languages: no ladder inheritance
>>>> nor any way to build it. C-coders can stop reading; this complaint is not
>>>> for you.
>>>>
>>>> Notation: "->" means "is derived from".
>>>>
>>>> Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. Except
>>>> when you are all done you want the effect of A'->B'->C'.
>>>>
>>>> This is trivial in SmallTalk and Mary, and doable in languages with
>>>> delegation rather than inheritance. Users of C-family languages don't get
>>>> it.
>>>>
>>>> Example from our codebase in the Mill project: class abstractCache models
>>>> the behavior of a cache-like component, with fields for a line size, a
>>>> replacement behavior, and so on. It is specialized to
>>>> abstractInstructionCache and abstractDataCache (among others) in the
>>>> obvious way.
>>>>
>>>> In the sim we have to instantiate and simulate a bunch of these components,
>>>> and attach fields with information which are required by the sim to do its
>>>> job but are not part of the abstraction, such as the simulator queues that
>>>> hold lines that are in flight going to/from the banks, or the statistics
>>>> gatherers. That gives us the classes concreteCache, concreteInstructionCache,
>>>> and concreteDataCache (among others).
>>>>
>>>> Other code in the sim sometimes needs a pointer to a cache object. A
>>>> concrete cache object. So the pointer is a concreteCache*, and (say) I want
>>>> to point it at the I$1 of some (simulated) core. So my code should say:
>>>>    concreteInstructionCache I1 = ...;
>>>>    concreteCache* p = &I1;
>>>> except you can't say that in C++, even though you could say:
>>>>    abstractCache* q = &(someAbstractInstructionCache);
>>>
>>> I'm not sure why you need to cast to concreteCache and not abstractCache.
>>>
>>> But you can have
>>>
>>> class A {};
>>> class B: public virtual A {};
>>> class C: public virtual B {};
>>>
>>> class Ap: public virtual A {};
>>> class Bp: public Ap, public virtual B {};
>>> class Cp: public Bp, public virtual C {};
>>>
>>> It seems that it gives you what you want.
>>
>> Yes, virtual base classes would mostly work. There are some issues because
>> the virtual base is treated as a direct base rather than the indirect
>> base of normal inheritance, which can lead to surprises in name
>> collisions. Still, it's a good solution, albeit requiring much C++-foo.
>> I wonder how many of our readers have ever written a virtual base in
>> their code. I have, but then my C++ has a rep for baffling newcomers.
>
> I have, and I believe in 99% cases virtual base classes should be the default style. They are appropriate any time you're using inheritance for interface (which is what it should be used for) not for implementation (for which inclusion as a member is usually more appropriate).
>
> Inheritance in Dylan and CLOS is effectively virtual inheritance.
>
> In the "AlcheMo" Java to native compiler (via C) I used C++ virtual inheritance to implement Java interfaces.
>
> http://www.innaworks.com/alchemo-java-me-j2me-to-brew-android-iphone-flash-windows-mobile-cross-compiler.html
>

A problem with inclusion is that it makes you dependent on the public 
interface. There are reasons for "protected".
Ivan
12/7/2016 8:17:03 PM
On Wednesday, December 7, 2016 at 11:17:05 PM UTC+3, Ivan Godard wrote:
> On 12/7/2016 11:44 AM, Bruce Hoult wrote:
> > On Wednesday, December 7, 2016 at 10:37:49 PM UTC+3, Ivan Godard wrote:
> >> On 12/7/2016 11:10 AM, Jean-Marc Bourguet wrote:
> >>> Ivan Godard <ivan@millcomputing.com> writes:
> >>>
> >>>> while I'm bitching and moaning about bad languages: no ladder inheritance
> >>>> nor any way to build it. C-coders can stop reading; this complaint is not
> >>>> for you.
> >>>>
> >>>> Notation: "->" means "is derived from".
> >>>>
> >>>> Assume A->B->C. Now define subclasses of each: A'->A, B'->B, C'->C. Except
> >>>> when you are all done you want the effect of A'->B'->C'.
> >>>>
> >>>> This is trivial in SmallTalk and Mary, and doable in languages with
> >>>> delegation rather than inheritance. Users of C-family languages don't get
> >>>> it.
> >>>>
> >>>> Example from our codebase in the Mill project: class abstractCache models
> >>>> the behavior of a cache-like component, with fields for a line size, a
> >>>> replacement behavior, and so on. It is specialized to
> >>>> abstractInstructionCache and abstractDataCache (among others) in the
> >>>> obvious way.
> >>>>
> >>>> In the sim we have to instantiate and simulate a bunch of these components,
> >>>> and attach fields with information which are required by the sim to do its
> >>>> job but are not part of the abstraction, such as the simulator queues that
> >>>> hold lines that are in flight going to/from the banks, or the statistics
> >>>> gatherers. That gives us the classes concreteCache, concreteInstructionCache,
> >>>> and concreteDataCache (among others).
> >>>>
> >>>> Other code in the sim sometimes needs a pointer to a cache object. A
> >>>> concrete cache object. So the pointer is a concreteCache*, and (say) I want
> >>>> to point it at the I$1 of some (simulated) core. So my code should say:
> >>>>    concreteInstructionCache I1 = ...;
> >>>>    concreteCache* p = &I1;
> >>>> except you can't say that in C++, even though you could say:
> >>>>    abstractCache* q = &(someAbstractInstructionCache);
> >>>
> >>> I'm not sure why you need to cast to concreteCache and not abstractCache.
> >>>
> >>> But you can have
> >>>
> >>> class A {};
> >>> class B: public virtual A {};
> >>> class C: public virtual B {};
> >>>
> >>> class Ap: public virtual A {};
> >>> class Bp: public Ap, public virtual B {};
> >>> class Cp: public Bp, public virtual C {};
> >>>
> >>> It seems that it gives you what you want.
> >>
> >> Yes, virtual base classes would mostly work. There are some issues because
> >> the virtual base is treated as a direct base rather than the indirect
> >> base of normal inheritance, which can lead to surprises in name
> >> collisions. Still, it's a good solution, albeit requiring much C++-foo.
> >> I wonder how many of our readers have ever written a virtual base in
> >> their code. I have, but then my C++ has a rep for baffling newcomers.
> >
> > I have, and I believe in 99% of cases virtual base classes should be the
> > default style. They are appropriate any time you're using inheritance for
> > interface (which is what it should be used for), not for implementation
> > (for which inclusion as a member is usually more appropriate).
> >
> > Inheritance in Dylan and CLOS is effectively virtual inheritance.
> >
> > In the "AlcheMo" Java-to-native compiler (via C) I used C++ virtual
> > inheritance to implement Java interfaces.
> >
> > http://www.innaworks.com/alchemo-java-me-j2me-to-brew-android-iphone-flash-windows-mobile-cross-compiler.html
> >
> 
> A problem with inclusion is that it makes you dependent on the public
> interface. There are reasons for "protected".

"protected" might as well be public. Anyone can subclass and change the
access. It's not useful for anything more than documentation.

At least you have to make yourself feel a little bit dirty to subvert
"private".
Bruce
12/7/2016 8:35:29 PM
On Tuesday, December 6, 2016 at 11:58:23 PM UTC-6, Ivan Godard wrote:
> On 12/6/2016 8:35 PM, MitchAlsup wrote:

> > But in all cases, there has to be a write port! otherwise the cache
> > can't store anything! I think you are looking at the problem space
> > from a slightly different PoV as a hardware centric person such as
> > myself. From my PoV, there is greater similarity than differences,
> > and the differences should be expressed outside the abstraction
> > level of the cache itself--it should be expressed at the pipeline
> > leading towards the write port.
> 
> There are major pieces of hardware that exist in only one or the other. 
> If your POV calls only the actual data byte flops the "cache" then what 
> is your word for the whole subsystem and its API interfaces to other 
> subsystems, which is what I call a "cache"?

It is called a stage in the pipeline, the cache is a component of that 
stage, but the pipeline provides the flip-flops where incoming addresses 
and data reside until reads or writes happen.

The major difference is that the data pipeline stage has a choice to 
install newly arriving data from a higher layer of the cache hierarchy, 
or data from the store pipeline. The Instruction side only has to deal
with the data arriving from a higher layer....

Thus, a "L1 cache" is composed of SRAM blocks (Data + tag) and TLB blocks
(Virtual + PTE) and both are perfectly abstracted as "a cache"; while
an L2 cache is composed of only (Data + tag). As a SW simulation, the tags 
and other things are simple 64-bit containers or multiple 64-bit containers.
In RTL one would instantiate only the bits one wants present.

I came to this PoV after building many caches for several different 
pipeline designs and getting lost in the minutia. The cache==cache has 
worked well for me.

I have not had the time to look into the writes happen before data arrives
part of your problem space, though.
MitchAlsup
12/8/2016 12:30:01 AM
On 12/7/2016 4:30 PM, MitchAlsup wrote:
> On Tuesday, December 6, 2016 at 11:58:23 PM UTC-6, Ivan Godard wrote:

>> There are major pieces of hardware that exist in only one or the other.
>> If your POV calls only the actual data byte flops the "cache" then what
>> is your word for the whole subsystem and its API interfaces to other
>> subsystems, which is what I call a "cache"?
>
> It is called a stage in the pipeline, the cache is a component of that
> stage, but the pipeline provides the flip-flops where incoming addresses
> and data reside until reads or writes happen.
>
> The major difference is that the data pipeline stage has a choice to
> install newly arriving data from a higher layer of the cache hierarchy,
> or data from the store pipeline. The Instruction side only has to deal
> with the data arriving from a higher layer....
>
> Thus, a "L1 cache" is composed of SRAM blocks (Data + tag) and TLB blocks
> (Virtual + PTE) and both are perfectly abstracted as "a cache"; while
> an L2 cache is composed of only (Data + tag). As a SW simulation, the tags
> and other things are simple 64-bit containers or multiple 64-bit containers.
> In RTL one would instantiate only the bits one wants present.
>
> I came to this PoV after building many caches for several different
> pipeline designs and getting lost in the minutia. The cache==cache has
> worked well for me.
>
> I have not had the time to look into the writes happen before data arrives
> part of your problem space, though.
>

Terminology again. To me a pipeline stage is a logical entity on a 
temporal dimension, that may be realized in many different actual 
circuits; so long as program execution cannot tell the realization in 
use then our sim doesn't care and doesn't model the realization. That 
means our sim can be used to evaluate software behavior of programs 
running on the sim'd hardware, and their timing, but cannot evaluate 
power usage, area, or circuit details like cross-talk.

Hardware sims that can model such things are very valuable if that's 
what you need to do, but we didn't write our own. We did write our own 
software sim because we needed something to (among other things) help us 
port the OS. Our sim is orders of magnitude faster than a RTL sim, 
because it doesn't care about RTL details. We need that speed to be able 
to sim a running OS and apps in time less than a lifetime.

Once you abstract out from the RTL, what one sims is a collection of 
components (modelled by C++ classes) with APIs connecting them. The APIs 
have annotated latency, so we can model timing. Components include the 
timing system pieces like xtals and PLLs, so the sim'd OS can change 
its own sim's clock rates. But it's all latency, not pipelines in your 
sense.

The latencies are initially set by seat-of-the-pants guesses by the 
hardware guys about what a given component will clock out at in some 
process and piping of their choice. The resulting software times will be 
as bad as the guess quality. As more of the RTL gets created, we can use 
a (commercial) RTL sim to tighten the guess.

Using a parts-box-and-spec approach lets us wildly vary the 
configurations and see what happens. Eventually each component in the 
software parts box will have a corresponding RTL realization, or likely 
several such, and the initial RTL for the connections between them can 
be mechanically generated, and the whole box sim'd on a RTL sim - very 
slowly - for manual tuning.

Manual tuning will certainly be able to gain by collapsing the abstract 
interfaces between the software components. However, TTM for custom 
configurations is important to us, and I expect that we may choose to 
leave some of those gains on the table so as to reduce the manual effort.
Ivan
12/8/2016 1:28:18 AM
On Wednesday, December 7, 2016 at 7:28:21 PM UTC-6, Ivan Godard wrote:
> On 12/7/2016 4:30 PM, MitchAlsup wrote:
> > On Tuesday, December 6, 2016 at 11:58:23 PM UTC-6, Ivan Godard wrote:
> 
> >> There are major pieces of hardware that exist in only one or the other.
> >> If your POV calls only the actual data byte flops the "cache" then what
> >> is your word for the whole subsystem and its API interfaces to other
> >> subsystems, which is what I call a "cache"?
> >
> > It is called a stage in the pipeline, the cache is a component of that
> > stage, but the pipeline provides the flip-flops where incoming addresses
> > and data reside until reads or writes happen.
> >
> > The major difference is that the data pipeline stage has a choice to
> > install newly arriving data from a higher layer of the cache hierarchy,
> > or data from the store pipeline. The Instruction side only has to deal
> > with the data arriving from a higher layer....
> >
> > Thus, a "L1 cache" is composed of SRAM blocks (Data + tag) and TLB blocks
> > (Virtual + PTE) and both are perfectly abstracted as "a cache"; while
> > an L2 cache is composed of only (Data + tag). As a SW simulation, the tags
> > and other things are simple 64-bit containers or multiple 64-bit containers.
> > In RTL one would instantiate only the bits one wants present.
> >
> > I came to this PoV after building many caches for several different
> > pipeline designs and getting lost in the minutia. The cache==cache has
> > worked well for me.
> >
> > I have not had the time to look into the writes happen before data arrives
> > part of your problem space, though.
> >
> 
> Terminology again. To me a pipeline stage is a logical entity on a 
> temporal dimension, that may be realized in many different actual 
> circuits; so long as program execution cannot tell the realization in 
> use then our sim doesn't care and doesn't model the realization. That 
> means our sim can be used to evaluate software behavior of programs 
> running on the sim'd hardware, and their timing, but cannot evaluate 
> power usage, area, or circuit details like cross-talk.

To me a pipeline stage has the slave side of the flip-flops as inputs and 
the master side of the flip-flops as outputs. Between the flip-flops are logic 
components, with the restriction that they cycle at the appropriate rate.
> 
> Hardware sims that can model such things are very valuable if that's 
> what you need to do, but we didn't write our own. We did write our own 
> software sim because we needed something to (among other things) help us 
> port the OS. Our sim is orders of magnitude faster than a RTL sim, 
> because it doesn't care about RTL details. We need that speed to be able 
> to sim a running OS and apps in time less than a lifetime.

My typical pipeline simulator is DIV 10,000 whereas a typical RTL simulation
is DIV 10,000,000 or slower. If I build a "trace cache" I can generally get
the pipeline simulator down into the DIV 300 range. That is the complete
simulation of a single instruction takes 300 native (i.e., x86) cycles
(including cache effects). In 1996 my team booted SunOS on a simulator
where the only process that was not native to the SunOS "disk" was the 
idle process. Here, we replaced it with a call to "end of time slice".
This had a surprising effect: Imagine typing in "time<cr>" in the 
xterm window and watching the characters echo back at you, then typing 
in "time<cr>" again and watching the simulated time move faster than 
wall clock time due to skipping of all the "idle cycles".

{BTW: this simulator had all of the SBus components modeled as if the
simulator WAS a SPARC station (keyboard, mouse, graphics, disk, timer....)}

I sort of understand you want to reconfigure the components in the pipeline 
to see if one way is better than another. I have built plenty of these,
however, I generally moved components by 1/2 clock steps, and you need
a lower level abstraction to do this--both at the component level, and
at the pipeline storage level. So, at the pipeline storage level I used
latches (corresponding to clock high, clock low) then I used components
that could be put between a pair of latches, the input latch providing
hold time, the output latch capturing the calculated data.

To do this, I built data structures (structs) with # IFDEFs to hide the
latches that were not supposed to be writeable in that phase of the 
clock, and compiled the files twice, once with # define CLOCK HIGH, and
once with # define CLOCK LOW. The compiler would find which components 
were in the wrong phase because the input and output latches were not 
visible.
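A sketch (names invented); each file is compiled once per phase, and
touching a latch in the wrong phase is a compile-time error because the
member simply isn't there:

    #include <cstdint>

    struct Stage {
        std::uint64_t in;            /* visible in both phases */
    #ifdef CLOCK_HIGH
        std::uint64_t masterLatch;   /* writable only while the clock is high */
    #else
        std::uint64_t slaveLatch;    /* writable only while the clock is low */
    #endif
    };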

To find race conditions, I would assemble all of the functions that are 
called in CLOCK=HIGH (or CLOCK=LOW) and sort them into a random order 
about every 10 simulation cycles, and call them from an indirection list.
This actually works surprisingly well for finding race conditions.
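In outline (a sketch):

    #include <algorithm>
    #include <random>
    #include <vector>

    using Evaluator = void (*)();

    void runPhase(std::vector<Evaluator>& fns, long cycle, std::mt19937& rng) {
        if (cycle % 10 == 0)                  // reshuffle every 10 cycles
            std::shuffle(fns.begin(), fns.end(), rng);
        for (Evaluator f : fns) f();          // order must not matter
    }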
MitchAlsup
12/8/2016 1:56:27 AM
On 12/7/2016 5:56 PM, MitchAlsup wrote:
> On Wednesday, December 7, 2016 at 7:28:21 PM UTC-6, Ivan Godard wrote:
>> On 12/7/2016 4:30 PM, MitchAlsup wrote:
>>> On Tuesday, December 6, 2016 at 11:58:23 PM UTC-6, Ivan Godard wrote:
>>
>>>> There are major pieces of hardware that exist in only one or the other.
>>>> If your POV calls only the actual data byte flops the "cache" then what
>>>> is your word for the whole subsystem and its API interfaces to other
>>>> subsystems, which is what I call a "cache"?
>>>
>>> It is called a stage in the pipeline, the cache is a component of that
>>> stage, but the pipeline provides the flip-flops where incoming addresses
>>> and data reside until reads or writes happen.
>>>
>>> The major difference is that the data pipeline stage has a choice to
>>> install newly arriving data from a higher layer of the cache hierarchy,
>>> or data from the store pipeline. The Instruction side only has to deal
>>> with the data arriving from a higher layer....
>>>
>>> Thus, a "L1 cache" is composed of SRAM blocks (Data + tag) and TLB blocks
>>> (Virtual + PTE) and both are perfectly abstracted as "a cache"; while
>>> an L2 cache is composed of only (Data + tag). As a SW simulation, the tags
>>> and other things are simple 64-bit containers or multiple 64-bit containers.
>>> In RTL one would instantiate only the bits one wants present.
>>>
>>> I came to this PoV after building many caches for several different
>>> pipeline designs and getting lost in the minutia. The cache==cache has
>>> worked well for me.
>>>
>>> I have not had the time to look into the writes happen before data arrives
>>> part of your problem space, though.
>>>
>>
>> Terminology again. To me a pipeline stage is a logical entity on a
>> temporal dimension, that may be realized in many different actual
>> circuits; so long as program execution cannot tell the realization in
>> use then our sim doesn't care and doesn't model the realization. That
>> means our sim can be used to evaluate software behavior of programs
>> running on the sim'd hardware, and their timing, but cannot evaluate
>> power usage, area, or circuit details like cross-talk.
>
> To me a pipeline stage has the slave side of the flip-flops as inputs and
> the master side of the flip-flops as outputs. Between the flip-flops are logic
> components, with the restriction that they cycle at the appropriate rate.
>>
>> Hardware sims that can model such things are very valuable if that's
>> what you need to do, but we didn't write our own. We did write our own
>> software sim because we needed something to (among other things) help us
>> port the OS. Our sim is orders of magnitude faster than a RTL sim,
>> because it doesn't care about RTL details. We need that speed to be able
>> to sim a running OS and apps in time less than a lifetime.
>
> My typical pipeline simulator is DIV 10,000 whereas a typical RTL simulation
> is DIV 10,000,000 or slower. If I build a "trace cache" I can generally get
> the pipeline simulator down into the DIV 300 range. That is the complete
> simulation of a single instruction takes 300 native (i.e., x86) cycles
> (including cache effects). In 1996 my team booted SunOS on a simulator
> where the only process that was not native to the SunOS "disk" was the
> idle process. Here, we replaced it with a call to "end of time slice".
> This had a surprising effect: Imagine typing in "time<cr>" in the
> xterm window and watching the characters echo back at you, then typing
> in "time<cr>" again and watching the simulated time move faster than
> wall clock time due to skipping of all the "idle cycles".

Cute :-)

Things speed up when you only have one operation per instruction :-)

And only one operand per operation :-)

And only RISC-like operations :-)

Anybody spare a few smileys?

> {BTW: this simulator had all of the SBus components modeled as if the
> simulator WAS a SPARC station (keyboard, mouse, graphics, disk, timer....)}

We are bare-metal. The simulated power up sequence (a handful of MMIO 
stores) ends by writing to runReg and you are on. The software running 
on the sim'd machine has to probe the environment and figure out what is 
out there on its own.

> I sort of understand you want to reconfigure the components in the pipeline
> to see if one way is better than another. I have built plenty of these,
> however, I generally moved components by 1/2 clock steps, and you need
> a lower level abstraction to do this--both at the component level, and
> at the pipeline storage level. So, at the pipeline storage level I used
> latches (corresponding to clock high, clock low) then I used components
> that could be put between a pair of latches, the input latch providing
> hold time, the output latch capturing the calculated data.

We model an abstract "tick", which I think of as being a cycle but could 
as easily be a half-cycle so long as you changed the spec of the xtal 
component. Again, that's a hardware realization issue that our sim 
doesn't care about.

> To do this, I built data structures (structs) with # IFDEFs to hide the
> latches that were not supposed to be writeable in that phase of the
> clock, and compiled the files twice, once with # define CLOCK HIGH, and
> once with # define CLOCK LOW. The compiler would find which components
> were in the wrong phase because the input and output latches were not
> visible.

We are not at the latch level. Something goes to a component, and some 
(specified or derived) number of ticks later something comes out and 
goes to some other component. What happens inside is black. The mapping 
from (simulated) wall-clock to tick is a function of the start xtal, the 
PLL settings, and which tick distributor the component is on. Of course, 
the "inside" of the component may be a graph of other components, so 
the result timing may dynamically depend on the state of the "inside" 
components. The topology is given by the spec.

> To find race conditions, I would assemble all of the functions that are
> called in CLOCK=HIGH (or CLOCK=LOW) and sort them into a random order
> about every 10 simulation cycles, and call them from an indirection list.
> This actually works surprisingly well for finding race conditions.

A sound approach. No races in ours; everything in a tick happens all at 
once logically. To implement that in the sim where we can only call one 
API at a time we break each dependency loop and accumulate pending state 
changes at the break. After all the calls have been made we then apply 
the accumulated changes. It's similar to RCU in a way.
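In outline (a sketch; names invented):

    #include <vector>

    struct Component {
        virtual void evaluate() = 0;   // read pre-tick state, queue writes
        virtual void commit()   = 0;   // apply the queued writes
        virtual ~Component() {}
    };

    void tick(std::vector<Component*>& components) {
        for (Component* c : components) c->evaluate();  // everyone sees old state
        for (Component* c : components) c->commit();    // all changes land at once
    }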

The breaks can involve time tunnels, so for example simulated memory is 
running in the future of the simulated L/S units in the core. As a 
result, a store in the core can directly update the memory without 
queues; yes, there's a latency, but that latency is the time tunnel. Of 
course, you can't do that bi-directionally: you can visit the future but 
not the past. Consequently loads can get a value from the future memory 
as part of the L/S execution, but that loaded value can't be given to 
the belt until round-trip latency has elapsed, and so there must be 
buffering and queuing. One nice thing about the time tunnels is that the 
UI can inspect memory now but see it as it will be when all the stores 
currently in process have worked their way down the hierarchy because 
those stores have already been done in the future.

Of course, a magic tick in which everything happens at once can give us 
non-physical behavior, but we rely on the hardware implementers to complain.

0
Ivan
12/8/2016 3:04:50 AM
In article <2016Dec6.180055@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>already5chosen@yahoo.com writes:
>[C++]
>>Interfaces are absent, yes, and it sucks.
>
>Are they not called "abstract classes"?  Yes, abstract classes in C++
>are not as restricted as Java interfaces, but they can do everything
>that Java interfaces can do.

Compare and contrast:

  https://en.wikipedia.org/wiki/Concepts_(C%2B%2B)

It might be a better fit, depending of course on what one
considers an interface.
0
mrs
12/8/2016 11:16:13 PM
On Wednesday, December 7, 2016 at 9:04:49 PM UTC-6, Ivan Godard wrote:
> Cute :-)
> 
> Things speed up when you only have one operation per instruction :-)

Do you count SIN(x) as one "operation" per instruction?
 
> And only one operand per operation :-)

Can't do a FMAC with less than 3, can't do fancy address modes with less 
than 2+immediate.
 
> And only RISC-like operations :-)

I personally see no particular problem in hoisting most of the "never 
changing" math.c into microcode expressed as instructions in the ISA.
 
> Anybody spare a few smileys?

No, really, I do. It is better to get them right in HW than to allow all 
sorts of attacks on SW.
0
MitchAlsup
12/9/2016 12:26:49 AM
On 12/8/2016 4:26 PM, MitchAlsup wrote:
> On Wednesday, December 7, 2016 at 9:04:49 PM UTC-6, Ivan Godard wrote:
>> Cute :-)
>>
>> Things speed up when you only have one operation per instruction :-)
>
> Do you count SIN(x) as one "operation" per instruction?

If it is native, yes. Perhaps "one dispatch per instruction", although 
that gets weird too in a pre-cracking microcoded design with trace buffers.


>> And only one operand per operation :-)
>
> Can't do a FMAC with less than 3, can't do fancy address modes with less
> than 2+immediate.

Of course, but not what I meant; I was referring to SIMD.

>> And only RISC-like operations :-)
>
> I personally, see no particular problem in hoisting most of the "never
> changing" math.c into microcode expressed as instructions in ISA.

I'm allergic to microcode in general. If the "microcode" is the same as 
the macrocode (which putting math.c in the "hardware" sounds like) then 
I'm happy; that's just code-in-a-ROM, like the original Macs. But if 
it's different then it keeps the compiler and ASM programmers from 
seeing, and dealing with, the real machine.

>> Anybody spare a few smileys?
>
> No, really, I do. It is better to get them right in HW than to allow all
> sorts of attacks on SW.
>

0
Ivan
12/9/2016 12:44:22 AM
On Thursday, December 8, 2016 at 6:44:21 PM UTC-6, Ivan Godard wrote:
> On 12/8/2016 4:26 PM, MitchAlsup wrote:

> > I personally, see no particular problem in hoisting most of the "never
> > changing" math.c into microcode expressed as instructions in ISA.
> 
> I'm allergic to microcode in general. If the "microcode" is the same as 
> the macrocode (which putting math.c in the "hardware" sounds like) then 
> I'm happy; that's just code-in-a-ROM, like the original Macs. But if 
> it's different then it keeps the compiler and ASM programmers from 
> seeing, and dealing with, the real machine.

Consider something like:

.........float ATAN2( float x, float y )
.........{
.................if( x  > 0.0 ........... ) return ATAN( x / y );
.................if( x  < 0.0 && y  > 0.0 ) return ATAN( y / x ) + π;
.................if( x  < 0.0 && y  < 0.0 ) return ATAN( y / x ) - π;
.................if( x == 0.0 && y  > 0.0 ) return +π/2;
.................if( x == 0.0 && y  < 0.0 ) return -π/2;
.................if( x ==+0.0 && y ==+0.0 ) return +0.0;
.................if( x ==+0.0 && y ==-0.0 ) return +π;
.................if( x ==-0.0 && y ==+0.0 ) return -0.0;
.................if( x ==-0.0 && y ==-0.0 ) return -π;
.........}

Microcode can perform ALL of the comparisons and all of the branching in 
one clock!
0
MitchAlsup
12/9/2016 6:41:32 PM
On Thursday, December 8, 2016 at 6:44:21 PM UTC-6, Ivan Godard wrote:

> I'm allergic to microcode in general. If the "microcode" is the same as 
> themacrocode (which putting math.c in the "hardware" sounds like) then 
> I'm happy; that's just code-in-a-ROM, like the original Macs. But if 
> it's different then it keeps the compiler and ASM programmers from 
> seeing, and dealing with, the real machine.

Perhaps my previous answer was not sufficiently well worded:: let me try
again::

I am not in particular suggesting that the "pipeline" be controlled by 
microcode. Or that the decode process be controlled by microcode. What 
I am suggesting is that "a function unit" be controlled by microcode.

A special function unit receives instructions from whatever decoder, and
out-of-order system the machine implements. This function unit is allowed
to observe all of the operands, and perform various calculations on them.
It is ALSO allowed to formulate calculations for other function units and
to inject these synthesized calculations into whatever out-of-order 
scheduler the system implements.

For example, assume a CORDIC unit that borrows a few pieces of arith-
metic from the naturally occurring function units so the CORDIC unit is
smaller but still capable of performing rather high level calculations
such as SIN, COS, TAN, LN2,... 

The SIN, COS, TAN routines need argument reduction, which is easily 
accomplished by using the FMAC unit and feeding it a few pieces of PI() 
based on the exponent of the argument to the instruction. 2 FMAC units 
of time later, argument reduction is complete, and polynomial selection
and polynomial evaluation begins.

Of course, I just made up the CORDIC unit as a diversion up above. The
microcoded unit can just as easily feed the FMAC and DIV units 
Chebyshev coefficients in Horner order through FMAC and presto::

Double precision std transcendental functions are now complete in
on the order of 28 cycles (FMAC=4 cycles, DIV=17 cycles, SQRT=20 
cycles) down from the 70-200 cycles currently available. The large 
tables of coefficients no longer pollute the data cache (or instruction
stream) and the transcendental calculations are now cheap enough so
the programmer no longer has to try to minimize their use. You get the
added benefit that the transcendental instructions deliver properly 
rounded results! (0.5 ULP rather than 1.0-1.5 ULP).
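
In source terms, the Horner-order feeding of the FMAC looks like the 
sketch below; this assumes the coefficient table c[] has already been 
selected, and the function is purely illustrative, not the microcode 
itself.

    #include <cmath>

    // One fused multiply-add per coefficient, highest degree first;
    // there is no intermediate rounding between each multiply and add.
    double horner(double x, const double* c, int degree) {
        double acc = c[degree];
        for (int i = degree - 1; i >= 0; --i)
            acc = std::fma(acc, x, c[i]);
        return acc;
    }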

The DIV unit mentioned up above is just such a microcoded unit, setting
up division estimates, and performing a number of Newton-Raphson iter-
ations to finish up an IEEE-quality result. SQRT is a handmaiden to DIV,
and when you think about it, so are all of the standard transcendental
functions.
0
MitchAlsup
12/10/2016 12:19:36 AM
On 12/9/2016 10:41 AM, MitchAlsup wrote:
> On Thursday, December 8, 2016 at 6:44:21 PM UTC-6, Ivan Godard wrote:
>> On 12/8/2016 4:26 PM, MitchAlsup wrote:
>
>>> I personally, see no particular problem in hoisting most of the "never
>>> changing" math.c into microcode expressed as instructions in ISA.
>>
>> I'm allergic to microcode in general. If the "microcode" is the same as
>> the macrocode (which putting math.c in the "hardware" sounds like) then
>> I'm happy; that's just code-in-a-ROM, like the original Macs. But if
>> it's different then it keeps the compiler and ASM programmers from
>> seeing, and dealing with, the real machine.
>
> Consider something like:
>
> ........float ATAN2( float x, float y )
> ........{
> ................if( x  > 0.0 ........... ) return ATAN( x / y );
> ................if( x  < 0.0 && y  > 0.0 ) return ATAN( y / x ) + π;
> ................if( x  < 0.0 && y  < 0.0 ) return ATAN( y / x ) - π;
> ................if( x == 0.0 && y  > 0.0 ) return +π/2;
> ................if( x == 0.0 && y  < 0.0 ) return -π/2;
> ................if( x ==+0.0 && y ==+0.0 ) return +0.0;
> ................if( x ==+0.0 && y ==-0.0 ) return +π;
> ................if( x ==-0.0 && y ==+0.0 ) return -0.0;
> ................if( x ==-0.0 && y ==-0.0 ) return -π;
> ........}
>
> Microcode can perform ALL of the comparisons and all of the branching in
> one clock!
>

First, this code is not IEEE; x can be unordered w/r/t 0.0 and you are 
not testing for that case.
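
A sketch of the missing guard (std::isunordered is the C/C++ spelling; 
names illustrative):

    #include <cmath>

    // Every ordered comparison against 0.0 is false when an operand is
    // NaN, so the nine-way ladder falls off the end. Test up front:
    float atan2_guarded(float x, float y) {
        if (std::isunordered(x, 0.0f) || std::isunordered(y, 0.0f))
            return x + y;   // propagate the NaN
        // ... the nine ordered cases go here ...
        return 0.0f;        // placeholder
    }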

Second, microcode can do this only if it has been designed for this case 
or similar: in particular, a 9-way branch requires 9 addresses (or 
offsets) and a place to put them. So "microcode" can't do your example, 
"your particular microcode" can.

Lastly, a suitably designed macrocode can do the same when configured 
with the comparison and branch capacity exhibited by the microcode. Ours 
comes close, requiring one extra cycle for the conjunctions. If we had 
conjunction branches (which have been considered but are on hold pending 
measurement in large codesets) then it's all one cycle on a Mill too.

To cheer you up: we also cannot handle the "unordered" cases without an 
auxiliary op that does what the equivalent micro-op would.
0
Ivan
12/10/2016 4:40:45 AM
On 12/9/2016 4:19 PM, MitchAlsup wrote:
> On Thursday, December 8, 2016 at 6:44:21 PM UTC-6, Ivan Godard wrote:
>
>> I'm allergic to microcode in general. If the "microcode" is the same as
> the macrocode (which putting math.c in the "hardware" sounds like) then
>> I'm happy; that's just code-in-a-ROM, like the original Macs. But if
>> it's different then it keeps the compiler and ASM programmers from
>> seeing, and dealing with, the real machine.
>
> Perhaps my previous answer was not sufficiently well worded:: let me try
> again::
>
> I am not in particular suggesting that the "pipeline" be controlled by
> microcode. Or that the decode process be controlled by microcode. What
> I am suggesting is that "a function unit" be controlled by microcode.
>
> A special function unit receives instructions from whatever decoder, and
> out-of-order system the machine implements. This function unit is allowed
> to observe all of the operands, and perform various calculations on them.
> It is ALSO allowed to formulate calculations for other function units and
> to inject these synthesized calculations into whatever out-of-order
> scheduler the system implements.
>
> [snip]

Yes, a reasonable implementation of a math lib function will use other 
FUs as part of its job. Of course it would; there's no point in building 
a sin() box (say) that has the whole series built in.

But that's not an argument for microcode. After all, a macrocode 
implementation of sin() will *also* use the FMA unit etc, just as the 
microcode does. The two implementations, macro and micro, will wind up 
using exactly the same parts of the machine, because those parts are 
what are needed to compute a sin() or whatever.

The *only* difference between microcode and macrocode is that there is 
an extra step in the translation between the file and the signals to the 
FUs. That extra step has both good and bad points. As a good point, a 
microcoded ISA can have richer operations yet have more compact 
macrocode than one without the step. Thus a microcoded sin() operation 
likely occupies 32 or fewer bits in the macrocode of the load module, 
while the same expressed in macrocode is likely to need several times as 
much space. Of course, if the macrocode sin() were a function body then 
the difference would amortize over all the uses and approach the space 
of a call op, which is likely to be of similar order to an explicit 
sin() micro-op.

A drawback to microcode is that the translation costs at least one stage 
in the pipeline, which has costs in mispredicts, power and area. A micro 
trace cache relieves this but does not eliminate it.

To me the biggest problem with microcode, and the source of my aversion, 
is that it is not WYSIWYG. Many threads on this board have had people 
running tests against a zillion versions of X86 and speculating about 
what the microcode is actually doing so they can tune their macrocode to 
the particular box. I consider such practice to be worse than a waste of 
time. Call me old fashioned, but I still feel that a programmer that 
wants to address the machine should be able to address the machine. YMMV.
0
Ivan
12/10/2016 5:00:44 AM
On Saturday, December 10, 2016 at 7:00:42 AM UTC+2, Ivan Godard wrote:
> On 12/9/2016 4:19 PM, MitchAlsup wrote:
> [snip]
> The *only* difference between microcode and macrocode is that there is 
> an extra step in the translation between the file and the signals to the 
> FUs. That extra step has both good and bad points. As a good point, a 
> microcoded ISA can have richer operations yet have more compact 
> macrocode than one without the step. Thus a microcoded sin() operation 
> likely occupies 32 or fewer bits in the macrocode of the load module, 
> while the same expressed in macrocode is likely to need several times as 
> much space. Of course, if the macrocode sin() were a function body then 
> the difference would amortize over all the uses and approach the space 
> of a call op, which is likely to be of similar order to an explicit 
> sin() micro-op.
>


Another advantage is forward compatibility.
Suppose today my machine has no FMA. Tomorrow I add FMA. All old code that used the SIN opcode automatically runs faster. Without recompiling, relinking, updating shared libraries, or other such nonsense.
And yes, the old Mac ROM achieves the same effect, but the overhead of calling the ROM is higher, especially for rarely called functions, so performance-oriented programs are less likely to use an in-ROM library than "real" microcode.
Of course, in the specific cases of sin() or atan2(), really high-performance code is unlikely to use "real" microcode either, but for FDIV it does, and right now microcoded FDIV is MUCH better than all the alternatives.


0
already5chosen
12/10/2016 4:35:32 PM
On 12/10/2016 8:35 AM, already5chosen@yahoo.com wrote:
> On Saturday, December 10, 2016 at 7:00:42 AM UTC+2, Ivan Godard
> wrote:
>> [snip]
>
> Another advantage is forward compatibility. Suppose today my
> machine has no FMA. Tomorrow I add FMA. All old code that used the SIN
> opcode automatically runs faster, without recompiling, relinking,
> updating shared libraries, or other such nonsense. And yes, the old
> Mac ROM achieves the same effect, but the overhead of calling the ROM
> is higher, especially for rarely called functions, so
> performance-oriented programs are less likely to use an in-ROM library
> than "real" microcode. Of course, in the specific cases of sin() or
> atan2(), really high-performance code is unlikely to use "real"
> microcode either, but for FDIV it does, and right now microcoded FDIV
> is MUCH better than all the alternatives.
>
>


De gustibus non disputandum: there's no accounting for taste.

The argument of backwards compatibility applies to every function or 
code fragment. Where do you stop with microcode? If sin() is microcoded, 
why not malloc()? Printf? Deduct_FICA_from_gross_pay()? Our fix is to 
regenerate the code automatically and transparently whenever it runs on 
a new platform.

As for arguments from relative performance of micro- vs. macro-code, a 
significant difference between the two reflects poor macro-code design, 
not inherent superiority of micro-code.
0
Ivan
12/10/2016 5:36:36 PM
On Saturday, December 10, 2016 at 11:36:40 AM UTC-6, Ivan Godard wrote:

So, Ivan, how many cycles does SIN take on the highest end MILL?
0
MitchAlsup
12/10/2016 6:01:57 PM
On Sat, 10 Dec 2016 08:35:32 -0800 (PST), already5chosen@yahoo.com
wrote:

>[snip]
>
>Another advantage is forward compatibility.
>Suppose today my machine has no FMA. Tomorrow I add FMA. All old code that used the SIN opcode automatically runs faster. Without recompiling, relinking, updating shared libraries, or other such nonsense.
>And yes, the old Mac ROM achieves the same effect, but the overhead of calling the ROM is higher, especially for rarely called functions, so performance-oriented programs are less likely to use an in-ROM library than "real" microcode.
>Of course, in the specific cases of sin() or atan2(), really high-performance code is unlikely to use "real" microcode either, but for FDIV it does, and right now microcoded FDIV is MUCH better than all the alternatives.


Well, you certainly don't want to be executing anything out of actual
ROM on anything but the smallest embedded processors, just too slow.
Like everyone else, you copy the code to RAM, and then it caches just
like anything (never use SIN()? it'll be out of cache and slow, but
then who cares?).

But there's a lot to recommend an (Alpha) PALcode or (IBM)
millicode-type approach.  In either case you're just doing "normal"
instruction execution, but in that hyper-privileged state you have
access to non-architected hardware features, which may well help the
task at hand.
0
Robert
12/11/2016 8:29:39 AM
MitchAlsup wrote:
> On Saturday, December 10, 2016 at 11:36:40 AM UTC-6, Ivan Godard wrote:
>
> So, Ivan, how many cycles does SIN take on the highest end MILL?
>
The same as any other wide & fast machine, i.e. the latency of the 
argument reduction, poly eval and final fixup (including any exceptional 
inputs/outputs).

Doing many in parallel to improve throughput is obviously doable, but if 
that is needed then it is far more likely that the basic algorithm 
should be improved:

I.e. when doing FSIN/FCOS in an inner loop it is very likely that the 
inputs to the transcendental functions vary linearly, right?

In that case I would always start by looking at the sin/cos summation 
formulas since you have much faster ways to determine sin(a+x) when you 
already know sin(a) and x is a (small) constant inside the loop.

If I worried about ultimate precision then I would do a new full 
evaluation every N iterations.
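
A minimal sketch of that summation-formula recurrence, assuming a 
fixed step dx per iteration (illustrative code only):

    #include <cmath>

    // sin(a+dx) = sin(a)cos(dx) + cos(a)sin(dx)
    // cos(a+dx) = cos(a)cos(dx) - sin(a)sin(dx)
    // Two muls and an add/sub per step instead of a full polynomial;
    // error accumulates slowly, so re-seed fully every N iterations.
    void sin_sweep(double a, double dx, double* out, int n) {
        const double cd = std::cos(dx), sd = std::sin(dx);
        double s = std::sin(a), c = std::cos(a);
        for (int i = 0; i < n; ++i) {
            out[i] = s;
            const double s1 = s * cd + c * sd;
            c = c * cd - s * sd;
            s = s1;
        }
    }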

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Terje
12/13/2016 7:50:44 AM
In article <o2o98k$ng4$1@gioia.aioe.org>,
Terje Mathisen  <terje.mathisen@tmsw.no> wrote:
>MitchAlsup wrote:
>> On Saturday, December 10, 2016 at 11:36:40 AM UTC-6, Ivan Godard wrote:
>>
>> So, Ivan, how many cycles does SIN take on the highest end MILL?
>>
>The same as any other wide & fast machine, i.e. the latency of the 
>argument reduction, poly eval and final fixup (including any exceptional 
>inputs/outputs).
>
>Doing many in parallel to improve throughput is obviously doable, but if 
>that is needed then it is far more likely that the basic algorithm 
>should be improved:
>
>I.e. when doing FSIN/FCOS in an inner loop it is very likely that the 
>inputs to the transcendental functions vary linearly, right?
>
>In that case I would always start by looking at the sin/cos summation 
>formulas since you have much faster ways to determine sin(a+x) when you 
>already know sin(a) and x is a (small) constant inside the loop.
>
>If I worried about ultimate precision then I would do a new full 
>evaluation every N iterations.

That is far better done in the nearly machine-independent parts of
a compiler, not least as the choice of what to do will depend on the
user's requirements.  It's also fairly common for there to be quite
nasty cancellation problems, which can often be reduced by similar
reorganisations.


Regards,
Nick Maclaren.
0
nmm
12/13/2016 10:07:05 AM
> A special function unit receives instructions from whatever decoder, and
> out-of-order system the machine implements. This function unit is allowed
> to observe all of the operands, and perform various calculations on them.
> It is ALSO allowed to formulate calculations for other function units and
> to inject these synthesized calculations into whatever out-of-order
> scheduler the system implements.

So you turn the ATAN2 instruction into a kind of "on-the-fly
macroexpanded instruction" (hence your use of the name "microcode") but
where the macro-expansion is not constant (contrary to typical
microcode), instead the expansion is chosen dynamically by the execution
of the ATAN2 instruction (which cheaply determines which of N cases the
current operands correspond to)?


        Stefan
0
Stefan
12/13/2016 3:21:31 PM
On Tuesday, December 13, 2016 at 9:21:21 AM UTC-6, Stefan Monnier wrote:
> > A special function unit receives instructions from whatever decoder, and
> > out-of-order system the machine implements. This function unit is allowed
> > to observe all of the operands, and perform various calculations on them.
> > It is ALSO allowed to formulate calculations for other function units and
> > to inject these synthesized calculations into whatever out-of-order
> > scheduler the system implements.
> 
> So you turn the ATAN2 instruction into a kind of "on-the-fly

All transcendentals including ATAN2.

> macroexpanded instruction" (hence your use of the name "microcode") but
> where the macro-expansion is not constant (contrary to typical
> microcode), instead the expansion is chosen dynamically by the execution
> of the ATAN2 instruction (which cheaply determines which of N cases the
> current operands correspond to)?

Some parts of various transcendentals use bit patterns within the 
operands to choose various constants on which to perform calculations.

For ATAN2 this mainly consists of special value determinations.
For SIN/COS this consists in choosing the right set of bits of PI
.....based on the exponent of the operand in order to do argument
.....reduction.
For EXP2 one separates the integer part of the operand from the 
.....fractional part of the operand, the integer part is added
.....to the exponent, while the fractional part is run through
.....one of many polynomials based on high order fractional bits.
For SQRT the low order bit of the exponent is considered along
.....with the high order bits of the fraction to determine the 
.....one of many polynomials.

These fields <above> are easily extracted in <essentially>
zero time in HW (microcoded or not) and run through a switch
statement so that polynomial calculations can begin immediately.
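
As a software analogue of the EXP2 case (the short polynomial below is 
a stand-in, nowhere near the accuracy of the real table-driven ones):

    #include <cmath>

    // 2^x = 2^i * 2^f: the integer part i goes straight into the result's
    // exponent; only the fraction f in [0,1) needs a polynomial.
    double exp2_sketch(double x) {
        const double i = std::floor(x);
        const double y = (x - i) * 0.6931471805599453;   // f * ln(2)
        // degree-3 Taylor stand-in for the table-selected polynomial
        const double p = 1.0 + y * (1.0 + y / 2 * (1.0 + y / 3));
        return std::ldexp(p, (int)i);   // fold i into the exponent
    }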

{Instead of the switch, one could use a LDROM instruction that
uses the cache path to deliver coefficient data to the FMAC unit
without checking TLB, cache miss, or polluting the data cache. 
With LDROM one then has a single polynomial, but has to execute 2
instructions (micro or not) every 4 cycles to maintain throughput.}

Mitch
0
MitchAlsup
12/13/2016 6:02:06 PM
MitchAlsup wrote:
> On Tuesday, December 13, 2016 at 9:21:21 AM UTC-6, Stefan Monnier wrote:
>>> A special function unit receives instructions from whatever decoder, and
>>> out-of-order system the machine implements. This function unit is allowed
>>> to observe all of the operands, and perform various calculations on them.
>>> It is ALSO allowed to formulate calculations for other function units and
>>> to inject these synthesized calculations into whatever out-of-order
>>> scheduler the system implements.
>>
>> So you turn the ATAN2 instruction into a kind of "on-the-fly
>
> All transcendentals including ATAN2.
>
>> macroexpanded instruction" (hence your use of the name "microcode") but
>> where the macro-expansion is not constant (contrary to typical
>> microcode), instead the expansion is chosen dynamically by the execution
>> of the ATAN2 instruction (which cheaply determines which of N cases the
>> current operands correspond to)?
>
> Some parts of various transcendentals use bit patterns within the
> operands to choose various constants on which to perform calculations.
>
> For ATAN2 this mainly consists of special value determinations.
> For SIN/COS this consists in choosing the right set of bits of PI
> ....based on the exponent of the operand in order to do argument
> ....reduction.
> For EXP2 one separates the integer part of the operand from the
> ....fractional part of the operand, the integer part is added
> ....to the exponent, while the fractional part is run through
> ....one of many polynomials based on high order fractional bits.

I agree with you on all these...

> For SQRT the low order bit of the exponent is considered along
> ....with the high order bits of the fraction to determine the
> ....one of many polynomials.

but this one is interesting:

My current code uses plain reciprocal sqrt lookup and 2-4 NR iterations 
(depending upon the quality of the invsqrt() lookup and the target 
precision) in integer code which provably delivers more than 2N+1 
mantissa bits, so the rounding will be correct.

(The fact that my code is targeted for a cpu without fp hw does make 
quite a difference.)
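
The core NR step referred to above, shown in floating point for 
brevity (the code described is integer-only):

    // One Newton-Raphson refinement of y ~ 1/sqrt(x):
    //   y' = y * (1.5 - 0.5 * x * y * y)
    // Each step roughly doubles the number of correct mantissa bits,
    // so a table seed plus 2-4 steps reaches the 2N+1 bits needed for
    // correct rounding.
    double rsqrt_step(double x, double y) {
        return y * (1.5 - 0.5 * x * y * y);
    }
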
>
> These fields <above> are easily extracted in <essentially>
> zero time in HW (microcoded or not) and run through a switch
> statement so that polynomial calculations can begin immediately.

That's the nice part of working directly with the hw. :-)

Terje


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Terje
12/14/2016 6:58:40 AM
On Wednesday, December 14, 2016 at 12:58:42 AM UTC-6, Terje Mathisen wrote:
> MitchAlsup wrote:
> [snip]
> I agree with you on all these...
> 
> > For SQRT the low order bit of the exponent is considered along
> > ....with the high order bits of the fraction to determine the
> > ....one of many polynomials.
> 
> but this one is interesting:
> 
> My current code uses plain reciprocal sqrt lookup and 2-4 NR iterations 
> (depending upon the quality of the invsqrt() lookup and the target 
> precision) in integer code which provably delivers more than 2N+1 
> mantissa bits, so the rounding will be correct.

The low order bit of the exponent allows the input range to be considered
1..4 rather than 1..2. This gets rid of a multiply by SQRT(2) as it is
already factored into the coefficient tables. The rest of the exponent
is shifted down by 1 bit.
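
In software terms the trick looks roughly like this, assuming the 
IEEE-754 double layout (names illustrative):

    #include <cstdint>
    #include <cstring>

    // sqrt(2^(2k+b) * m) = 2^k * sqrt(2^b * m), with b the exponent's
    // low bit and m in [1,2): the polynomial sees 2^b * m in [1,4), and
    // the sqrt(2) factor for b==1 is pre-folded into the tables.
    void sqrt_split(double x, int* k, double* m_ext) {
        uint64_t bits;
        std::memcpy(&bits, &x, sizeof bits);
        const int e = (int)((bits >> 52) & 0x7FF) - 1023;  // unbiased
        const int b = e & 1;
        *k = (e - b) / 2;                                  // result exp
        bits = (bits & ~(0x7FFull << 52)) | ((uint64_t)(1023 + b) << 52);
        std::memcpy(m_ext, &bits, sizeof bits);            // in [1,4)
    }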


0
MitchAlsup
12/14/2016 4:29:19 PM
MitchAlsup wrote:
> On Wednesday, December 14, 2016 at 12:58:42 AM UTC-6, Terje Mathisen wrote:
>> I agree with you on all these...
>>
>>> For SQRT the low order bit of the exponent is considered along
>>> ....with the high order bits of the fraction to determine the
>>> ....one of many polynomials.
>>
>> but this one is interesting:
>>
>> My current code uses plain reciprocal sqrt lookup and 2-4 NR iterations
>> (depending upon the quality of the invsqrt() lookup and the target
>> precision) in integer code which provably delivers more than 2N+1
>> mantissa bits, so the rounding will be correct.
>
> The low order bit of the exponent allows the input range to be considered
> 1..4 rather than 1..2. This gets rid of a multiply by SQRT(2) as it is
> already factored into the coefficient tables. The rest of the exponent
> is shifted down by 1 bit.

OK, that makes perfect sense for a poly evaluation. Making sure that you 
get correctly rounded results is "interesting". :-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Terje
12/14/2016 4:59:36 PM
On Wednesday, December 14, 2016 at 10:59:38 AM UTC-6, Terje Mathisen wrote:
> MitchAlsup wrote:
> > [snip]
> >
> > The low order bit of the exponent allows the input range to be considered
> > 1..4 rather than 1..2. This gets rid of a multiply by SQRT(2) as it is
> > already factored into the coefficient tables. The rest of the exponent
> > is shifted down by 1 bit.
> 
> OK, that makes perfect sense for a poly evaluation. Making sure that you 
> get correctly rounded results is "interesting". :-)

All you need is 2*N+3 bits !
0
MitchAlsup
12/14/2016 5:53:42 PM
In article <6aebfba9-36ad-4216-8ee5-2e553e531d96@googlegroups.com>,
MitchAlsup  <MitchAlsup@aol.com> wrote:
>On Wednesday, December 14, 2016 at 10:59:38 AM UTC-6, Terje Mathisen wrote:
>> >>
>> >>> For SQRT the low order bit of the exponent is considered along
>> >>> ....with the high order bits of the fraction to determine the
>> >>> ....one of many polynomials.
>> 
>> OK, that makes perfect sense for a poly evaluation. Making sure that you 
>> get correctly rounded results is "interesting". :-)
>
>All you need is 2*N+3 bits !

You need a fair amount of mathematics, and some quite serious
thinking, too :-)


Regards,
Nick Maclaren.
0
nmm
12/14/2016 6:08:53 PM
MitchAlsup wrote:
> On Wednesday, December 14, 2016 at 10:59:38 AM UTC-6, Terje Mathisen wrote:
>> [snip]
>> OK, that makes perfect sense for a poly evaluation. Making sure that you
>> get correctly rounded results is "interesting". :-)
>
> All you need is 2*N+3 bits !
>
OK, that means that you can use the double paths to calculate float 
sqrt(), but for a double sqrt you would need a 109+ bit mantissa, which 
is hard to get even if you have an FMAC without internal rounding...

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Terje
12/14/2016 7:28:22 PM
On Wednesday, December 14, 2016 at 1:28:24 PM UTC-6, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Wednesday, December 14, 2016 at 10:59:38 AM UTC-6, Terje Mathisen wrote:
> >> [snip]
> >> OK, that makes perfect sense for a poly evaluation. Making sure that you
> >> get correctly rounded results is "interesting". :-)
> >
> > All you need is 2*N+3 bits !
> >
> OK, that means that you can use the double paths to calculate float 
> sqrt(), but for a double sqrt you would need a 109+ bit mantissa, which 
> is hard to get even if you have an FMAC without internal rounding...

A double FMAC has 52+53+53+52 wide input to the Adder after the multiplier 
tree. Strictly speaking the 52 on the front is an incrementer, and the 52 
on the rear is highly degenerate, with 106 bits of 3-input add in the 
middle. So, a double FMAC has the required precision--and you can get 
the desired precision doing Newton-Raphson iterations (2*FMACs).

The above adds up to 210, but I seem to recall the right number is 206.

But rest assured that a double FMAC has the right amount of stuff.
0
MitchAlsup
12/15/2016 9:21:39 PM
On Tuesday, December 6, 2016 at 12:42:00 AM UTC+5:30, Ivan Godard wrote:
> while I'm bitching and moaning about bad languages: no ladder 
> inheritance nor any way to build it. C-coders can stop reading; this 
> complaint is not for you.
> 
> [snip]
> 
> My complaint is really about inheritance in general: inheritance is a 
> mistake. Oh well :-(

C++ has two types of inheritance:

i) non-virtual

ii) virtual

These are distinct from the C++ equivalent of Java interfaces,
and the horse's mouth tells me that OOP is good, so good
uses of both (pure virtual classes/interfaces and virtual/non-virtual
base classes) exist in books (let alone application SW).
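
A minimal sketch of the distinction, with illustrative names:

    // With non-virtual inheritance, Derived would contain two separate
    // Base subobjects (one via Left, one via Right); declaring the base
    // virtual makes Left and Right share a single Base.
    struct Base { int id = 0; };
    struct Left : virtual Base { };
    struct Right : virtual Base { };
    struct Derived : Left, Right { };   // exactly one Base subobject

    int main() {
        Derived d;
        d.id = 42;   // unambiguous only because Base is virtual
        return 0;
    }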

From a software perspective, I guess what you can probably look
at is a good generator that generates an array out of C++ classes
of all sorts. More exactly, a good generator should look toward
emitting an array every time a C++ object is instantiated, and
treat the generated array just in time [i.e. at the cusp between
SW & (generated) HW] so that phase-ordering issues are averted.

That way, HW issues cool off.

Sincerely,
Seima Rao.




0
Seima
12/17/2016 1:56:27 PM
On Saturday, December 17, 2016 at 7:26:29 PM UTC+5:30, Seima Rao wrote:
> [snip]

Polymorphism in C++ also exists in the form of templates.
Did you try template-based polymorphism?
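
For instance, compile-time polymorphism via the CRTP pattern; a 
minimal sketch with illustrative names:

    // The base is parameterized on its own derived class, so the
    // "virtual" dispatch is resolved at compile time with no vtable.
    template <class Impl>
    struct CacheModel {
        int lineSize() const {
            return static_cast<const Impl*>(this)->lineSizeImpl();
        }
    };

    struct ICacheModel : CacheModel<ICacheModel> {
        int lineSizeImpl() const { return 64; }
    };

    int main() {
        ICacheModel ic;
        return ic.lineSize() == 64 ? 0 : 1;
    }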

Sincerely,
Seima Rao.
0
Seima
12/17/2016 1:59:07 PM