[l/m xx/xx/xx] Dead Comp. Arch. Society c.par/c.s.super (26/28) FAQ

Archive-Name: superpar-faq
Last-modified: 25 Apr 2003

26	Dead computer architecture society		< * This Panel * >
27	Special call
28	Dedications
2	Introduction and Table of Contents and justification
4	Comp.parallel news group history
6	parlib
8	comp.parallel group dynamics
10	Related news groups, archives and references
18	Supercomputing and Crayisms
20	IBM and Amdahl
22	Grand challenges and HPCC
24	Suggested (required) readings
25	Security

This space intentionally left blank (temporarily).


This is a roughly chronological list of past supercomputer, parallel computer,
or especially "interesting" architectures, not paper designs (see panel 14
for references to those).  Computer archeology is important
(not merely interesting), because it is in the failed projects that real
learning takes place.  Even Seymour Cray designed "failed" machines.

DCAS takes its name from a so-so Robin Williams movie, Dead Poets Society
(DPS), which nerd CS students went to see (trust me, he's better in live
performance).

In turn, the dead-architecture, lessons-learned discussion started in
comp.arch later that same year.  The idea was to collect material from
knowledgeable ex-engineers and former scientists, anonymously if need be,
before it was lost (since the companies had either died or evolved).
The problem is that the academic and commercial literature is fraught with
useless glowing marketing/sales language.  We (the net; I didn't do this
alone) collected comments, anonymously if need be, so that lessons would
not be lost, and anyone could comment.  Netters had hashed over this
material so many times before that it seemed useful to capture it
(like an FAQ ;^).  We assembled a list of architectures.

Maybe a third of the way through the list, I was asked by certain people
at CRI to suspend discussion, because CRI was starting to acquire
Supertek (which I personally always thought was a mistake).
We never resumed.  We lost the momentum.

Ever hear of the Gibbs Project?
If not: you should not be surprised.

Around that same time, ASPLOS came to Santa Clara, where they held a
Dead Computer Architecture Society panel session.  I had a meeting of
some sort (possibly SIGGRAPH) and I missed the starting hour.
I gave Peter Capek of IBM TJW a video camera, but I did not keep the tape
because I merely wanted to see what I had missed
(if I had, I would have given it to J. Fisher who sat on the panel).
I did not regard that as recording history.
The panel session discussed the various failed minisupercomputer firms
(perhaps I should use more flowery marketing language like
"attempted?").  Either way, lessons were there in front of 200+ architects,
OS and language designers.  Perhaps there was another video camera
in the room.....
	Let's see, what were the four architectures represented?

One poster asked, "Why no mention of the Symbolics 3600, LMI, or TI
LISP machines?"  I am not averse to including the lessons from those machines;
however, the DCAS discussion was about minisupercomputers.  The 3600 and other
LISP machines fell more into the class of workstations of their time,
competing with the Xerox "D-machines" [Dorado, Dolphin, and Dandelion],
SUN, SGI, VAXstation, etc.  Most were not even parallel machines.
But if you can pitch me a good case, I'll consider them.  Do it.

Also useful:
old header files for those systems which ran C compilers.

Most recently, I am reminded of a warm fall Saturday morning in a house
on a hill overlooking the beautiful Santa Barbara Channel.
George Michael, whom I had driven there just to see Glen Culler (who had
suffered a stroke some time back), was talking about "war stories," when
Ms. Culler [Glen's wife and David's mother] chimed in:
	"I really think you need a better title for your book
	{one GAM was working on}.  No one will buy it with a word like
	'war stories' in the title...."
Three of us in the room chuckled.  She is great.

The Dead Computer Architecture Society

Floating Point Systems (FPS)
(Purchased by Cray Research)
        FPS AP-series (Culler based design with VLIW attributes)
                7600 performance attached to your PDP-11.
Roots with Culler-Harris (CHI), Inc.  FPS started with specialized
attached processors FPS AP-120B, and scaled from there
to the FPS T-series Hypercubes.  The AP-120 line could be attached
to machines as small as a PDP-11.  They were controlled by specialized
Fortran (and later C) system calls (a software emulator existed for code
development: obviously slow).  Known as an FFT and MXM box.
It was marketed in 1977 in Scientific American as 7600 power on
your minicomputer and showed quite respectable, but economical,
number crunching power (I/O was still a problem).
38-bit words.  Pipelined, precursor to VLIW?  Perhaps.

Later models: FPS-164, FPS-264, FPS-500, APP
Larger 64-bit attached processors.  Pre-IEEE-FP.  Attached processors
became useful and popular for signal processing, medical apps.

        FPS T-series (hypercubes)
Someone else (maybe Stevenson) can write a T-series paragraph.

Absorbed by Cray Research.
This business unit was sold by SGI to Sun at the time of the SGI/Cray
merger, 7/96.  Current living incarnation of the former CS-6400 line:
the UltraEnterprise 10000 and UltraHPC 10000
(2 different names for 2 different markets, same box),
often known by its CRI code name "Starfire".


The Denelcor Heterogeneous Element Processor (HEP) was perhaps the most
unusual architecture a student will never get a chance to see.
My first knowledge of this machine came from Mike Muuss (BRL, scheduled
to get one [4 PEMs delivered]) at a time when the DEC VAX-11/780 was the only
VAX around.  Later I would invite representatives to Ames.

        7600-class scalar CPUs at a time when the Cray-1 was out
	and the X-MP was just being delivered.  64-bit machine.
        Full/Empty bits on the memory, which go way beyond mere Test-and-Set.
	Separate physical Instruction (128 MB) and Data (1 GB) memories.
	Based in Aurora, CO, east of Denver.
	Operating systems: HEP-OS and HEP Unix.
	Programming and architecture manuals at the Museum.
	Keywords: dataflow (limited),
	13 systems made.  6 delivered.  1 2-PEM system stayed
	at the company.  Photos.
	Sites (from Burton)
  Los Alamos
  Messerschmitt-Bolkow-Blohm, Munich (later 2+ PEMS)
  BRL (4 PEMs)
  Classified (4 PEMs)
  Shoko, Ltd., Tokyo

Problems: somewhat underpowered at the time, programming difficulties.
Hardware deadlock.  Early inexperience with serious parallel systems.
Software.  Ambitious. Pipelining.  Dataflow.
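The full/empty bits above were the HEP's signature synchronization mechanism: every memory word carried a state bit, a write blocked until the word was empty, and a read blocked until it was full. A minimal software approximation (a sketch only, in Python; the class and names are hypothetical, not HEP software):

```python
import threading

class FullEmptyWord:
    """Software approximation of a HEP full/empty memory word:
    a write blocks until the word is empty, a read blocks until
    it is full, and each operation flips the state bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write(self, value):
        with self._cond:
            self._cond.wait_for(lambda: not self._full)  # wait for empty
            self._value, self._full = value, True
            self._cond.notify_all()

    def read(self):
        with self._cond:
            self._cond.wait_for(lambda: self._full)      # wait for full
            self._full = False
            self._cond.notify_all()
            return self._value

# Producer and consumer synchronize on the word itself: no separate lock.
word = FullEmptyWord()
out = []
consumer = threading.Thread(target=lambda: out.extend(word.read() for _ in range(3)))
consumer.start()
for v in (1, 2, 3):
    word.write(v)       # each write waits for the previous value to be consumed
consumer.join()
print(out)  # -> [1, 2, 3]
```

The point of the mechanism is that synchronization rides along with the data, word by word, instead of living in a separate lock.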

I hope a HEP simulator sees the light of day one of these days.
It is suggested that the Horizon simulator is a close approximation to
the HEP.  I do not know how to obtain it (though I know roughly where);
I just don't have time.

Successor machines: HEP-2 (design 70% complete?) and HEP-3.
Horizon (paper design).  Tera (1 machine, currently 2 CPUs, going to 4 shortly).
Keywords: learning to live with latency.

See Dennis Shasha's book Out of Their Minds for a barely adequate profile
on Burton Smith (too short).


Elxsi

	Sunnyvale, CA based super-minicomputer.
	ECL-technology, bus-oriented, true 64-bit, first IEEE 754 FP machine,
	SISD (non-vector) cpus (1-10 later 1-16 CPUs).
        Impressive for its time (designed to compete against the VAX-11/780
                AND low end CDC supercomputers).
        EMBOS ("Unix-like" operating system; "We renamed `grep` to `find`."
                "Oh?  Then what did you rename `find` to?")
	Tata Elxsi, sites in Australia and India.
	Over 200 sites, and many CPUs.  1 CPU per board,
	Photos exist.
	The firm dissolved in the late 1980s, people to H-P.

Personal experience: Saw and briefly used a 4-processor system which
replaced a Cyber 172; it has since been replaced by networked workstations.
The application was real-time flight data analysis on experimental aircraft.


ECL is expensive.  Don't screw around with OSes.  Understand the market.

What became of Elxsi and CEO Appleton-Jones?
The minisuper purveyor got out of the computer business and bought a bunch of
Bickford's Family Restaurants.


Alliant Computer Systems

Once called Data Flow Systems.
        FX/8, FX/80, FX/2800, etc.

The FX/8 (the first architecture)
had a particularly slow scalar system using MC68008 CPUs for its
interactive front-end processors, at a time when SUN workstations
had the better-known 68010 processors.

The multiple back-end Computational Engines (CEs) were
a proprietary design with vector instructions.

The Berkeley Unix port was a mixed beast.

Basis for U. Ill. Cedar Project.  Fizzled?

Acquired Raster Technologies (graphics).

The Friday before Memorial Day 1992.  At least that's when 80% of us got
laid off.
1) Undercapitalized in a market not as big as it first appeared.
        > I disagree about the "undercapitalized".
2) Technology changing faster than we could keep up with.  (Small
   Unix-CPU systems can be designed and shipped far faster than a
   parallel system.)
        > True,
3) Relying on Intel for a part that didn't _end_ in "86".
        > True,
4) Long lead time on sales of MPP systems.

#include "alliant.h"

Surviving news group, comp.sys.alliant.
Museum has 1 FX/8 "Do not run classified data on this machine"
and 1 FX/1 (former Wallach desk stand).


Multiflow Computer

A couple of us pondered what ever happened to SN#1 of this machine.
I saw it!  Even typed 'ls' on it!

Is the third flag at the assembly room at Convex Computer: Maryland?

Museum History Center (Trace 14).

Among others: Waited too long to go to ECL.


Myrias Research Corporation

What was the US DOD doing funding a Canadian company?
That was the first question which ever came onto the net.

Home base was Edmonton, Alberta, Canada.
Formed by some academic types from the University of Alberta (in Edmonton)
ca. 1984?
Original design "on a napkin at a bar"...
Weakness:  Hardware.
SPS-1  68000, "proof of concept"
       a hierarchy of busses, 4 68k's per bus
       16 of those busses on another bus in a box
       (called a "cage")
       hook as many cages together as you want/can afford
       ca 1986?, none installed
SPS-2  as above but with
       68020 + 68881/2 + MMU,  "production system"
       largest system actually built was about 1088 cpus (~ 1024 + 64),
       a "benchmark system", proof of concept (again)
       ca 1988?, <~10 installed
SPS-3  as above but with
       68040, one or two actually built
       ca 1990?, ~0 installed
Strength:  Software.

Basic idea:  take VM Unix, remove pager/swapper
             replace pager with custom pager which swaps pages
             between processors according to rules which make
             the illusion of single address space, SPMD
Hey, this is Pre SCI, KSR, et al.
Neatest thing:  debugger could make "ghost pages", which contained
                a count of number of reads/write per word-
                it could find uninitialized words easier than anyone.
                (notice I didn't say "faster")
                (i.e., one ghost word per data word)
Funniest thing:  we had more trouble getting our Canadian friends
                 into NASA than into NSA...
Q: What did "SPS" stand for?
A: Oh, Scalable Processor System, or Super Parallel System, or
   something.  If you can make up something sensible from "SPS",
   someone at Myrias probably used the term at least once...
Q: Why did US DoD fund a Canadian startup?
A: No one else had a provably _scalable_ system at the time.
They went bankrupt without warning: got a call at 1430 to come
to the office, "you're out of work at 1700."
Don't like it?  Sue us in Edmonton.  "You're toast, eh?"
Today:  they're still alive and well and selling the SW for WS clusters
You can see them at Supercomputing '9?, perhaps sharing a small booth

Lessons Learnt:
Myrias:  Hardware matters.  Best software in the world needs
something to run on.  300 Kflops/processor hasn't been supercomputing
for quite a while now.
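The "ghost pages" debugger described above kept one shadow counter per data word, tallying reads and writes, which is how it found uninitialized words so reliably. A toy sketch of the idea (a Python illustration under my own hypothetical names, not Myrias code):

```python
# Sketch of Myrias-style "ghost pages": one ghost counter per data
# word, tracking reads and writes, so a read of a never-written word
# can be flagged.  All names here are hypothetical.
class GhostArray:
    def __init__(self, size):
        self._data = [0] * size
        self._reads = [0] * size    # ghost word: reads of each data word
        self._writes = [0] * size   # ghost word: writes to each data word

    def write(self, i, value):
        self._writes[i] += 1
        self._data[i] = value

    def read(self, i):
        self._reads[i] += 1
        if self._writes[i] == 0:    # read before any write: uninitialized
            print(f"warning: read of uninitialized word {i}")
        return self._data[i]

a = GhostArray(4)
a.write(0, 42)
a.read(0)   # fine: written before read
a.read(3)   # -> warning: read of uninitialized word 3
```

Note the cost model matches the FAQ's aside: this finds uninitialized words more *easily* than anyone, not faster, since every access pays for the bookkeeping.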


Sequent Computer Systems

Near Portland, OR.
Balance 8000 and Symmetry series.
NS 32K CPUs (Intel 386s in the Symmetry) on a bus.

Absorbed by IBM.

Flexible Computer (FLEX)

Saw it!  (Langley Research Center)  Even typed 'ls' on it!
NS32032 processors on a bus.
Competitor to Sequent.

Scientific Computer Systems (SCS)

        SCS-40, SCS-30 (in order of development time, not performance)
An attempt at a low-cost, binary-compatible Cray X-MP clone, mostly for
software development, but also marketed to those who could not afford a
full-sized Cray.  This was probably a bad business decision on their part.
                "Come down from your Cray...." was the kiss of death.
The term "Crayette" was first used with this machine.
                The last hosts running COS and CTSS (CIVIC: the Fortran
		which replaced LRLTRAN).
Licensed COS 1.13 from CRI.  Meanwhile CRI was transitioning to UNICOS (tm).
They secretly hoped UNICOS was going to fail.
Then they hoped for a remaining (surviving) COS/CTSS market: that also failed.
        shipped a few dozen Cray-clones.

The first developed machines were delivered to San Diego.
Roots also in Portland, OR, and at Boeing (which also purchased a couple)
in Seattle.

Supertek (Purchased by Cray Research)

Another attempt at a low-cost binary compatible Cray X-MP clone.
Mike Fung of H-P tried this.  Santa Clara, CA.
        S-1 (not to be confused with the LLNL S-1 project), S-2
                The last host running CTSS
You should probably note that the S-1 was sold by Cray Research after the
buyout as the "XMS" and the S-2 (which was still under development at the time
of purchase) was sold as the "Y-MP/EL"

Architect: Albert ???

Mike Fung was the architect, and he worked at places like Ames (not on
supers) and H-P if I recall.  Their offices were across the street from
the Tech-Mart and the Santa Clara Convention Center [where Steven
Kartashev had his super altercation] and kitty corner to
Great America.  And for a while those Supertek signs were Cray Research
signs (which impressed a few Valley people).

Ref: papers exist (COMPCON).

The Supertek S-1 was the third of three (SCS [Scientific Computer Systems,
San Diego] was the first, XXX was second) "Crayette" minisupercomputers
which (unlike the Convex) WERE Cray-instruction-set compatible.  The SCS-40
came out first (delivered to SDSC and Boeing).  Then XXX, then Supertek.
I keep forgetting who XXX was.

These were all basically CMOS X-MPs which ran anywhere from 10% to 25%
of an X-MP for a proportional price.  As I recall, the packaging
was a standard 21-inch rack cabinet.  Just a box.  No C shape.
When Woodrow and I visited, we were briefed on the S-2 (basically a scaled
Y-MP clone).  They had just completed a Unix port.

There were basically two perceived reasons to buy an SCS-40 (or SCS-30)
or an S-1 or XXX:
        1) Running a Cray was "cost prohibitive," so you could do software
        development on the SCS or S-1, then move the code over to the
        "real" Cray running pure batch.  This was the era of
        anti-interactive computing on supers.  This sounds good for hardware.
        It ignored some software realities.

        2) The first SCS-40 OS was COS, with CTSS to follow.
        CTSS was the chosen first OS for the S-1.  There were several
        arguments for this:
                This sent a signal from companies like Boeing to CRI:
                "No confidence" on CRI OS futures (CX-OS/UNICOS).
                To have a machine backward compatible to run old COS
                programs even if slower in case CRI failed.
                COS was frozen at v1.14 or v1.15.  The CFT compiler
                was frozen at v1.14 (or 1.13).  CRI would be no place
                to run this environment (this was all CAL [Cray Assembly
                Language]).  I think I remember something like Boeing
                intending to buy a dozen or so machines.
        This was supposed to save software development time.
        It didn't.  I can no longer remember who did the S-1 port or
        what it was called (could have been Lachman).
        I no longer recall the UNICOS transition plan.
I never saw an S-2.  Our next visit was to the Boston area.

So you could run CTSS/CIVIC X-MP programs on the S-1 (which would have
made DOE fundees somewhat happy except for speed) or if running their
Unix, their f77 compiler.  You could not run both OSes.  There is some
appeal to binary compatibility.  We resisted that Siren.

Determining the pedigree of the sales personnel could tell you whether
you were going to have problems.

I do not recall the delivery figures for SCS or Supertek.
I do not recall the figures of how many ELs were sold by CRI and resellers.
But a lot of the long time purchasers of supercomputers wondered
why CRI acquired Supertek (but they were "government").  It was seen as
a step backward, down, terrible.  Perhaps it was harsh business reality
catching up.

This is stuff in the comp.sys.super FAQ.  No sense in cluttering up
c.u.c. as it's pretty quiet.

I have not specifically expended much energy to pick up an SCS-40 or S-1,
or BBN Butterfly/Monarch [actually not a bad machine in some respects],
or a slew of other machines for the Computer Museum, but if the Museum
is offered, we might take it.  We just picked up some of the JvNC ETA-10 today,
and a MasPar MP-1 and dockmaster.mil.

The Supertek S-1 should not be confused with the never completed LLNL S-1

Mike Fung is still around.  I see a friend of his every so often.

We ultimately ended up buying four SGI 4D/380s with 8 processors each as
support processors, the successor of which posts the c.s.s. FAQ.

The timings
on most instructions were a bit different (particularly Scatter/Gather, which
was horribly slow on the S-1 and generated false operand range errors under
some conditions). Overall, the S-1 was about 1/7 as fast as an X-MP cpu.

Supertek became the Cray "Entry Level Systems Division".
The Supertek S-1 was sold by CRAY (after the takeover) as the "Cray XMS".

The Supertek S-2 (which was in the design phase when the takeover
happened), became the "Cray YMP-EL".

It was your basic 19 inch rack mount McComputer, with fancy skins.

Well, to add a bit to this:

1) The advantage of Cray having a ST-1 (XMS) was that the company
management could plop one down on the floor in Chippewa Falls &
say, "See, it ***can*** be done!"  We need to expand our line with more
cost effective systems.  Air Cooled instead of Chilled Water & all that!

2) The "government" customers thought it was a giant step backwards.
Yup!  Even as the "cold war" ended & they slowed down their acquisition
of "high end" Cray systems.  Face it, the high end was drying up.  NASA
is a perfect example.  The agency entered a new era of more cost effective
computing.  The purchase of thoroughbred race-horses was curtailed in
favor of much less exotic & much less expensive horseflesh.

3) Cray's lower end needed protection from the Convexes of the world.
If we didn't put hardware into *that* marketplace, then we would have
failed even earlier.  So, the lower-end of Cray's market needed to be
filled (we all hoped with a Cray system).

4) Just remember the basic rule of business.  Your vendor needs to
make a reasonable profit off of you.  If he/she doesn't, someday you
will wake up & he/she will be gone.  Cray is no longer an independent
vendor of High End systems.  It is now a part of another company.

Why?  Well, sales & profit figures show the answer.  Harsh business
realities.  I suspect, after all of the shouting's over, the EL line of
Cray systems helped keep Cray independent for about 2 extra years.

5) Then again, Convex & DEC lost their independence.
IBM suffered great hardship.  Thinking Machines snuffed it completely.
CDC never recovered from the ugliness of the 1980s, let alone the
1990s.  It hasn't been a pretty decade for the high end & mid range
at all.

I think you've forgotten that a Cray FLOP used to be more cost
effective than a small-machine FLOP. Most people were always into
cost-effective computing; if the cost-effective machine happened
to also be a supercomputer, that was icing on the cake. VLSI killed
that icing.

Culler-Harris Inc. (CHI)

Culler Scientific

Ametek

A conglomerate with something like 40 divisions?
One division produced a Hypercube clone similar to Intel's and NCUBE's,
in Arcadia, CA.  Used at Caltech and other sites.

Never saw one.


Based in Santa Clara, CA.  A somewhat mysterious company.
        Originally optical interconnect.  It changed to a systolic design.
        The only VMS based "supercomputer."
        Two? delivered (JPL and TRW) in Beta test.
Its last PR gasp was when an employee sold a manual to the Soviets in
the mid-80s.  That employee was sent to prison for violating
export control laws.


Cydrome

Milpitas, CA
Hosted SIGBIG meetings.
        Cydra 5  (black boxes)
        Two delivered?  One to the Pittsburgh Supercomputing Center (PSC),
and one to Yale, where a water pipe running through the machine room
burst over it.

Bill Gropp
(At Yale 1982-1990)


Hashed addresses:
The Cydra 5 (aka MXCL 5) did this.  It was one of the things that made
the memory system expensive (it didn't take 0 cycles, but it did make
access to memory pretty uniform, independent of stride).
I should try to find Richard about this, and also see if he retains any
old manuals.
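The payoff of hashed addressing is easy to see in miniature: with a plain power-of-two modulo bank mapping, a stride equal to the bank count hammers a single bank, while folding higher address bits in with XOR spreads the same stride across all banks. A toy illustration (the hash and parameters below are my own, not the actual Cydra 5 scheme):

```python
NBANKS = 8  # hypothetical power-of-two bank count

def bank_modulo(addr):
    """Naive mapping: bank = low address bits."""
    return addr % NBANKS

def bank_hashed(addr):
    """Hashed mapping: XOR-fold higher bits into the bank index."""
    return (addr ^ (addr >> 3)) % NBANKS

stride = NBANKS                       # worst case for the naive mapping
addrs = [i * stride for i in range(16)]

print(sorted(set(bank_modulo(a) for a in addrs)))   # -> [0]  every access hits one bank
print(len(set(bank_hashed(a) for a in addrs)))      # -> 8    all banks get used
```

The naive mapping serializes the whole stream on bank 0; the hashed mapping touches every bank, which is the "pretty uniform, independent of stride" behavior described above, bought at the price of extra address-path hardware.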

Museum History Center (Cydra 5).


Cray Computer Corporation

Colorado Springs, CO

A computer company doing research when the parent research company was doing
computers.  Forked from its parent, CRI, sometime after the unsuccessful
Cray Labs in Boulder, CO, at about the same time as SSI (1).
        Supercomputer Systems Inc. AG (Zurich?), SSI (3),
        as a company is currently very much alive.  DO NOT CONFUSE IT
        with the other two firms.

The Cray-3 was intended to be a 16-processor machine with a 2.0 ns clock
cycle (1 instruction per cycle, unlike the Cray-2).

The Cray-4 was to be a 1-cubic-foot cube.
It abandoned the local memory and brought back the B and T registers.

GaAs technology (Vitesse)

Founder died of complications from injuries suffered in a car crash.

Successor, SRC Computer.


Supercomputer Systems Inc. (1)

Eau Claire, WI
Steve Chen
Heavily funded by IBM, with not a lot to show for it.
1 2-CPU prototype.  Photo in BusinessWeek inside a Faraday cage.
Stories that the machine was not properly cooled on first power-up,
and that the hulk was later found abandoned by the side of a road.

Scheduled to be a ramped up 64-processor Y-MP with another memory stage.

Cray Research Incorporated
Acquired by Silicon Graphics.

Still Birth

American Supercomputer
A Crayette.
ECL-based gate arrays.  10 ns clock cycle.
Instruction-set compatible with a Cray X-MP/24.
Native Unix port (UNICOS was not on X-MPs at the time).  Definitely not COS.
Compiler: ?  Either cft 1.13 (and later) or native.
Main architect: Mike Flynn, Stanford U.

Peak employees: ~100
Primary competition: SCS [and Supertek by default]
Secondary competition: Cydrome
Disposition: Failed to get a second round of venture funding.
Failed ahead of its time.
See also Trilogy.

Design studies, financial forms, etc.
Some materials archived with The Computer Museum History Center.


Columbia Homogeneous Parallel Processor (CHoPP).

1970s era, by H. Sullivan.
Prototype made?  SAIC?

Supercomputer Systems Inc. (2)

San Diego, CA
Very little is known about this firm.

Half-alive companies (software, services, different products only)

CDC => CDS => Syntegra
	This section will be added later

CDC 6600 sites
Serial        Customer                               Location
------        -------------------------------------- ---------------
1             Lawrence Livermore Labs                (California)
2             Boeing                                 (Seattle)
3             CERN                                   (Geneva)
4             New York University                    (NY)
5             Weather Bureau                         (Washington DC)
6             Kirtland AFB                           (Albuquerque)
7             NCAR                                   (Boulder)
8             Bettis Labs                            (Pittsburgh)
9             Berkeley                               (California)
10            KAPL [Knolls Atomic Power Laboratory]  (Schenectady)
11            Brookhaven                             (Long Island)
12            Mobil Oil Company                      (Dallas)
13            Lawrence Livermore Labs                (California)
14            LASL                                   (Los Alamos)
15            Aerospace                              (Los Angeles)
16            University of Minnesota                (Minneapolis)
17            University of Texas                    [Austin?]
19            France
20            Westinghouse                           (Pittsburgh)
21            NASA Langley                           (Hampton)

> CDC 6600s:
> Serial        Customer                               Location
> ------        -------------------------------------- ---------------
> 1             Lawrence Livermore Labs                (California)
> 2             LA CDC Data Center -> Boeing           (Seattle)
> 3             CERN                                   (Geneva)
> 4             New York University                    (NY [?])
> 5             Weather Bureau                         (Washington DC)
> 6             Kirtland AFB                           (Albuquerque)
> 7             NCAR                                   (Boulder)
> 8             Bettis Labs                            (Pittsburgh)
> 9             Berkeley                               (California)
> 10            KAPL [Knolls Atomic Power Laboratory]  (Schenectady)
> 11            Brookhaven                             (Long Island)
> 12            Mobil Oil Company                      (Dallas)
> 13            Lawrence Livermore Labs                (California)
> 14            LASL                                   (Los Alamos)
> 15            Aerospace  [NASA?]                     (Los Angeles)
> 16            University of Minnesota                (Minneapolis)
> 17            University of Texas                    [Austin?]
> 18            NASA Langley                           (Norfolk)
> 19            French Power                           (France)
> 20            Westinghouse Telecomputer              (Pittsburgh)
> 21            NASA Langley                           (Norfolk)
> 25            Berkeley                               (California)
> 31            Livermore CA -> Sandia                 (Albuquerque)
> 42            Lawrence Livermore Labs                (California)
> CDC 6400s:
> Serial        Customer                               Location
> ------        -------------------------------------- ---------------
> 1             CDC Data Center Palo Alto              (California)
> 2             McDonnell-Douglas                      (St. Louis)
> 3             Sun Oil Company                        (Dallas)
> 4             Aachen University                      (Germany)
> 5             University of Adelaide                 (Australia)
> 6             Lockheed [-Martin?]                    (Marietta GA)
> 7             Aerospace [Lockheed]                   (El Segundo CA)
> 8             Smithsonian Institute                  (Boston)
> 9             NSA                                    [DC]
> 1  6700       NSA
> 10            Florida State University               (Tallahassee FL)
> 11            NASA Langley                           (Norfolk)
> 12            University of California Berkeley      (Berkeley) Campus
> 13            Martin-Marietta                        (Georgia)
> 14            [un-spec'd]                            [???]
> 15            [un-spec'd]                            [???]
> 16            AF Martin Orlando                      (Orlando FL)
> I'm sure you're aware that there were lots of machines made with no
> serial numbers.  With lots of them going to the NSA and other points
> unknown.  These were the Dept of Forestry machines.

CDC 6400s:

Serial        Customer                               Location
------        -------------------------------------- ---------------
1             Palo Alto [???]                        (California)
2             McDonnell                              (St. Louis)
3             Sun Oil Company                        [???]
4             Aachen   [univ?]                       (Germany)
5             University of Adelaide                 (Australia)
6             Lockheed                               [???]
7             Aerospace [Lockheed]                   [???]
8             Smithsonian                            [DC?]
9             Classified
10            Florida State University               [???]
11            NASA Langley                           [Hampton]
12            University of California               [???]
13            Martin-Marietta                        [???]
14            [un-spec'd]                            [???]
15            [un-spec'd]                            [???]
16            Martin Orlando                         [Florida?]
??		Purdue
??		TU Berlin


ICL / AMT DAP (not totally dead)  Cambridge Parallel Processing Systems (CPPS)

ICL, sometimes called the IBM of England.
Owned by Fujitsu since approx. 1990.

510 model in the Computer History Museum.

Also add Meiko, a UK-based company that achieved some success (LLNL) with
an MPP machine after seeding a "computing surface" to some UK academic sites.

The DAP: an English SIMD machine, a.k.a. Active Memory Technology (Irvine
offices).  Sometimes considered competition to the Goodyear/Loral MPP.

Inmos Transputers

Not a supercomputer per se, but an interesting attempt at a component
with real concern for I/O.
        Popular processor (1982) in some circles.
        Well-thought-out communication scheme, but problems with scalability.
        Crippled by its lack of US popularity.

|However, you must mention Transputers (something developed in EUROPE,
|outside of the U.S.A.; the name comes from TRANSistor and comPUTER,
|the transistor of future computing) and the related companies:
|* INMOS (from GB), now bought by SGS-Thomson (French), who was the
|  inventor and sole manufacturer of transputers
|* Parsytec (still alive, but does not use Transputers any more, Germany)
|* Meiko (GB) produced the "computing surface"
|* IBM had an internal project (codenamed VICTOR)
|and there are many more.  Transputers had a built-in communication agent, and
|it was very easy to connect them together to form large message-passing
|machines in regular, fixed topologies.  INMOS' idea was that they should be
|the "transistors" (building blocks) of the new parallel computers.

The Inmos transputer has earned a place in this file, now that
SGS-Thomson has issued the last-time-buy warning (end '98, last
deliveries end '99).  The moral of this one: don't try to change
everything at once (language, processing model, hardware).

> > Basically, they got trounced by Intel.  Lack of software, too.
> > Could not keep up with the technology curve.  Some shortsightedness
> > on the part of Americans.  Certain proprietary things about the
> architecture.
> > Processors alone are not enough.
> T800 was a good processor. It was a contemporary of i386, and I have
> gotten better performance on it than on a 386. However, 486 DX, which
> came immediately after, was much faster, and the next generation
> transputer (T9000) got delayed. I think inmos was taken over or
> something by SGS-Thomson (from what I remember, it's been years since
> then).

I was moderator in those days, and I was also closely allied with
Floating Point Systems, who used transputers in their T-series
machines. My students and I even wrote a C compiler for the T. Donna
Bergmark and her collaborators at Cornell did a lot of looking at
languages for these things. I've also been in contact with many other
people in the scientific community who used these hot-rod machines.

I think I can say with a lot of certainty that no matter how hot the
hardware, without imperative-style compilers available, those machines
are pretty much doomed. That is a reality; sorry if you're a computer
engineer and think otherwise...


You may recall that I worked on Occam with David May at INMOS.  In the order
of things I played Slartibartfast to David's Paranoid Android. I did the
crinkly bits around the edges of the fjords.

Occam was an imperative language which had communication (channels) and
concurrency as first-class objects in the language. At the time of the
language's development we had little understanding of the semiotics involved
in providing a language of this kind, but since it was the first of its kind
I certainly learnt what I consider to be important lessons from it
(later manifest in my doctoral thesis). Since then, semiotics of one kind or
another has dominated my own work. This work is online at:


This may have some historical interest since it records technical anecdotes
of the time related to Occam and Linda. It is written in English - despite
the French summary at the beginning that may lead you to believe otherwise.

Occam was based on Tony Hoare's CSP; in many ways it was a partial
implementation of CSP.
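
For the curious, the heart of CSP - unbuffered channels where send and
receive meet in a rendezvous - can be approximated in a few lines of Python.
This is an illustrative sketch (class and function names invented here), not
full occam/CSP semantics:

```python
import threading, queue

# A CSP-style rendezvous channel: send() blocks until a receiver has
# actually taken the value, approximating occam's unbuffered channels.
class Channel:
    def __init__(self):
        self._q = queue.Queue(maxsize=1)
        self._taken = threading.Semaphore(0)

    def send(self, value):
        self._q.put(value)
        self._taken.acquire()        # wait for the receiver: the rendezvous

    def recv(self):
        value = self._q.get()
        self._taken.release()        # let the sender proceed
        return value

def producer(ch):
    for i in range(3):
        ch.send(i)

ch = Channel()
threading.Thread(target=producer, args=(ch,)).start()
print([ch.recv() for _ in range(3)])  # [0, 1, 2]
```

The producer cannot run ahead of the consumer: each send completes only when
its matching receive has happened, which is the CSP synchronisation model.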

I view Occam as a first attempt. CSP, however, is far from a dead computer
architecture, and I often liken it to Euclid. If we ignore CSP we are doomed
only to reinvent it. However, translating this fundamental geometry of
parallel computing into usable engineering tools is a problem yet to be
solved.
The parallel processing industry died a death at the end of 1993 as KSR and
Thinking Machines went out of business. Some of that industry returned
to academia. I, along with many others, transitioned my skills into what
became mainstream distributed computing. I, at least, have been awaiting
a resurgence of interest in parallel processing.

Comp.parallel postings that I made over a decade ago have more relevant
historical detail:


SIMD machines in general

Thinking Machines Corp.

Thinking Machines was founded by Danny Hillis to build computers around the
concept of massively parallel (SIMD) computing.  TMC sold over 100 systems
called "Connection Machines" between 1989 and 1996.
  CM-2  up to 65,536 single-bit computers with FP accelerator
  CM-5  up to 1024 32-bit (SPARC) computers with vector accelerator.
They went out of the computer business in 1996 and are still alive (barely)
making data mining software.

The hardware/services business was sold to Gores Technology Corp., the same
outfit which was sniffing around SGI's Cray division.

The data mining software group that remained was bought by Oracle on June 7, 1999.

CMSI has a web page at www.connmach.com.

Only six sites?
Special projects.
Only site with two CMs: LANL?  (ending with a CM-5 [largest # of nodes])

MasPar (defunct)

Acquired by Accrue Software (a data mining software company, formerly NeoVista).

MasPar's SIMD mini-Connection Machine (the MP-1) was also resold by DEC,
minus the lights and the black cabinet.

Examples in the Computer History Museum.


Kendall Square Research (KSR)

Homebase was Waltham, MA, USA
   It was a non-descript, plain red brick building at the end
   of a long driveway past other office buildings.  There was
   _no_ identifying signage, and no indication which door was
   the front door.
Formed by some academic types from Cambridge, first office
was actually on Kendall Square, hence the name.
Strength:  AllCache
The goal was to have logically shared memory in a scalable system.
So you connect your processors, with their caches, to main memory.
   What does main memory do?
   1. It gives you a bottleneck,
   2. It provides the value which any datum is assumed to have, and
   3. It doubles the memory costs of your computer.

Thus, if you can figure out how to do (2.), you can eliminate
(1.) & (3.).
The AllCache Engine solves (2.) as follows:
   Connect the processors together using a high-speed, unidirectional
   ring to give high bandwidth and allow all processor caches to stay
   coherent.  The size of the ring was 34 = 32 + 2 nodes.  Use 32 of
   the nodes for processors, and the other two for linking to
   other rings.  Configure the rings in a hierarchical fashion using
   32-processor rings as the base level, rings connecting to rings
   with 32 processors as the next higher level, rings connected to those
   rings as the next higher level, etc.  Tell yourself that "data
   locality" means you'll rarely have a memory access go thru the
   higher levels of the rings.  Voila - scalable shared memory.
AllCache level 0 is a ring of (up to) 32 processors.
AllCache level 1 is a ring whose nodes connect to
   the rings with 32 processors.  Any KSR1 with more than 32 processors
   had level 1.  Max is 32 * 34 = 1088
AllCache level >1 was never built, but was allowed by the architecture
AllCache moved cache lines, which were 128 bytes.
(Not to be confused with subcache lines which were 64 bytes.)
Subcache size: 512 KB
Cache size: 32 MB  (i.e., "per processor")
Level 0 AllCache:  1 GB ( 32 processors)
Level 1 AllCache:  34 GB ( 34 * Level 0)
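
The capacity figures above follow from the ring arithmetic; a quick check:

```python
# Quick check of the AllCache capacity figures quoted above.
RING_NODES = 34               # 32 processor nodes + 2 inter-ring link nodes
PROCS_PER_RING = 32
CACHE_PER_PROC_MB = 32        # the 32 MB "cache" local to each processor

level0_procs = PROCS_PER_RING                  # one level-0 ring
level1_procs = PROCS_PER_RING * RING_NODES     # level-1 ring linking 34 rings
level0_mem_gb = PROCS_PER_RING * CACHE_PER_PROC_MB / 1024
level1_mem_gb = RING_NODES * level0_mem_gb

print(level1_procs)   # 1088
print(level0_mem_gb)  # 1.0
print(level1_mem_gb)  # 34.0
```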

KSR1 - 20 MHz processors, largest ever built was installed
       at BRL and was 384 processors.
       Sites included CTC, ORNL, GT, NCSC, UMi, UFl & a few more.
KSR2 - 40 MHz processors --> but same speed as KSR1 memory system !?!?
       (Some of my sales friends say it worked.  None of my sysadmin
       friends ever said it worked.)
KSR3 - same as KSR1 & KSR2, but would use IBM PPC processors.
Weakness: Implementation
KSR made their own processors-  20 MHz w/ fused fadd/fmul
instruction gave the luminal speed of 40 Mflops per.
Two instruction streams- arithmetic & memory.  IEEE fp.
It was a 64 bit processor w/ 64 bit addresses in 1991.  It had
no speculative execution or branch prediction, etc.
I/O ran thru the processor, and worked by "cycle stealing"-
When the I/O subsystem wanted the processor to do something,
it would stop instruction issue, and insert its own
instructions in the memory-op instruction stream.
AllCache latencies were approximate (_no_ memory time on the
KSR1 was determinate, all were averages, too many microstates for the
same macrostate)
   data item is somewhere in subcache - 2 clocks
   data item is somewhere in cache - 20 clocks
   data item is somewhere in your level 0 AllCache - 50 clocks
   data item is somewhere in your level 1 AllCache - 150 clocks
   subcache was two-way set associative with random replacement.
   cache was 16-way set associative with random replacement,
      but 4 of the 16 were tied down by the OS.
   The processor didn't have a scoreboard, and nobody really
   knew just exactly where, at any time, a data item might be
   located, so a subcache miss stalled the processor for _at
   least_ 10's of clocks.
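
Given that latency ladder, the average access time is just a weighted sum.
The hit fractions below are hypothetical, chosen only to show why subcache
behavior dominated everything:

```python
# Weighted-average access time for the latency ladder above.  The hit
# fractions are HYPOTHETICAL, purely to illustrate the arithmetic.
ladder = [          # (fraction of accesses satisfied here, cost in clocks)
    (0.90, 2),      # subcache
    (0.07, 20),     # cache
    (0.02, 50),     # own level-0 AllCache
    (0.01, 150),    # level-1 AllCache
]
avg_clocks = sum(frac * clocks for frac, clocks in ladder)
print(round(avg_clocks, 1))  # 5.7
```

Shave a few points off the subcache hit rate and the average balloons, which
is the programmer's burden described next.
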
The bottom line was that the KSR1 was a difficult beast to
program *for high efficiency*.  The programmer had to
keep in mind which subcache line a data item would use, which
cache line a data item would use, all the while trying to
make (typically vector) code have behavior resembling cache re-use.
   One thing which was supposed to help out was an instruction
   called "prefetch" which could move a data item to where it
   was needed prior to the actual data request.  In Fortran,
   a prefetch looked like a function call (which the compiler
   would silently ignore).  It didn't work in general, and who
   wants to code prefetches?  Why not just go with message passing?
Neatest thing: Free lunches in house.  This saved the company
   a lot of time employees didn't spend driving to restaurants,
   talking shop where others could overhear, getting stuck in
   traffic, etc.  It was a good meal (and I'm a fussy eater).
Funniest thing:  where other startups had newspaper clippings
   on the walls describing some Victory the company achieved,
   at KSR the most popular clipping was a gag article some local
   business reporter wrote on How Fast Startup CEOs Drove
   Their Fancy Sportscars on Rt128.  Henry claimed 138 mph (speed
   limit 55 mph).
Q:  If AllCache was such a good idea, why did KSR die?
A:  They were caught inflating the revenue the company had
    actually received.  They were sued by the stockholders,
    who were paid off largely in stock.  The day after the
    court finalized the settlement, the company declared
    bankruptcy (the capital didn't give any more to management).
Q:  Why was KSR so secretive?
A:  AllCache is a simple idea, and it's not clear the patents
    would be upheld in court (post Myrias, SCI).
I was laid off in true KSR style.  I found out when my login
didn't work anymore.

Lessons Learnt:
KSR:  Don't assume that having troubles in the Numerical World means
you're ready for the Transaction/Data Mining World (if you can't make an
NSF computer center happy, how are you going to make a bank happy?);
use, ahem, standard accounting practices, at least after you go public.

I was at Kuck and Associates at the time and owned the relationship with
Kendall Square.

The KSR AllCache architecture was a "floating" memory page system. That is,
all memory was distributed and "floated" to the location where it was
addressed. In this sense you have a large shared address space. The trick,
naturally, was the need to have strong locality. Without that locality the
machine would thrash pages across its distributed memory subsystem.

The machine did have relatively good support from the KAI HPF compiler,
although someone had foolishly made performance promises that were
physically impossible to keep. The KSR architecture was untried and unproven
in a time of untried and unproven massively parallel computer architectures.
As in all the cases from this period, we discovered, I think, that a
general-purpose parallel computer was a very hard animal to build.

Evans and Sutherland Computer Division ES-1

E&S is well known in the computer graphics community for making some of the
finest high-performance real-time computer graphics hardware.  This
image generation hardware is used in $10-100M flight (and other) simulators.

When it was announced that E&S was getting into the supercomputer arena,
they were perceived as a serious, credible new contender.
Gordon Bell, however, takes a dimmer view of them.
One representative machine is in storage at the Museum
(over Gordon's dead body).


Jean Yves LeClerk studied under Dave Evans and upon getting his degree,
went back to France.  When he got an idea of how to build
a supercomputer, he came back to Evans for advice on how to raise
capital to fund the project.  Evans said "I won't tell you how to
raise money- I'll fund you myself."  Thus, Evans & Sutherland got
a computer division, ESCD.  It was located in Mountain View, CA, USA,
just off US 101.  Formed ca. 1986; the product shipped in 1989, near midnight
at the end of September (so, shipped in the quarter promised).

Basic idea:
The building block was an 8 x 8 nonblocking crossbar, which could
also connect to another similar crossbar without using one of the 
8 x 8 connections.  Use two crossbars to connect 16 processors
to a memory system with 16 banks.  Virtual memory, with translation
done in the memory side of the crossbar (to allow faster context
switches).  The processor had a small TLB (512 ? page entries).

Use another 8 x 8 crossbar to connect 8 of those together, and you
have 128 processors in one system.  Note that this scheme may be
extended: 8 128 processor systems could be connected by another
8 x 8 crossbar, etc.  Great for data-parallel, too bad there wasn't
any HPF in 1988 :-(  (ESCD played around with PCF, IIRC)  The
system was a (theoretically) scalable shared memory NUMA computer.
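
The fan-out arithmetic is easy to check:

```python
# Fan-out arithmetic for the ES-1 crossbar hierarchy described above.
CUS_PER_NODE = 16             # "computational units" behind one crossbar pair
FANOUT = 8                    # each extra 8 x 8 crossbar ties 8 nodes together

def total_cus(levels):
    """Computational units after `levels` of 8-way crossbar fan-out."""
    return CUS_PER_NODE * FANOUT ** levels

print(total_cus(1))  # 128, the full system as shipped
print(total_cus(2))  # 1024, the theoretical next step
```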

ESCD had a unique nomenclature: the processors were called 
"computational units", and the set of 16 computational units and memory
was called a "processor".  Memory was 128 MB per "processor".

Use MACH for your OS, so you'll have a "parallel" unix.  You need
the parallel file system to drive the very good I/O subsystem,
which was rated at 200 MB/s per processor (1.6 GB/s per full system).

Neatly finessing the issue of custom v. off-the-shelf processor,
ESCD made their own processor, but used Weitek chips for the
floating point. (This was back in the days when a processor was
a "chip set", rather than the single die for sale today.)  During
development, the clock was 40 MHz, with plans to go to 50 MHz by the
time of production.  32-bit words, but the Weiteks would
do 64-bit fp nicely.  Luminal speed was about 12 Mflops per
computational unit.  Measured speeds (i.e., operands in memory rather
than register) were more like 2-3 Mflops per.

There were some unexpected problems with the pipelines in the
processors: certain instructions couldn't be issued at particular clocks
after issue of certain other instructions.  The French called these
"pipeline hazards"; the Californians called them "cursed instruction sequences".
It was a closely guarded secret, and caused ESCD to
not release the instruction set, nor the assembler.

Biggest problem with the design was memory access:
processor to memory: return the physical address of this virtual address
memory to processor: here's the physical address
processor to memory: read/write data to/from this physical address
memory to processor/processor to memory: here's the data

So a memory op was actually 4 messages rather than 2 anytime the
physical address wasn't in the processor's small TLB.  Think Linpack.
512 pages just ain't supercomputin'
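
In other words, the expected message count per memory op degrades with the
TLB miss rate; the miss rates below are illustrative, not measured:

```python
# Expected messages per memory op for the protocol above: 2 on a TLB
# hit, 4 when the translation round-trip is needed.  Miss rates here
# are ILLUSTRATIVE only.
def messages_per_op(tlb_miss_rate):
    return 2 * (1 - tlb_miss_rate) + 4 * tlb_miss_rate

print(messages_per_op(0.0))   # 2.0 -- working set fits the 512-entry TLB
print(messages_per_op(0.5))   # 3.0 -- streaming through far more pages
```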

Serials 1 and 2 went to Caltech and U. Colorado at Boulder
(can't recall which got which).  Up to about serial 7 or so were in some
stage of production when the project ended.

The ES-1 at CU Boulder was installed right beside, and during the same
week as, the Myrias SPS-2.  Head to head competition.  (Myrias was
in-and-out in a couple of days.  ESCD needed a couple of weeks.)

Neatest thing:  Being within walking distance of Shoreline City Park,
		so a walk would clear the head of the frustrations
of working on beta (alpha ?) HW & SW.

Funniest thing:  Culturally, ESCD couldn't take a meal together, because
                 The French wouldn't eat without wine,
                 Which the Mormons wouldn't touch.
                 The Californians acted well-fed after mass
                 quantities of pizza and beer,
                 While the Sales team was out looking for a 3+ star
                 restaurant to put on their expense reports.

Mort d'ES-1
The project ended when Evans resigned from the Board of E&S.
Then a 4/3 vote in favor of the project became a 3/4 vote against.
We got 60 days notice (U.S. plant closing law), it was announced
at the Supercomputing convention in Reno.  (If you have to end
my project, end it at the Supercomputing convention.  Most of the
places I might look for work are within walking distance.  For
the record, Henry Cornelius was the first headhunter to our booth
after the announcement, within 3 or 4 minutes.)

Lessons Learnt:
ESCD:  Don't push too many technologies simultaneously.
Chips checked out on the silicon compiler, but were too dense (100K
transistors in 1986) for the foundries to make at acceptable yield;
MACH was not ready for commercial use in 1988.

End ES-1

[ My thanks to the moderator for allowing me to contribute my
reminiscences.  Next time I work for another startup with another
Great Idea, I'll take better notes... (I didn't know there was
going to be a test ;-) ]

An ES-1 is found at The Computer Museum History Center, Moffett Field, CA.

Astronautics, Wisconsin

ZS-1: no vectors.

Unconfirmed report that the single implementation of Jim Smith's ZS-1 is in 
Rhode Island in a museum.


Colorado Springs.

Convex software.

Vitesse Electronics was a startup to do two things:
GaAs chips (which survives as Vitesse Semiconductor),
and a mini-supercomputer which never made it.

The architecture of the Vitesse Numerical Processor (VNP) used
a very deep pipeline, and attempted to bypass the latency
problems that arose from the deep pipeline with a so-called
data-driven optimizer (DADO).  The machine did not have
registers, but used three-address instructions directly
involving memory.  The DADO kept track of the interdependencies
among source and destination addresses, and issued instructions
when it could.  The intent was to allow enough instructions
in the DADO to cover two (or perhaps even more) data
dependencies in tight loops without needing to stall the pipeline.
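
The issue rule can be sketched as a toy simulation (the names and data
structures are invented for illustration; the real DADO was hardware tracking
memory addresses, not a Python loop):

```python
# Toy sketch of data-driven issue in the spirit of the DADO: a window of
# three-address memory-to-memory instructions, each issued as soon as its
# source addresses are available.
def dado_issue(window):
    """window: list of (name, src_addrs, dst_addr).  Returns issue order."""
    dsts = {dst for _, _, dst in window}
    produced = set()                    # destinations written so far
    pending = list(window)
    order = []
    while pending:
        for ins in pending:
            name, srcs, dst = ins
            # a source is ready if no instruction in the window writes it,
            # or if it has already been written
            if all(s not in dsts or s in produced for s in srcs):
                order.append(name)
                produced.add(dst)
                pending.remove(ins)
                break
        else:
            raise RuntimeError("deadlock: circular dependency")
    return order

# i2 (c = f(b)) is listed first but must wait for i1 (b = f(a)).
prog = [("i2", ["b"], "c"), ("i1", ["a"], "b"), ("i3", ["a"], "d")]
print(dado_issue(prog))  # ['i1', 'i2', 'i3']
```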

The processor did not have any IO, but relied on a
front-end to do all of the work, and run most of the
UNIX operating system.  System calls from the backend were
relayed to the front-end.  The initial machine was intended
to be CMOS, with a view towards a later implementation in GaAs.

The machine was intended as an MP machine, and had a very
interesting interconnect.  SW was used to establish mappings
from a local processor's address space to so-called global
virtual addresses.  Similarly, global virtual addresses
could be mapped to local addresses.  The net effect was that
the SW could establish a form of "carbon-copy" memory.  Writes
from one CPU to a local address would also show up, through the
mappings, in a local address in one or more other processors.
The mappings could be, but need not be, symmetric.
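
The mapping scheme can be sketched like so (all names hypothetical; the real
mechanism was address-translation hardware driven by software tables):

```python
# Sketch of the "carbon-copy" memory idea above: software-established
# mappings fan a write to one CPU's local address out to local
# addresses in other CPUs' memories.
class CarbonCopyMemory:
    def __init__(self, n_cpus):
        self.mem = [dict() for _ in range(n_cpus)]
        self.maps = []                # (src_cpu, src_addr, dst_cpu, dst_addr)

    def map(self, src_cpu, src_addr, dst_cpu, dst_addr):
        self.maps.append((src_cpu, src_addr, dst_cpu, dst_addr))

    def write(self, cpu, addr, value):
        self.mem[cpu][addr] = value
        for s_cpu, s_addr, d_cpu, d_addr in self.maps:
            if (s_cpu, s_addr) == (cpu, addr):
                self.mem[d_cpu][d_addr] = value   # the carbon copy

m = CarbonCopyMemory(2)
m.map(0, 0x100, 1, 0x200)     # asymmetric: only CPU 0's writes propagate
m.write(0, 0x100, 42)
print(m.mem[1][0x200])  # 42
```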

The machine was designed far enough to have an assembler,
a compiler, and an OS that booted, and even ran a [trivial]
job in simulation, but the key chips were never fabbed.

The Applied Dynamics AD100, an ECL-based multiprocessor
with 65-bit (yes, 65, not 64) floating point, did 20 MFlops in 1981.  There are
a couple of hundred installations, or more, the majority of which are in
California.  The company was a University of Michigan Aerospace engineering
department spinoff located in Ann Arbor, Michigan, and founded by three UM
profs.  Their focus was/is on real time applications, their system had lots
of special hardware to interface to r/t equipment.  The company still exists,
although they are not selling many of these expensive machines any more, and
they have a web site at http://www.adi.com.  It had a minimal operating
system, and in addition to Fortran supported their in-house parallel
simulation language (ADSIM), derived from CSSL, for systems of ODEs.

Q Is it true that supercomputer programmers spend 
  their nights in flophouses?
A Only when coming up on a deadline.
Stan Lass


Herman Lukoff's (now out of print) book "From Dits to Bits" describes
the LARC (2 sites: LLL and Navy):
        -  four microsecond add instruction compared with 525
           microseconds on the Univac I
        -  separate programmed I/O processor
        -  provision for a second computation processor if needed
        -  4-deep instruction pipelining
        -  26 fast general purpose registers
        -  eight main storage units (2500 - 60 bit words each)
           independently addressable for concurrent CPU and I/O access
        -  up to 12 fast drums (500,000 characters per sec transfer)



Articles: comp.parallel
Administrative: eugene@cse.ucsc.edu.SNIP
Archive: http://groups.google.com/groups?hl=en&group=comp.parallel

eugene (428)
6/26/2003 1:02:59 PM