Re: [9fans] Plan9 - the next 20 years

Well, in the octopus you have a fixed part, the pc, but all other
machines come and go. The feeling is very much that your stuff is in
the cloud.

I mean, not everything has to be dynamic.

On 17/04/2009, at 22:17, ericvh@gmail.com wrote:

> On Fri, Apr 17, 2009 at 2:43 PM, <tlaronde@polynum.com> wrote:
>> On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote:
>>> I cannot find the reference (sorry), but I read an interview with Ken
>>> (Thompson) a while ago.
>>>
>>
>> My interpretation of cloud computing is precisely the split done by
>> plan9 with terminal/CPU/FileServer: a UI running on this Terminal, with
>> actual computing done somewhere on data stored somewhere.
>>
>
> That misses the dynamic nature which clouds could enable -- something
> we lack as well with our hardcoded /lib/ndb files -- there is no
> provision for cluster resources coming and going (or failing) and no
> control facilities given for provisioning (or deprovisioning) those
> resources in a dynamic fashion. Lucho's kvmfs (and to a certain
> extent xcpu) seem like steps in the right direction -- but IMHO more
> fundamental changes need to occur in the way we think about things. I
> believe the file system interfaces While not focused on "cloud
> computing" in particular, the work we are doing under HARE aims to
> explore these directions further (both in the context of Plan
> 9/Inferno as well as broader themes involving other platforms).
>
> For hints/ideas/whatnot you can check the current pubs (more coming
> soon): http://www.research.ibm.com/hare
>
> -eric
>
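A minimal sketch, not taken from the thread, of the kind of dynamic membership Eric contrasts with static /lib/ndb entries: each machine registers with a lease and must keep renewing it, and a machine that stops renewing (because it left or failed) drops out at the next sweep. The names (`Registry`, `reg_heartbeat`, `reg_sweep`, `reg_alive`), the fixed table size and the lease policy are all hypothetical; this is not an existing Plan 9 or ndb interface, and `now` is passed in explicitly so the behaviour is easy to simulate.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_NODES 64
#define NAME_LEN 32

/* One entry per machine currently offering resources (hypothetical). */
typedef struct {
    char name[NAME_LEN];
    time_t expires;   /* lease deadline: the node must renew before this */
    int in_use;
} Node;

typedef struct {
    Node nodes[MAX_NODES];
    time_t lease;     /* lease length in seconds (assumed policy) */
} Registry;

void reg_init(Registry *r, time_t lease)
{
    memset(r, 0, sizeof *r);
    r->lease = lease;
}

/* Register a node, or renew its lease if already present.
   Returns 0 on success, -1 if the table is full.
   Names longer than NAME_LEN-1 are silently truncated. */
int reg_heartbeat(Registry *r, const char *name, time_t now)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_NODES; i++) {
        if (r->nodes[i].in_use && strcmp(r->nodes[i].name, name) == 0) {
            r->nodes[i].expires = now + r->lease;   /* renewal */
            return 0;
        }
        if (!r->nodes[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    Node *n = &r->nodes[free_slot];
    snprintf(n->name, sizeof n->name, "%s", name);
    n->expires = now + r->lease;
    n->in_use = 1;
    return 0;
}

/* Drop every node whose lease has lapsed; returns how many were dropped. */
int reg_sweep(Registry *r, time_t now)
{
    int dropped = 0;
    for (int i = 0; i < MAX_NODES; i++) {
        if (r->nodes[i].in_use && r->nodes[i].expires <= now) {
            r->nodes[i].in_use = 0;
            dropped++;
        }
    }
    return dropped;
}

/* Nonzero if the node is registered and its lease is still valid. */
int reg_alive(const Registry *r, const char *name, time_t now)
{
    for (int i = 0; i < MAX_NODES; i++)
        if (r->nodes[i].in_use && strcmp(r->nodes[i].name, name) == 0)
            return r->nodes[i].expires > now;
    return 0;
}
```

For example, with a 10-second lease, a node that last renewed at t=0 is dropped by a sweep at t=12, while one that renewed at t=8 survives.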

nemo, 4/17/2009 10:11:28 PM

if you want to look at checkpointing, it's worth going back to look at
Condor, because they made it really work. There are a few interesting
issues that you need to get right. You can't make it 50% of the way
there; that's not useful. You have to hit all the bits -- open /tmp
files, sockets, all of it. It's easy to get about 90% of it but the
last bits are a real headache. Nothing that's come along since has
really done the job (although various efforts claim to; you have to
read the fine print).

ron

rminnich, 4/17/2009 10:18:03 PM


On Fri, Apr 17, 2009 at 03:15:25PM -0700, ron minnich wrote:
> if you want to look at checkpointing, it's worth going back to look at
> Condor, because they made it really work. There are a few interesting
> issues that you need to get right. You can't make it 50% of the way
> there; that's not useful. You have to hit all the bits -- open /tmp
> files, sockets, all of it. It's easy to get about 90% of it but the
> last bits are a real headache. Nothing that's come along since has
> really done the job (although various efforts claim to, you have to
> read the fine print).

My only knowledge of this area comes from papers and books, so it is
very abstract.

But my gut feeling, after reading about Mach or reading A. Tanenbaum
(whom I find poor; but he is A. Tanenbaum, and I'm only T. Laronde),
is that a cluster sits above the OS (it is a collection of CPUs),
whereas for the OS a NUMA machine is an atom: it sits below the OS, a
kind of "processor", a single CPU (so NUMA without strong hardware
specificity is something I don't understand).

In all the mathematical or computer work I have done, defining the
element, the atom (that is, the unit whose internals I don't have to
know or deal with), has always given the best results.

Not related to what you wrote, but here is the impression left by what
can be read about this "cloud computing" in the open sewer:

A NUMA machine made of totally heterogeneous hardware, with users
plugging or unplugging CPU components at will, or a "start-up"
(end-down) offering "cloud computing" whose only resources are the
users' own connected hardware: this is perhaps a Web 3.0 (or later),
4th-millennium idea, but to me it is at best an error and at worst a
swindle.
-- 
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
                 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

tlaronde, 4/18/2009 12:02:40 PM

[I am replying to myself because my reply ended up split across two distinct threads]

On Sat, Apr 18, 2009 at 01:59:03PM +0200, tlaronde@polynum.com wrote:
> 
> But my gut feeling, after reading about Mach or reading A. Tanenbaum
> (that I find poor---but he is A. Tanenbaum, I'm only T. Laronde),
> is that a cluster is above the OS (a collection of CPUs), but a
> NUMA is for the OS an atom, i.e. is below the OS, a kind of
> "processor", a single CPU (so NUMA without a strong hardware specificity
> is something I don't understand).
> 
> In all the mathematical or computer work I have done, defining the
> element, the atom (that is the unit I don't have to know or to deal with
> what is inside) has always given the best results.

The link between this and process migration is that, IMHO (or in my
limited understanding), one allocates a node once, for the whole
lifetime of the process, depending on the resources available at that
moment. This is OS business: allocating resources from a cluster of CPUs.

The task doesn't migrate between nodes; it can migrate "inside" the
node, from core to core, in a machine with a tightly coupled memory
space (a mainframe, whether NUMA or not) that handles failover etc.
But this is infra-OS, "hardware" stuff, and as far as the OS is
concerned nothing has changed, since the node is a unit, an atom.
Trying to solve the problem by breaking that border (going inside the
atom) is something that doesn't feel right to me.

tlaronde, 4/18/2009 2:35:50 PM

On Sat, Apr 18, 2009 at 08:05:50AM -0700, ron minnich wrote:
> 
> For cluster work that was done in the OS, see any clustermatic
> publication from minnich, hendriks, or watson, ca. 2000-2005.

Will do.

tlaronde, 4/18/2009 3:49:09 PM

> Well, in the octopus you have a fixed part, the pc, but all other  
> machines come and go. The feeling is very much that your stuff is in  
> the cloud.

i was going to mention this.  to me, the current view of cloud
computing, as evidenced by papers like this one [1], is basically
hardware infrastructure capable of running vm pools, each of which
does exactly what a dedicated server would do.  the main benefits are
low administration cost and elasticity.  networking, authentication
and authorization remain as they are now.  such efforts still don't
address what octopus and rangboom are trying to address: how to
seamlessly and automatically make resources accessible.  if you read
what ken said, he seems to have this latter view of cloud computing in
mind; he said "some framework to allow many loosely-coupled Plan9
systems to emulate a single system that would be larger and more
reliable".  in all the virtualization systems i've seen, the vm has to
be smaller than the environment it runs on.  if vmware or xen were ever
to give you a vm that was larger than any given real machine it ran
on, they'd have to solve the same problem.

[1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf


9nut, 4/19/2009 7:16:34 AM

* Latchesar Ionkov <lucho@ionkov.net> wrote:

Hi,

> I talked with a guy who is doing parallel filesystem work, and
> according to him 80% of all filesystem operations when running an HPC
> job are for checkpointing (not so much for restart). I just don't see
> how checkpointing can scale, given how bad parallel filesystems are.

We need a clustered venti and a cluster-aware fossil ;-P

I'm currently in the process of designing a clustered storage system,
inspired by venti and git, which also supports removing files,
on-demand synchronization, etc. I'll let you know when I've
got something to present.
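A minimal sketch of the content-addressed idea behind venti (blocks are stored and fetched by a score computed from their contents, so identical blocks are stored only once), extended with reference counts as one possible way to support removal. Assumptions: 64-bit FNV-1a stands in for venti's SHA-1 score and is not collision-resistant, the store is a fixed-size in-memory table scanned linearly, and `cas_put`, `cas_get` and `cas_remove` are hypothetical names; nothing here reflects the actual design being described.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSLOTS 1024   /* fixed-size in-memory table (assumption) */

/* Stand-in for venti's SHA-1 score: 64-bit FNV-1a. NOT collision-
   resistant, so cas_put() also compares bytes before deduplicating. */
static uint64_t score(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

typedef struct {
    uint64_t score;
    unsigned char *data;
    size_t len;
    int refs;   /* 0 means the slot is free */
} Slot;

static Slot store[NSLOTS];

/* Store a block; identical contents share one slot. Writes the score to
   *out. Returns 0 on success, -1 if the table is full or malloc fails. */
int cas_put(const void *data, size_t len, uint64_t *out)
{
    uint64_t s = score(data, len);
    Slot *empty = NULL;
    for (size_t i = 0; i < NSLOTS; i++) {
        Slot *e = &store[i];
        if (e->refs > 0 && e->score == s && e->len == len &&
            memcmp(e->data, data, len) == 0) {
            e->refs++;                    /* duplicate: just add a ref */
            *out = s;
            return 0;
        }
        if (e->refs == 0 && empty == NULL)
            empty = e;
    }
    if (empty == NULL)
        return -1;
    empty->data = malloc(len ? len : 1);
    if (empty->data == NULL)
        return -1;
    memcpy(empty->data, data, len);
    empty->score = s;
    empty->len = len;
    empty->refs = 1;
    *out = s;
    return 0;
}

/* Fetch a block by score; NULL if absent. With this weak stand-in hash,
   a collision would return the first matching block. */
const void *cas_get(uint64_t s, size_t *len)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (store[i].refs > 0 && store[i].score == s) {
            *len = store[i].len;
            return store[i].data;
        }
    return NULL;
}

/* Drop one reference; the block is freed when none remain.
   (Venti itself is write-once; removal here is the assumed extension.) */
int cas_remove(uint64_t s)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (store[i].refs > 0 && store[i].score == s) {
            if (--store[i].refs == 0) {
                free(store[i].data);
                store[i].data = NULL;
            }
            return 0;
        }
    return -1;
}
```

Writing the same block twice returns the same score and keeps a single copy; the block disappears only after it has been removed as many times as it was written.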


cu
-- 
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 cellphone: +49 174 7066481   email: info@metux.de   skype: nekrad666
----------------------------------------------------------------------
 Embedded Linux / Porting / Open-source quality management / Distributed systems
----------------------------------------------------------------------

weigelt, 4/19/2009 7:42:33 PM
