Gradual Learning, not Reinforcement Learning


I think that the attempt to prove an a priori assumption about the
efficacy of 'reinforcement' in AI is not wise unless someone has
made the decision that he wants to spend his time researching
'reinforcement' in computational learning theories.  By shifting to
a more generic concept, like my idea of 'gradual learning,' the AI
researcher can rid himself of the burden of a doxology of concept that
was not designed with his interests in mind.

I feel that people who insist on using reinforcement even when it does
not work for them are just creating a problem where one does not need
to exist.  If an intense study of 'reinforcement' is what someone
enjoys or otherwise wishes to pursue, that is his decision and I would
be interested in his results.  But if that is not an important sub-goal
then why bother schlepping someone else's dogma even when it does not
work for you?  Sure, study the problem if you want, but if your real
interest is in finding solutions, then use the solutions that you can
find.

We need a priori beliefs.  But we also need to examine them carefully
and to accept the results of our studies and experiments.  When you see
an alternative that could act as a viable solution to a problem that
you are working on I think you should give it some consideration even
if it doesn't fit in with your preconceived theories.

Although I have certain opinions about what I call gradual learning,
the real issue that I am stressing in this message is that gradual
learning does not have to be constrained in the same ways that
reinforcement learning probably should be.  With the freer concept of
gradual learning, you can examine possibilities that would be
dogmatically rejected under the epistemology of an artificial
implementation of operant conditioning and reinforcement (what ever
that might be).

It is the nature and modeling of the relations between the references
of the objects of knowledge that is key to the contemporary problem.
None of the paradigms of the past have been shown to be capable of
fully solving this problem.  Something new has to be explored.

Jim Bromer

Reply jbromer712 (7) 7/14/2006 6:18:18 PM



Jim Bromer wrote:
> I think that the attempt to prove an a priori assumption about the
> efficacy of 'reinforcement' in AI is not wise unless someone has
> made the decision that he wants to spend his time researching
> 'reinforcement' in computational learning theories.  By shifting to
> a more generic concept, like my idea of 'gradual learning,' the AI
> researcher can rid himself of the burden of a doxology of concept that
> was not designed with his interests in mind.
>


There are at least 50 theories of learning, so there is no need for
anyone interested in AI, per se, to get too imprinted on any specific
one .....

http://tip.psychology.org/

One size fits all only if you're a polyester sock.


>
> ....
>
> It is the nature and modeling of the relations between the references
> of the objects of knowledge that is key to the contemporary problem.
> None of the paradigms of the past have been shown to be capable of
> fully solving this problem.
>


Naive learning devices certainly haven't worked.


>Something new has to be explored.
>

Reply feedbackdroid 7/14/2006 8:03:01 PM

JB: I think that the attempt to prove an a priori assumption about the
efficacy of 'reinforcement' in AI is not wise unless someone has
made the decision that he wants to spend his time researching
'reinforcement' in computational learning theories.



GS: The argument, which you have not dealt with, is that the behavioral 
phenomena that are referred to broadly as habituation (and sensitization, 
for that matter), classical conditioning, and (especially) operant 
conditioning explain, along with behavior that is largely inherited, all of 
animal and human behavior, at least at the behavioral level. Thus, if we 
wish to simulate behavior (even if only the parts that this guy or that guy 
define as "intelligent") we would do well to understand these phenomena and 
to speculate upon, and look towards physiology to explain, how these 
processes are "implemented" by physiology. [Part of "speculating upon" is 
attempting simple models that can be tested by computer.] In any event, it 
seems to me there are only two other possibilities. The first is that the 
assertion is wrong on either logical/conceptual grounds or on empirical 
grounds, and that we must, therefore, add more principles. The second is the 
bird/plane argument. That is, that "artificial intelligence" can be achieved 
in ways that don't have anything to do with "how nature does it."



JB: By shifting to
a more generic concept, like my idea of 'gradual learning,' the AI
researcher can rid himself of the burden of a doxology of concept that
was not designed with his interests in mind.



GS: As I recall, the last time you tried to explain "gradual learning" you 
offered gibberish containing controversial terms defined in terms of other 
controversial terms. My current question is, thus, what is "gradual 
 learning" and do you see it as some process that operates in real animals?

JB: I feel that people who insist on using reinforcement even when it does
not work for them are just creating a problem where one does not need
to exist.



GS: How do you tell the difference between this view and the view that the 
processes in question are simply extremely complex?



JB: If an intense study of 'reinforcement' is what someone
enjoys or otherwise wishes to pursue, that is his decision and I would
be interested in his results.  But if that is not an important sub-goal
then why bother schlepping someone else's dogma even when it does not
work for you?  Sure, study the problem if you want, but if your real
interest is in finding solutions, then use the solutions that you can
find.



GS: How does this fit in with the issue that I have outlined?

JB: We need a priori beliefs.  But we also need to examine them carefully
and to accept the results of our studies and experiments.  When you see
an alternative that could act as a viable solution to a problem that
you are working on I think you should give it some consideration even
if it doesn't fit in with your preconceived theories.



GS: How does this fit in with the issue that I have outlined? Is this an 
argument about processes that actually exist in animals or is this a 
"bird/plane" type argument?

JB: Although I have certain opinions about what I call gradual learning,
the real issue that I am stressing in this message is that gradual
learning does not have to be constrained in the same ways that
reinforcement learning probably should be.



GS: Like "pixies," "poltergeists," and "God" are not constrained by the 
findings of empirical science?



JB: With the freer concept of
gradual learning, you can examine possibilities that would be
dogmatically rejected under the epistemology of an artificial
implementation of operant conditioning and reinforcement (what ever
that might be).



GS: Weeeeeee! I'm a little pixie! Weeeeeeeeee!

JB: It is the nature and modeling of the relations between the references
of the objects of knowledge that is key to the contemporary problem.



GS: But the question is "What is reference?" "What is knowledge?" And these 
questions may well be the same sort as "What is the life-force?" Who is the 
peddler of dogma?



JB: None of the paradigms of the past have been shown to be capable of
fully solving this problem.  Something new has to be explored.



GS: Like what? Oh yeah, "knowledge," "reference," and what "gradual 
 learning"? What about little pixies, and the life force?

"Jim Bromer" <jbromer@isp.com> wrote in message 
news:1152901098.800566.43210@i42g2000cwa.googlegroups.com...
>I think that the attempt to prove an a priori assumption about the
> efficacy of 'reinforcement' in AI is not wise unless 


Reply Glen 7/14/2006 9:39:39 PM

feedbackdroid wrote:

>
> There are at least 50 theories of learning, so there is no need for
> anyone interested in AI, per se, to get too imprinted on any specific
> one .....
>
> http://tip.psychology.org/
>
> One size fits all only if you're a polyester sock.
>
....
>
> Naive learning devices certainly haven't worked.
>
>
Thanks for your comments.  I did not mean to sound as negative as I did,
by the way.  I am not against reinforcement learning theories; I think,
as you seem to, that there are a lot of other good theories and a lot of
good variations that can be used effectively in various situations.  One
of the things that makes reinforcement theory interesting is that complex
configurations can be shaped through simple reinforcements of different
configurations of input.  This is interesting and it deserves some
thought, but there are variations upon variations that can be considered
within a broader view of this one kind of configuration learning (to coin
a name).  Since a response to this kind of thing may also be seen in
terms of configurations of responses, the possibilities within this one
relatively narrow field are so mind-boggling that I really have to wonder
why anyone would accept any less.  Anyone who is interested in learning
theory should at the very least take a look at what's around.

Jim Bromer

Reply Jim 7/14/2006 9:56:19 PM

"Jim Bromer" <jbromer@isp.com> wrote:
> ....
> It is the nature and modeling of the relations between the references
> of the objects of knowledge that is key to the contemporary problem.
> None of the paradigms of the past have been shown to be capable of
> fully solving this problem.  Something new has to be explored.

Well, the "nothing else worked let's try something different" approach has
been the driving force of AI for 50 years now.  It's caused a lot of people
to spend a lot of time walking in circles.  Progress is made, but it's slow
when you don't know where you are going.

If you think the direction you are walking (towards this concept of
"gradual learning"), is a viable alternative, then you should keep walking.
That's all any of us can do.

As you probably know, I am a big supporter of reinforcement learning.  I
believe in it because it's not an a priori assumption.  It's a fact of both
human and animal behavior proven by decades of scientific research.
There's plenty of room to debate what else might be there, and how
important a role reinforcement learning plays in the big picture of full
human behavior, but reinforcement learning isn't a guess, or an assumption,
it's a well documented fact.  It's something which all theories of human
intelligence must explain and, in the end, demonstrate.

The question to be answered is: what else is there?

Above you wrote:

> With the freer concept of
> gradual learning, you can examine possibilities that would be
> dogmatically rejected under the epistemology of an artificial
> implementation of operant conditioning and reinforcement (what ever
> that might be).

This implies you have some concepts that you sense are important to
creating intelligence but which are in direct conflict with the idea of
reinforcement learning.  Care to share? I'm quite sure you can't name any
aspect of human behavior which I can't explain in terms compatible with
reinforcement.  Most people who fight the idea of reinforcement show a
considerable lack of understanding of the subject, and what they feel isn't,
or can't be, answered by a framework of reinforcement learning always can
be.

I believe there is always room for a fresh understanding of human behavior
created by a new approach.  But any new approach will have to explain all
the same data, that Behaviorism (for one) has already taken great pains to
explain.  There is a lot of important data which we don't have about human
behavior, so there is endless room to speculate about what that data might
look like if it were collected and what the cause of that would be.  But,
it's unwise to push forward with new theories that make no attempt to, or
worse yet - are unable to, explain the hard facts and data we do have.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
Reply curt 7/14/2006 10:29:25 PM

Curt Welch wrote:

.............
>
> Well, the "nothing else worked let's try something different" approach has
> been the driving force of AI for 50 years now.  It's caused a lot of people
> to spend a lot of time walking in circles.  Progress is made, but it's slow
> when you don't know where you are going.
>
..............
>
> I believe there is always room for a fresh understanding of human behavior
> created by a new approach.  But any new approach will have to explain all
> the same data, that Behaviorism (for one) has already taken great pains to
> explain.  ..................
>


Does the engineer doing AI in the first paragraph really care about
whether his AI can solve or otherwise perform the psychologist's
job, as written in the latter paragraph? Is AI supposed to be the
salvation of the psychologist?

Reply feedbackdroid 7/15/2006 12:38:43 AM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> Curt Welch wrote:
>
> ............
> >
> > Well, the "nothing else worked let's try something different" approach
> > has been the driving force of AI for 50 years now.  It's caused a lot
> > of people to spend a lot of time walking in circles.  Progress is made,
> > but it's slow when you don't know where you are going.
> >
> .............
> >
> > I believe there is always room for a fresh understanding of human
> > behavior created by a new approach.  But any new approach will have to
> > explain all the same data, that Behaviorism (for one) has already taken
> > great pains to explain.  ..................
> >
>
> Does the engineer doing AI in the first paragraph really care about
> whether his AI can solve or otherwise perform the psychologist's
> job, as written in the latter paragraph? Is AI supposed to be the
> salvation of the psychologist?

I'm not sure what you are asking.  But AI as I was talking about is a
reference to the job of trying to make a machine duplicate full human
behavior - not just the job of making machines perform interesting tasks
like playing chess.  If you don't duplicate the behavior which we already
know exists in humans, how could you possibly believe you had duplicated
full human behavior?  That was the point of the second paragraph.  It had
nothing to do with doing the psychologist's job - it has everything to do
with duplicating what psychology has shown us humans do.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
Reply curt 7/15/2006 1:44:27 AM

Curt Welch wrote:
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > Curt Welch wrote:
> >
> > ............
> > >
> > > Well, the "nothing else worked let's try something different" approach
> > > has been the driving force of AI for 50 years now.  It's caused a lot
> > > of people to spend a lot of time walking in circles.  Progress is made,
> > > but it's slow when you don't know where you are going.
> > >
> > .............
> > >
> > > I believe there is always room for a fresh understanding of human
> > > behavior created by a new approach.  But any new approach will have to
> > > explain all the same data, that Behaviorism (for one) has already taken
> > > great pains to explain.  ..................
> > >
> >
> > Does the engineer doing AI in the first paragraph really care about
> > whether his AI can solve or otherwise perform the psychologist's
> > job, as written in the latter paragraph? Is AI supposed to be the
> > salvation of the psychologist?
>
> I'm not sure what you are asking.  But AI as I was talking about is a
> reference to the job of trying to make a machine duplicate full human
> behavior - not just the job of making machines perform interesting tasks
> like playing chess.  If you don't duplicate the behavior which we already
> know exists in humans, how could you possibly believe you had duplicated
> full human behavior?  That was the point of the second paragraph.  It had
> nothing to do with doing the psychologist's job - it has everything to do
> with duplicating what psychology has shown us humans do.
>


Who cares, other than you and GS?  I realize there's a tendency to mix
everything into the same line of thinking, but let the psychologists worry
about "full" human behavior.  Commander Data is scifi, not AI.  I'd just
like my robot/AI to get itself across the street in one piece, plus a few
other well-chosen tasks.

Reply feedbackdroid 7/15/2006 5:55:05 AM

Normally, I do not want to spend my time responding to people who make
derogatory comments about my comments as Glenn Sizemore did when he
said, "As I recall, the last time you tried to explain "gradual
learning" you offered gibberish containing controversial terms defined
in terms of other controversial terms."
The part of the statement where he said I used "controversial terms" is
certainly a reasonable criticism, but the part of the statement where he
said that I "offered gibberish" is unwarranted and unsubstantiated.  For
an example of gibberish, Glenn, you might take
a look at your own remarks where you said, "Weeeeeee! I'm a little
pixie! Weeeeeeeeee!"  I understood what you were saying there, and I
feel that some self-expression is a good thing, but isn't there some
irony here?  I also feel that Glenn used other corruptive forms of
argumentation in his comments and these tactics are not at all unusual
for Glenn.  However, I did make some combative remarks in my first
message so Glenn's excessively critical remarks probably were not
completely unwarranted this time.

My main criticism of Glenn is that he has almost never shown that he
is capable of criticizing his own views.  I think Glenn represents
himself as unwaveringly right in all his views.  Arguing with such a
person is difficult to say the least.

However, let me take a look at one thing Glenn said. "The argument,
which you have not dealt with, is that the behavioral phenomena that
are referred to broadly as habituation (and sensitization, for that
matter), classical conditioning, and (especially) operant conditioning
explain, along with behavior that is largely inherited, all of animal
and human behavior, at least at the behavioral level. Thus, if we wish
to simulate behavior (even if only the parts that this guy or that guy
define as "intelligent") we would do well to understand these phenomena
and to speculate upon, and look towards physiology to explain, how
these processes are "implemented" by physiology. [Part of "speculating
upon" is attempting simple models that can be tested by computer.]"

I do not dispute that Glenn's comments in this paragraph represent an
articulate expression of a reasonable point of view.  I don't agree
with everything he said, but if someone wants to speculate on, say,
habituation by using simple models that can be tested by computer, that
is fine with me.  I think that my view on this was implied when I said
in my first message, "If an intense study of 'reinforcement' is what
someone enjoys or otherwise wishes to pursue, that is his decision and
I would be interested in his results."  Or when I said to
feedbackdroid in a subsequent message, "I did not mean to sound as
negative as I did, by the way.  I am not against reinforcement learning
theories; I think, as you seem to, that there are a lot of other good
theories and a lot of good variations that can be used effectively in
various situations."  Perhaps Glenn did not read my second message, where
I explicitly used a broader generality in support of the exploration of
alternative learning theories.  Or, perhaps, Glenn
disapproves of the study of learning theories that he doesn't support
and he is projecting his own kind of intolerance onto me.

Glenn's repetition of the comment, "How does this fit in with the
issue that I have outlined," combined with his "gibberish" remark
and his sarcastic "pixies" comments makes me think that this is the
same old Glenn who is not really interested in the exploration of
learning theories unless they fit in with his view of Behaviorism.  It
is difficult to argue on that basis, since every remark I would make
would have to be translated into the terms of Behaviorism.  Since I am
not an expert in Behaviorism, I would be at a disadvantage.  Finally, I
would have to spend a lot of time on this, and with no expectation of
impartiality or even genuine curiosity from Glenn, I don't feel that
this is worth the effort.  I don't dislike Glenn, by the way; I just
don't have the time.

I feel that Behaviorism produced some insights of value, but it was
severely limited by its methodological constraints and it has been made
disreputable by the intolerance and arrogance of many of its
proponents.  The idea that Behaviorism explains everything about
psychology is just not proven and not provable.  I have no idea why
anyone would act as if it was.

If someone wants to study models of Behaviorism, that is his decision
and I don't care.  I actually think it's fine.  However, computer
programmers should not be constrained by someone else's historical
dogma when it comes to finding technical solutions to programming
problems.

I would be happy to explore the possibilities of what I call "gradual
learning" if someone is genuinely interested.  But Glenn has not
shown evidence that he has understood my attempts to explain this
concept; in fact, he only dismisses it with his label of
"gibberish."  If you truly do not understand what I am talking
about then you would have to be willing to temporarily take on the role
of student in order for me to teach you about my ideas.  It is obvious
that is what would be required, but I don't honestly think Glenn is
willing to learn from me.  On the other hand, if Glenn is only
criticizing my remarks then I have to assume that his point is
something like: Your gibberish about gradual learning is not supported
by Behaviorist Theory. I already know that!  Again, this was implied in
my first message.  No argument needed.  (Actually, a few of my ideas
could possibly constitute an expansion of Behaviorist theories, but I am
not interested in pursuing that possibility.)

Seriously, I would be willing to try to explore my ideas of gradual
learning and configuration learning if anyone was genuinely interested.

Jim Bromer

Reply Jim 7/15/2006 2:29:18 PM

Jim Bromer wrote:
>
> Seriously, I would be willing to try to explore my ideas of gradual
> learning and configuration learning if anyone was genuinely interested.
>

OK, let's get started. What is gradual learning, and under what
circumstances does it arise?

--
Joe Legris

Reply J 7/15/2006 3:06:30 PM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> Curt Welch wrote:

> > I'm not sure what you are asking.  But AI as I was talking about is a
> > reference to the job of trying to make a machine duplicate full human
> > behavior - not just the job of making machines perform interesting
> > tasks like playing chess.  ....

> Who cares, other than you and GS.

Well, a few billion people on the planet would care if someone figured out
how to do it. :)

But I take care to write out "full human behavior" in the debates about
what intelligence is, to make my intended final target clear.

> I realize there's a tendency to mix
> everything into the same line of thinking, but let the psychologists
> worry about "full" human behavior. Commander Data is scifi, not AI. I'd
> just like my robot/AI to get itself across the street in one piece, plus
> a few other well-chosen tasks.

Creating engineering solutions to specific tasks is fun and useful.  I've
spent most of my life doing it.  But my interest in AI is not simply to push
the state of the art forward by finding new solutions to limited domain
tasks.  Nor is my interest to just find new technologies to use in my
projects. I'm only really interested in technology that I can believe is on
a direct path to creating Commander Data.  As such, I try to identify
everything that is needed to create a Commander Data, and see which
technologies seem to be on that path, and which seem not to be on the path.
Everything that looks off the path to me, I mostly ignore.  Anything on the
path, I try to understand and advance. I try to understand what's missing,
and what needs to be filled in to get us there.  As you know from reading
too many of my posts, the prime missing link I see is stronger, real time,
high dimension, reinforcement learning systems.

I don't actually care all that much about full human behavior.  I care
about creating something like Commander Data - that is, a machine with the
skills needed to do any job I might want done, which I am now forced to
hire a human to do because we don't know how to build a machine to do it.
This is close to, but not the same thing as, full human behavior.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
Reply curt 7/15/2006 8:14:58 PM

J.A. Legris wrote:
> OK, let's get started. What is gradual learning, and under what
> circumstances does it arise?
>
> --
> Joe Legris

The classical example of logical reasoning is,
All men are mortal.
Socrates is a man.
Therefore, we know -by form- that Socrates is mortal.

This concept of form was also used in the development of algebra where
we know facts like,
2a + 2a = 4a
if a is any real number.  So, for example, we know -by form- that if
a=3 then 2*3+2*3=4*3.

One of the GOFAI models used categories and logic in order to create
logical conclusions for new information based on previously stored
information.  In a few cases this model produced good results even for
some novel examples.  But, it also produced a lot of incorrect results
as well.  I wondered why this GOFAI model did not work better more
often.  One of the reasons I discovered is that we learn gradually, so
that by the time we are capable of realizing that the philosopher is
mortal just because he is a man and all men are mortal, we also know a
huge amount of other information that is relevant to this problem.  The
child learns about mortality in dozens of ways if not hundreds or even
thousands of ways before he is capable of realizing that since all men
are mortal, then Socrates must also be mortal.

I realized that this kind of logical reasoning can be likened to
instant learning.  If you learn that Ed is a man, then you also
instantly know that Ed must be mortal as well.  This is indeed a valid
process, and I feel that it is an important aspect of intelligence.
But before we get to the stage where we can derive an insight through
previously learned information and have some capability to judge the
value of that derived insight, we have to learn a great many related
pieces of knowledge.  So my argument here, is that while instant
derivations are an important part of Artificial Intelligence, we also
need to be able to use more gradual learning methods to produce the
prerequisite background information so that derived insights can be
used more effectively.
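
As a minimal sketch of this kind of "instant" derivation by form (the
facts and the single rule below are invented purely for illustration):

# A universally quantified rule applied to a newly learned fact yields a
# new fact "instantly" - derivation by form, as in the Socrates example.
facts = {("is_man", "Socrates")}
rules = [(("is_man",), ("is_mortal",))]       # "all men are mortal"

def derive(facts, rules):
    derived = set(facts)
    for (pred_in,), (pred_out,) in rules:
        for pred, subject in facts:
            if pred == pred_in:
                derived.add((pred_out, subject))
    return derived

facts.add(("is_man", "Ed"))       # learn one new fact...
print(derive(facts, rules))       # ...and "Ed is mortal" follows at once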

Gradual learning is an important part of this process.  We first learn
about things in piecemeal fashion before we can put more complicated
ideas together.  I would say that reinforcement learning is a form of
gradual learning but there are a great many other methods of gradual
learning available to the computer programmer.

It's hard for most people to understand me (or for that matter even
to believe me) when I try to describe how adaptive AI learning might
take place without predefined variable-data references.  So it is much
easier for me to use some kind of data variable-explicit model to try
to talk about my ideas.

Imagine a complicated production process that had all kinds of sensors
and alarms.  You might imagine a refinery or something like that.
However, since I don't know too much about material processes, I
wouldn't try to simulate something like that but I would instead
create a computer model that used algorithms to produce streams of data
to represent the data produced by an array of sensors.  Under a number
of different situations, alarms would go off when certain combinations
of sensor threshold values were hit.  This computer generated model
would be put through thousands of different runs using different
initial input parameters so that it would produce a wide range of data
streams through the virtual sensors.  It would then be the job of the
AI module to try to predict which alarms would be triggered and when
they would be triggered before the event occurred.  The algorithms that
produced the alarms could be varied and complicated.  For example, if
sensor line 3 and sensor line 4 go beyond some threshold values for at
least 5 units of time, then alarm 23 would be triggered unless line 6
dipped below some threshold value at least two times in the 10 units of
time before.  There might be hundreds of such alarm scenarios.
Individual sensor lines might be involved in a number of different
alarm scenarios.  An alarm might, for another example, be triggered if
the average value of all the sensor inputs was within some specified
range.  The specified triggers for some alarms might change from run to
run, or even during a run.  Some of these scenarios would be simple,
and some might be very complex.  Some scenarios might even be triggered
by non-sensed events.  The range of possibilities, even within this
very constrained data-event model is tremendous if not truly infinite.
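
One way to read that description as code, as a toy sketch only; the
sensor count, thresholds, and the two simplified alarm rules below are
invented for illustration:

import random

# Toy version of the simulated plant: a few virtual sensor streams plus
# two simplified threshold-style alarm rules of the kind described above.
NUM_SENSORS, STEPS = 8, 200

def run(seed):
    rng = random.Random(seed)
    history, alarms = [], []
    for t in range(STEPS):
        reading = [rng.gauss(0.0, 1.0) for _ in range(NUM_SENSORS)]
        history.append(reading)
        # Rule 1 (simplified): sensors 3 and 4 above a threshold for 5
        # consecutive units of time triggers alarm 23.
        if t >= 4 and all(h[3] > 1.0 and h[4] > 1.0 for h in history[-5:]):
            alarms.append((t, "alarm_23"))
        # Rule 2: the average of all the sensor inputs falls inside a band.
        avg = sum(reading) / NUM_SENSORS
        if 0.8 <= avg <= 1.2:
            alarms.append((t, "alarm_avg"))
    return history, alarms

history, alarms = run(seed=1)     # one run out of the "thousands of runs"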

The AI module might be exposed to a number of runs that produced very
similar sensor values, or it might be exposed to very few runs that
produced similar data streams.

Superficially this might look a little like a reinforcement scenario
since the alarms could be seen as negative reinforcements, but it
clearly is not a proper model for behaviorist conditioning.  The only
innate 'behavior' that the AI module is programmed to produce is the
attempt to develop conjectures to predict the data events that could
trigger the various alarms.

I argue that since simplistic assessments of the runs would not work
for every kind of alarm scenario, the program should start out with
gradual learning in order to reduce the false positives where it
predicted an alarm event that did not subsequently occur.

This model might have hundreds or thousands of sensors. It might have
hundreds of alarms.  It might have a variety of combinations of data
events that could cause or inhibit an alarm.  Non-sensible data events
might interact with the sensory data events to trigger or inhibit an
alarm.  Furthermore, the AI module might be able to mitigate or operate
the data events that drive the sensors so that it could run interactive
experiments to test its conjectures.

I have described a complex model where an imagined AI module would have
to make conjectures about the data events that triggered an alarm.
Offhand, I cannot think of any one learning method that would be best for
this problem.  So lacking that wisdom I would suggest that the program
might run hundreds or even thousands of different learning methods in
an effort to discover predictive conjectures that would have a high
correlation with actual alarms.  This is a complex model problem which
does not lend itself to a single simplistic AI paradigm.  I contend
that the use of hundreds or maybe even thousands of learning mechanisms
is going to be a necessary component of innovative AI paradigms in the near
future.  And it seems reasonable to assume that initial learning is
typically going to be a gradual process in such complex scenarios.
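
A rough sketch of the "many learning methods" idea, continuing the toy
simulator above; the two predictors are placeholders standing in for
hundreds or thousands of candidate methods, and the scoring rule is just
one crude choice:

# Run a pool of candidate predictors over past runs and keep the ones
# whose conjectures line up best with the alarms that actually fired.
def single_threshold(history, t, sensor=3, level=1.0):
    return history[t][sensor] > level                 # conjecture: alarm next

def pair_threshold(history, t):
    return history[t][3] > 1.0 and history[t][4] > 1.0

def score(predictor, history, alarms):
    fired = {t for t, _ in alarms}
    claims = [t for t in range(len(history) - 1) if predictor(history, t)]
    hits = sum(1 for t in claims if (t + 1) in fired)
    return hits / len(claims) if claims else 0.0      # crude precision score

pool = [single_threshold, pair_threshold]
# With history and alarms from the run() sketch above:
#   ranked = sorted(pool, key=lambda p: score(p, history, alarms), reverse=True)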

I will try to finish this in the next few days so that I can describe
some of the different methods to produce conjectures that might be made
in this setting and to try to show how some of these methods could be
seen as making instant conjectures while others could be seen as
examples of gradual learning.

Jim Bromer

Reply Jim 7/16/2006 3:15:32 PM

Jim Bromer wrote:
 
> Imagine a complicated production process that had all kinds of sensors
> and alarms.

Imagine a joint probability distribution over a set of random variables.
Imagine estimating the distribution.
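
For what it is worth, a toy illustration of that remark, with three
made-up binary variables standing in for "sensor high", "line dipped",
and "alarm": estimate the joint distribution by counting.

from collections import Counter
from itertools import product
import random

# Estimate a joint distribution over three binary variables by simple
# frequency counting over observed samples (illustrative numbers only).
rng = random.Random(0)
samples = []
for _ in range(10000):
    a = rng.random() < 0.5                        # e.g. "sensor 3 high"
    b = rng.random() < 0.3                        # e.g. "line 6 dipped"
    alarm = (a and b) or rng.random() < 0.05      # alarm depends on a and b
    samples.append((a, b, alarm))

counts = Counter(samples)
joint = {outcome: counts[outcome] / len(samples)
         for outcome in product([False, True], repeat=3)}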

-- Michael

Reply Michael 7/16/2006 5:28:13 PM

Jim Bromer wrote:
> J.A. Legris wrote:
> > OK, let's get started. What is gradual learning, and under what
> > circumstances does it arise?
> >
> > --
> > Joe Legris
>
> ....
>
> Imagine a complicated production process that had all kinds of sensors
> and alarms.  You might imagine a refinery or something like that.
> However, since I don't know too much about material processes, I
> wouldn't try to simulate something like that but I would instead
> create a computer model that used algorithms to produce streams of data
> to represent the data produced by an array of sensors.  Under a number
> of different situations, alarms would go off when certain combinations
> of sensor threshold values were hit.  This computer generated model
> would be put through thousands of different runs using different
> initial input parameters so that it would produce a wide range of data
> streams through the virtual sensors.  It would then be the job of the
> AI module to try to predict which alarms would be triggered and when
> they would be triggered before the event occurred.  The algorithms that
> produced the alarms could be varied and complicated.  For example, if
> sensor line 3 and sensor line 4 go beyond some threshold values for at
> least 5 units of time, then alarm 23 would be triggered unless line 6
> dipped below some threshold value at least two times in the 10 units of
> time before.  There might be hundreds of such alarm scenarios.
> Individual sensor lines might be involved in a number of different
> alarm scenarios.  An alarm might, for another example, be triggered if
> the average value of all the sensor inputs was within some specified
> range.  The specified triggers for some alarms might change from run to
> run, or even during a run.  Some of these scenarios would be simple,
> and some might be very complex.  Some scenarios might even be triggered
> by non-sensed events.  The range of possibilities, even within this
> very constrained data-event model is tremendous if not truly infinite.
>
> The AI module might be exposed to a number of runs that produced very
> similar sensor values, or it might be exposed to very few runs that
> produced similar data streams.
>
> Superficially this might look a little like a reinforcement scenario
> since the alarms could be seen as negative reinforcements, but it
> clearly is not a proper model for behaviorist conditioning.  The only
> innate 'behavior' that the AI module is programmed to produce is the
> attempt to develop conjectures to predict the data events that could
> trigger the various alarms.
>
> I argue that since simplistic assessments of the runs would not work
> for every kind of alarm scenario, the program should start out with
> gradual learning in order to reduce the false positives where it
> predicted an alarm event that did not subsequently occur.

What you've described so far sounds like the Bayesian model that
Michael Olea has been describing, where an estimate of the posterior
probability of an event is updated after each observation of the
evidence. Is this the sort of thing you have in mind? At some point,
perhaps depending on a threshold probability level, a decision would
have to be made about whether the corresponding alarm should be
triggered.
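
For concreteness, the kind of per-observation update described here might
look like the following minimal beta-Bernoulli sketch; the prior, the toy
observations, and the 0.6 threshold are all invented:

# Estimate the probability that a given condition is followed by an alarm,
# revising a Beta posterior after each observed run, and raise the alarm
# once the estimate crosses a decision threshold.
alpha, beta = 1.0, 1.0                          # uniform Beta(1,1) prior

observations = [True, False, True, True, False] # toy run outcomes
for alarm_followed in observations:
    if alarm_followed:
        alpha += 1
    else:
        beta += 1
    posterior_mean = alpha / (alpha + beta)
    raise_alarm = posterior_mean > 0.6          # threshold-based decision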

>
> This model might have hundreds or thousands of sensors. It might have
> hundreds of alarms.  It might have a variety of combinations of data
> events that could cause or inhibit an alarm.  Non-sensible data events
> might interact with the sensory data events to trigger or inhibit an
> alarm.  Furthermore, the AI module might be able to mitigate or operate
> the data events that drive the sensors so that it could run interactive
> experiments to test its conjectures.
>
> I have described a complex model where an imagined AI module would have
> to make conjectures about the data events that triggered an alarm.  Off
> hand I cannot think of any one learning method that would be best for
> this problem.  So lacking that wisdom I would suggest that the program
> might run hundreds or even thousands of different learning methods in
> an effort to discover predictive conjectures that would have a high
> correlation with actual alarms.  This is a complex model problem which
> does not lend itself to a single simplistic AI paradigm.  I contend
> that the use of hundreds or maybe even thousands of learning mechanisms
> is going to be a necessary component of innovative AI paradigms in near
> future.  And it seems reasonable to assume that initial learning is
> typically going to be a gradual process in such complex scenarios.
>
> I will try to finish this in the next few days so that I can describe
> some of the different methods to produce conjectures that might be made
> in this setting and to try to show how some of these methods could be
> seen as making instant conjectures while others could be seen as
> examples of gradual learning.
>
> Jim Bromer

It seems like a big jump from predicting outcomes, even thousands of
them, to running interactive experiments to test the predictions. How
might that work?

--
Joe Legris

Reply J 7/16/2006 5:28:51 PM

"J.A. Legris" <jalegris@sympatico.ca> wrote in message 
news:1153070931.890020.86590@m73g2000cwd.googlegroups.com...
>
> Jim Bromer wrote:
>> J.A. Legris wrote:
>> > OK, let's get started. What is gradual learning, and under what
>> > circumstances does it arise?
>> >
>> > --
>> > Joe Legris
>>
>> ....
>
> What you've described so far sounds like the Bayesian model that
> Michael Olea has been describing, where an estimate of the posterior
> probability of an event is updated after each observation of the
> evidence.

This strikes me as enormously charitable. It seems to me that he has said 
little else than:



1.)    We may be able to predict some events if we have access to some part 
of what has happened.

2.)    We should build a machine that does that.




Reply Glen 7/16/2006 6:52:29 PM

J.A. Legris wrote:
 
> What you've described so far sounds like the Bayesian model that
> Michael Olea has been describing, where an estimate of the posterior
> probability of an event is updated after each observation of the
> evidence. Is this the sort of thing you have in mind? At some point,
> perhaps depending on a threshold probability level, a decision would
> have to be made about whether the corresponding alarm should be
> triggered.

That would be where the "utility model" comes in (moving from Bayesian
Inference into Bayesian Decision Theory) - the cost and gain functions over
consequences. So you pick the threshold to maximize expected utility. That
is, of course, a normative theory, not a descriptive one - what an agent
should do, not what particular agents do in fact do. Even so it is often a
good model of behavior under experimental conditions. There is a consistent
difference, I've mentioned a few times, between the normative model and a
descriptive model of "matching law" like behavior. Suppose you have two
choices A and B, and that the expected utility is 90 for A and 10 for B.
The optimal choice is pick A every time. The observed behavior is more like
pick A 90% of the time, pick B 10% of the time. The discrepancy arises only
if the probability distribution is known, and stationary. If the
distribution is unknown (i.e. being estimated, or "learned"), and if it
might be changing then the matching law makes more sense, has been shown to
be optimal under some idealized conditions, and is a form of "importance
sampling", very much like particle filtering methods of approximate
Bayesian inference.
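
In code, the contrast drawn here looks roughly like this; a toy sketch in
which the 90/10 utilities come from the example above and "matching"
simply means choosing each option in proportion to its utility:

import random

utilities = {"A": 90.0, "B": 10.0}
rng = random.Random(0)

def maximize(utilities):
    # Normative choice with a known, stationary distribution:
    # always pick the option with the highest expected utility.
    return max(utilities, key=utilities.get)

def match(utilities, rng):
    # Matching-law-like choice: pick each option in proportion to its
    # utility, so A is chosen about 90% of the time here.
    r = rng.random() * sum(utilities.values())
    for option, u in utilities.items():
        r -= u
        if r <= 0:
            return option
    return option

best = maximize(utilities)                      # always "A"
choices = [match(utilities, rng) for _ in range(1000)]
share_a = choices.count("A") / len(choices)     # roughly 0.9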

> It seems like a big jump from predicting outcomes, even thousands of
> them, to running interactive experiments to test the predictions. How
> might that work?

That, "intervention", gets a lot of attention in Judea Pearl's second major
book, the one on "Causality". It also has been studied in terms of "value
of information". Bayesian medical expert systems do a limited form of this
by suggesting tests to perform in order to arrive at a diagnosis. The role
of intervention in learning has also been studied in, for example,
developmental psychology. Discounting evidence ("let me try it, you just
aren't doing it right") is one example. It is a major theme in Allison
Gopnik's work:

http://ihd.berkeley.edu/gopnik.htm

For example:

A.Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, & D. Danks (2004). A
theory of causal learning in children: Causal maps and Bayes nets.
Psychological Review, 111, 1, 1-31. 

T. Kushnir, A. Gopnik, L Schulz, & D. Danks. (in press). Inferring hidden
causes. Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive
Science Society 

-- Michael

 


Reply Michael 7/16/2006 6:53:24 PM

"Jim Bromer" <jbromer@isp.com> wrote:
> J.A. Legris wrote:
> > OK, let's get started. What is gradual learning, and under what
> > circumstances does it arise?
> >
> > --
> > Joe Legris
>
> The classical example of logical reasoning is,
> All men are mortal.
> Socrates is a man.
> Therefore, we know -by form- that Socrates is mortal.
>
> This concept of form was also used in the development of algebra where
> we know facts like,
> 2a + 2a = 4a
> if a is any real number.  So, for example, we know -by form- that if
> a=3 then 2*3+2*3=4*3.
>
> One of the GOFAI models used categories and logic in order to create
> logical conclusions for new information based on previously stored
> information.  In a few cases this model produced good results even for
> some novel examples.  But, it also produced a lot of incorrect results
> as well.  I wondered why this GOFAI model did not work better more
> often.  One of the reasons I discovered is that we learn gradually, so
> that by the time we are capable of realizing that the philosopher is
> mortal just because he is a man and all men are mortal, we also know a
> huge amount of other information that is relevant to this problem.  The
> child learns about mortality in dozens of ways if not hundreds or even
> thousands of ways before he is capable of realizing that since all men
> are mortal, then Socrates must also be mortal.

You seem to be describing the problem that the GOFAI people talk about as a
lack of common sense.  They feel their approaches don't have it and need
it.  Meaning that for any base of high level knowledge we seem to have,
it's always supported by a larger base of lower level knowledge.  All the
high level knowledge they try to put into a machine ends up lacking support
from the common sense facts about this knowledge that humans always have.

For example, you can teach a logic machine that airplanes can fly and
bees can fly, but that alone doesn't let the machine know the simple fact
that airplanes are normally very large machines (much larger than us) and
bees are very small animals (much smaller than us).  This is a common sense
fact easily picked up by seeing a bee or an airplane for yourself, but one
of the many things you might have forgotten to tell a logic machine when
you were trying to hand-program knowledge into it.

This problem, however, recurses.  No matter how much common sense knowledge
you put into the machine, that new knowledge is also lacking common sense
support from below.  And the missing support from below always seems to be
larger than what you have already put into the machine.  The harder you
work to solve the problem, the bigger the problem seems to get.

My take on this problem, is that you can't hand program human level
knowledge into a machine.  Humans are not capable of doing it.  We don't
understand the knowledge in our own heads well enough to simply copy the
knowledge out of our heads, into a computer, and reach full human
intelligence.  We can only do it for limited domain problems like chess, or
all the other millions of programs we have written by simply translating
knowledge from our head, into computer code.

The missing piece of the puzzle is learning.  The first approach that
Turing suggested - building an adult machine by hand-coding human knowledge
into it - can never work to reach full human intelligence.  The high level
knowledge we understand exists in our brain is not enough to make a
machine intelligent.

Turing's second approach, building a baby machine and letting it learn for
itself is the only approach that can work.  These machines build their own
base of knowledge from the bottom up, instead of us trying to fill it in
from the top down.

And what you say about your gradual learning seems to fit with this view of
mine, in that we need a learning system that slowly builds, from the bottom
up, all the knowledge needed to support a high level concept like
"man is mortal".

I don't think, however, that you have added anything by putting the word
"gradual" in front of it.  There is no other type of learning.  All learning
systems (of any real interest to AI and psychology) are gradual.  They add
new knowledge on top of old knowledge.  This causes a progressive build-up
of knowledge over time - which makes it gradual.  A computer memory cell,
which erases all traces of the old knowledge when it learns something new,
is an example of instant learning.  And that type of learning is so well
understood and so uninteresting that we don't bother to call it learning; we
call it memory.  Everything we use the word "learning" to describe is
gradual learning.  So I don't see why you bother to put the word "gradual"
in front of it.  It's a redundancy in my view.
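A tiny illustration of the distinction being drawn here (the observations
are arbitrary numbers): a memory cell overwrites, while anything we would
call learning folds each new observation into what it already has:

    # "Instant learning" (a memory cell) versus gradual accumulation.
    memory = None
    estimate = 0.0
    for i, observation in enumerate([4.0, 6.0, 5.0, 7.0], start=1):
        memory = observation                      # overwrite: no trace of the past
        estimate += (observation - estimate) / i  # running mean: the past is kept
        print(f"obs={observation}  memory={memory}  estimate={estimate:.2f}")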

> I realized that this kind of logical reasoning can be likened to
> instant learning.  If you learn that Ed is a man, then you also
> instantly know that Ed must be mortal as well.  This is indeed a valid
> process, and I feel that it is an important aspect of intelligence.
> But before we get to the stage where we can derive an insight through
> previously learned information and have some capability to judge the
> value of that derived insight, we have to learn a great many related
> pieces of knowledge.  So my argument here, is that while instant
> derivations are an important part of Artificial Intelligence, we also
> need to be able to use more gradual learning methods to produce the
> prerequisite background information so that derived insights can be
> used more effectively.
>
> Gradual learning is an important part of this process.  We first learn
> about things in piecemeal fashion before we can put more complicated
> ideas together.  I would say that reinforcement learning is a form of
> gradual learning but there are great many other methods of gradual
> learning available to the computer programmer.
>
> It's hard for most people to understand me (or for that matter even
> to believe me) when I try to describe how adaptive AI learning might
> take place without predefined variable-data references.

Not for me.

> So it is much
> easier for me to use some kind of data variable-explicit model to try
> to talk about my ideas.
>
> Imagine a complicated production process that had all kinds of sensors
> and alarms.  You might imagine a refinery or something like that.
> However, since I don't know too much about material processes, I
> wouldn't try to simulate something like that but I would instead
> create a computer model that used algorithms to produce streams of data
> to represent the data produced by an array of sensors.  Under a number
> of different situations, alarms would go off when certain combinations
> of sensor threshold values were hit.  This computer generated model
> would be put through thousands of different runs using different
> initial input parameters so that it would produce a wide range of data
> streams through the virtual sensors.  It would then be the job of the
> AI module to try to predict which alarms would be triggered and when
> they would be triggered before the event occurred.  The algorithms that
> produced the alarms could be varied and complicated.  For example, if
> sensor line 3 and sensor line 4 go beyond some threshold values for at
> least 5 units of time, then alarm 23 would be triggered unless line 6
> dipped below some threshold value at least two times in the 10 units of
> time before.  There might be hundreds of such alarm scenarios.
> Individual sensor lines might be involved in a number of different
> alarm scenarios.  An alarm might, for another example, be triggered if
> the average value of all the sensor inputs was within some specified
> range.  The specified triggers for some alarms might change from run to
> run, or even during a run.  Some of these scenarios would be simple,
> and some might be very complex.  Some scenarios might even be triggered
> by non-sensed events.  The range of possibilities, even within this
> very constrained data-event model is tremendous if not truly infinite.
>
> The AI module might be exposed to a number of runs that produced very
> similar sensor values, or it might be exposed to very few runs that
> produced similar data streams.
>
> Superficially this might look a little like a reinforcement scenario
> since the alarms could be seen as negative reinforcements,

Yeah, but as you have described it, the purpose is to predict the alarms,
and nothing else.  This is not reinforcement learning - which would have the
purpose of preventing the alarms (if the alarms were a negative reward).

However, if your thinking (which you have not written) is that the machine
would use its understanding to try to prevent the alarms, then you have
simply described the reinforcement learning problem.

> but it
> clearly is not a proper model for behaviorist conditioning.  The only
> innate 'behavior' is that the AI module is programmed to produce is to
> try to develop conjectures to predict the data events that could
> trigger the various alarms.

I believe most people would look at that as a type of unsupervised
learning.

> I argue that since simplistic assessments of the runs would not work
> for every kind of alarm scenario, the program should start out with
> gradual learning in order to reduce the false positives where it
> predicted an alarm event that did not subsequently occur.
>
> This model might have hundreds or thousands of sensors. It might have
> hundreds of alarms.

What exactly is your point in defining some inputs as sensor inputs and
some as alarm inputs?  Why the distinction?  Why not just call them all
sensors and describe the point of the machine as being one of predicting
all sensor inputs before they happen?  Why only predict the inputs you
label as "alarm" inputs?

> It might have a variety of combinations of data
> events that could cause or inhibit an alarm.  Non-sensible data events
> might interact with the sensory data events to trigger or inhibit an
> alarm.  Furthermore, the AI module might be able to mitigate or operate
> the data events that drive the sensors so that it could run interactive
> experiments to test its conjectures.
>
> I have described a complex model where an imagined AI module would have
> to make conjectures about the data events that triggered an alarm.  Off
> hand I cannot think of any one learning method that would be best for
> this problem.  So lacking that wisdom I would suggest that the program
> might run hundreds or even thousands of different learning methods in
> an effort to discover predictive conjectures that would have a high
> correlation with actual alarms.  This is a complex model problem which
> does not lend itself to a single simplistic AI paradigm.

Except that by combining multiple techniques into one, and deciding which
to try, and how long to try each, all you have done is define yet another
single learning system.  When you are done, whatever you have built will be
just another single learning system.

This is the point of the no free lunch theorem in learning systems:

http://en.wikipedia.org/wiki/No-free-lunch_theorem

Every learning system has an inherent bias and there is no way to get
around that by trying to use all possible methods.  Any attempt to do so
will just change the bias to something else.

The "best learning" is the one with the bias that will produce the best
answers over the class of problems it will be expected to have to deal
with.  All learning systems in the end must be biased towards a specific
class of problems.
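A minimal sketch of that point, using two toy learners and two made-up
target functions (nothing here is tied to any particular formal statement of
the theorem): a least-squares line and a nearest-neighbor memorizer each
carry a bias, and each wins only on the class of problems its bias happens
to fit:

    import math, random

    def fit_line(xs, ys):
        """Least-squares line: its bias is that every target looks like a line."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
        a = my - b * mx
        return lambda x: a + b * x

    def fit_nearest(xs, ys):
        """1-nearest-neighbor: its bias is that nearby inputs share an output."""
        pairs = list(zip(xs, ys))
        return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

    def mean_sq_error(predict, target, rng):
        test = [rng.uniform(0, 10) for _ in range(200)]
        return sum((predict(x) - target(x)) ** 2 for x in test) / len(test)

    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(30)]
    targets = [("linear  ", lambda x: 2 * x + 1),
               ("periodic", lambda x: math.sin(3 * x))]
    for name, target in targets:
        ys = [target(x) for x in xs]
        print(name,
              "line:", round(mean_sq_error(fit_line(xs, ys), target, rng), 3),
              "1-nn:", round(mean_sq_error(fit_nearest(xs, ys), target, rng), 3))

Swapping the targets swaps the winner, which is the informal content of the
bias argument.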

To solve AI, you have to both correctly understand the class of problems it
will be expected to solve, and then find the algorithm that best fits that
class.  You can't cheat by just trying all of them.  It's highly unlikely
that an approach based on "I don't know, try them all" is going to work very
well.

> I contend
> that the use of hundreds or maybe even thousands of learning mechanisms
> is going to be a necessary component of innovative AI paradigms in near
> future.  And it seems reasonable to assume that initial learning is
> typically going to be a gradual process in such complex scenarios.

Initial, and final, learning is gradual in all cases.  I suspect, however,
that in humans it's actually the inverse of what you suggest.  We
probably learn more in our first 5 years than we do in the rest of our
lives.  The only reason it doesn't feel this way to us is that we take all
that initial learning for granted.  The foundation of knowledge it gives us
we take for granted - like the fact that to touch something to our left, we
have to reach to the left.  Knowing this, and knowing how to do this, is
not trivial in any sense when you look at the complexity of the problem
from the view of a robot trying to learn to use a manipulator to grab a coke
can.  But it's one of the billions of things we as humans never think twice
about.  It's one of the billions of things that forms our huge foundation
of common sense knowledge that our high level knowledge is built from.

The issue I have with what you have written so far, is that it's focused on
the idea of a machine extracting knowledge from the environment, but it
ignores the most important question of AI - what do we do with the
knowledge?

How does a machine which learns to partially predict the input signals
labeled as being "alarms", determine how to move its arm?  Why would it
move its arm?

How does a machine that's good at predicting alarms get us closer to
building Commander Data from ST:TNG?

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/16/2006 7:13:37 PM

Glen M. Sizemore wrote:
> "J.A. Legris" <jalegris@sympatico.ca> wrote in message
> news:1153070931.890020.86590@m73g2000cwd.googlegroups.com...
> >
> > Jim Bromer wrote:
> >>
> >> Imagine a complicated production process that had all kinds of sensors
> >> and alarms.  You might imagine a refinery or something like that.
> >> However, since I don't know too much about material processes, I
> >> wouldn't try to simulate something like that but I would instead
> >> create a computer model that used algorithms to produce streams of data
> >> to represent the data produced by an array of sensors.  Under a number
> >> of different situations, alarms would go off when certain combinations
> >> of sensor threshold values were hit.  This computer generated model
> >> would be put through thousands of different runs using different
> >> initial input parameters so that it would produce a wide range of data
> >> streams through the virtual sensors.  It would then be the job of the
> >> AI module to try to predict which alarms would be triggered and when
> >> they would be triggered before the event occurred.  The algorithms that
> >> produced the alarms could be varied and complicated.  For example, if
> >> sensor line 3 and sensor line 4 go beyond some threshold values for at
> >> least 5 units of time, then alarm 23 would be triggered unless line 6
> >> dipped below some threshold value at least two times in the 10 units of
> >> time before.  There might be hundreds of such alarm scenarios.
> >> Individual sensor lines might be involved in a number of different
> >> alarm scenarios.  An alarm might, for another example, be triggered if
> >> the average value of all the sensor inputs was within some specified
> >> range.  The specified triggers for some alarms might change from run to
> >> run, or even during a run.  Some of these scenarios would be simple,
> >> and some might be very complex.  Some scenarios might even be triggered
> >> by non-sensed events.  The range of possibilities, even within this
> >> very constrained data-event model is tremendous if not truly infinite.
> >>
> >> The AI module might be exposed to a number of runs that produced very
> >> similar sensor values, or it might be exposed to very few runs that
> >> produced similar data streams.
> >>
> >> Superficially this might look a little like a reinforcement scenario
> >> since the alarms could be seen as negative reinforcements, but it
> >> clearly is not a proper model for behaviorist conditioning.  The only
> >> innate 'behavior' is that the AI module is programmed to produce is to
> >> try to develop conjectures to predict the data events that could
> >> trigger the various alarms.
> >>
> >> I argue that since simplistic assessments of the runs would not work
> >> for every kind of alarm scenario, the program should start out with
> >> gradual learning in order to reduce the false positives where it
> >> predicted an alarm event that did not subsequently occur.
> >
> > What you've described so far sounds like the Bayesian model that
> > Michael Olea has been describing, where an estimate of the posterior
> > probability of an event is updated afer each observation of the
> > evidence.
>
> This strikes me as enormously charitable. It seems to me that he has said
> little else than:
>
>
>
> 1.)    We may be able to predict some events if we have access to some part
> of what has happened.
>
> 2.)    We should build a machine that does that.
>
>

Maybe I should have said "sounds consistent with" instead of "sounds
like", but what grabbed me was the idea that his AI should get
incrementally better at making predictions with repeated exposures to
informative data. Bayesian probability suggests a "machine" for
carrying this out. 
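For what it's worth, a minimal sketch of the kind of "machine" that phrase
suggests, applied to the alarm setup (the prior and the observation stream
are made up): a Beta-Bernoulli posterior over an alarm's firing rate,
updated one run at a time:

    # Beta-Bernoulli updating: the estimate of an alarm's firing rate gets
    # sharper with each observed run.  The prior is uniform (Beta(1,1)).
    alpha, beta = 1.0, 1.0
    observations = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]   # 1 = alarm fired on this run
    for n, fired in enumerate(observations, start=1):
        alpha += fired
        beta += 1 - fired
        print(f"after {n:2d} runs: P(alarm) ~= {alpha / (alpha + beta):.2f}")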
--
Joe Legris

0
Reply J 7/16/2006 7:29:10 PM

Michael Olea <oleaj@sbcglobal.net> wrote:
> J.A. Legris wrote:
>
> > What you've described so far sounds like the Bayesian model that
> > Michael Olea has been describing, where an estimate of the posterior
> > probability of an event is updated afer each observation of the
> > evidence. Is this the sort of thing you have in mind? At some point,
> > perhaps depending on a threshold probability level, a decision would
> > have to be made about whether the corresponding alarm should be
> > triggered.
>
> That would be where the "utility model" comes in (moving from Bayesian
> Inference into Bayesian Decision Theory) - the cost and gain functions
> over consequences. So you pick the thresold to maximize expected utility.
> That is, of course, a normative theory, not a descriptive one - what an
> agent should do, not what particular agents do in fact do. Even so it is
> often a good model of behavior under experimental conditions. There is a
> consistent difference, I've mentioned a few times, between the normative
> model and a descriptive model of "matching law" like behavior. Suppose
> you have two choices A and B, and that the expected utility is 90 for A
> and 10 for B. The optimal choice is pick A every time. The observed
> behavior is more like pick A 90% of the time, pick B 10% of the time. The
> discrepancy arises only if the probability distribution is known, and
> stationary. If the distribution is unknown (i.e. being estimated, or
> "learned"), and if it might be changing then the matching law makes more
> sense, has been shown to be optimal under some idealized conditions, and
> is a form of "importance sampling", very much like particle filtering
> methods of approximate Bayesian inference.
>
> > It seems like a big jump from predicting outcomes, even thousands of
> > them, to running interactive experiments to test the predictions. How
> > might that work?

That is what the entire subject of reinforcement learning seeks to answer.

It's the problem of trying to produce the correct behavior, to maximize
rewards, while at the same time using all your behaviors as experiments to
collect data about which future behaviors you should be producing.

As Michael points out above, if you know the utility (value) of two
behaviors is 90 and 10, the best answer is to pick the 90.  But if your
knowledge is not absolute, and is only based on a limited number of past
"experiments", then picking 90 and never picking 10 will give you no
additional data about the relative value of the two behaviors.  If you never
pick the 10 behavior again, you will never be able to update your knowledge
about their relative value.  If your knowledge is never absolute (which it
won't be for real world problems that deal with the universe - or anywhere
the learning happens through experience and all you will ever know is the
result of a finite number of experiments) then the optimal solution is never
to stop experimenting - to never stop picking the 10 option at least some of
the time to see if the result this time might be different.

The point of a reinforcement learning machine is to learn from its own
interactions with the environment.  The only thing such a machine can do to
change its fate is to interact with the environment, so the only purpose
it can have is to produce optimal interactions.  But since everything it
knows about how to interact is learned from past interactions, the rules it
uses for picking behaviors have to take into account that it's got a dual
purpose in life - 1) pick the behaviors that lead to the best results, and
2) improve its understanding of which behaviors are best so it can make
better choices in the future.  These two needs create a natural conflict
which requires a compromise to be reached.  It must bias its behavior
selection towards the behaviors which are currently known to be better, but
never totally abandon the "bad" behaviors, because what was once seen as
bad might later (in a different environment) turn out to be good.
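A minimal sketch of that compromise (the 90/10 payoffs, the noise, and the
10% exploration rate are arbitrary): an epsilon-greedy chooser that mostly
exploits its current estimates but never completely stops sampling the
apparently worse option:

    import random

    def epsilon_greedy(values, epsilon, rng):
        """Mostly exploit the best-known action, but keep experimenting."""
        if rng.random() < epsilon:
            return rng.randrange(len(values))       # an experiment
        return values.index(max(values))            # exploit current knowledge

    rng = random.Random(0)
    true_payoff = [90, 10]          # unknown to the learner
    values = [0.0, 0.0]             # learned estimates
    counts = [0, 0]
    for t in range(1000):
        a = epsilon_greedy(values, epsilon=0.1, rng=rng)
        reward = true_payoff[a] + rng.gauss(0, 5)   # noisy feedback
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]
    print("estimates:", [round(v, 1) for v in values], "pulls:", counts)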

The problem of AI is exactly the problem of how a machine both learns to
produce optimal behaviors, while at the same time using all the results of
past behaviors as experimental data to guide the selection of future
behaviors.  The solution to how this is done is seen in humans and animals
as operant conditioning.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/16/2006 7:58:46 PM

"J.A. Legris" <jalegris@sympatico.ca> wrote:
> Glen M. Sizemore wrote:

> Maybe I should have said "sounds consistent with" instead of "sounds
> like", but what grabbed me was the idea that his AI should get
> incrementally better at making predictions with repeated exposures to
> informative data. Bayesian probability suggests a "machine" for
> carrying this out.

Just as all reinforcement learning algorithms are machines for carrying
that out as well.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/16/2006 8:02:05 PM

"J.A. Legris" <jalegris@sympatico.ca> wrote in message 
news:1153078150.511291.236680@75g2000cwc.googlegroups.com...
> Glen M. Sizemore wrote:
>> "J.A. Legris" <jalegris@sympatico.ca> wrote in message
>> news:1153070931.890020.86590@m73g2000cwd.googlegroups.com...
>> >
>> > Jim Bromer wrote:
>> >>
>> >> Imagine a complicated production process that had all kinds of sensors
>> >> and alarms.  You might imagine a refinery or something like that.
>> >> However, since I don't know too much about material processes, I
>> >> wouldn't try to simulate something like that but I would instead
>> >> create a computer model that used algorithms to produce streams of 
>> >> data
>> >> to represent the data produced by an array of sensors.  Under a number
>> >> of different situations, alarms would go off when certain combinations
>> >> of sensor threshold values were hit.  This computer generated model
>> >> would be put through thousands of different runs using different
>> >> initial input parameters so that it would produce a wide range of data
>> >> streams through the virtual sensors.  It would then be the job of the
>> >> AI module to try to predict which alarms would be triggered and when
>> >> they would be triggered before the event occurred.  The algorithms 
>> >> that
>> >> produced the alarms could be varied and complicated.  For example, if
>> >> sensor line 3 and sensor line 4 go beyond some threshold values for at
>> >> least 5 units of time, then alarm 23 would be triggered unless line 6
>> >> dipped below some threshold value at least two times in the 10 units 
>> >> of
>> >> time before.  There might be hundreds of such alarm scenarios.
>> >> Individual sensor lines might be involved in a number of different
>> >> alarm scenarios.  An alarm might, for another example, be triggered if
>> >> the average value of all the sensor inputs was within some specified
>> >> range.  The specified triggers for some alarms might change from run 
>> >> to
>> >> run, or even during a run.  Some of these scenarios would be simple,
>> >> and some might be very complex.  Some scenarios might even be 
>> >> triggered
>> >> by non-sensed events.  The range of possibilities, even within this
>> >> very constrained data-event model is tremendous if not truly infinite.
>> >>
>> >> The AI module might be exposed to a number of runs that produced very
>> >> similar sensor values, or it might be exposed to very few runs that
>> >> produced similar data streams.
>> >>
>> >> Superficially this might look a little like a reinforcement scenario
>> >> since the alarms could be seen as negative reinforcements, but it
>> >> clearly is not a proper model for behaviorist conditioning.  The only
>> >> innate 'behavior' is that the AI module is programmed to produce is to
>> >> try to develop conjectures to predict the data events that could
>> >> trigger the various alarms.
>> >>
>> >> I argue that since simplistic assessments of the runs would not work
>> >> for every kind of alarm scenario, the program should start out with
>> >> gradual learning in order to reduce the false positives where it
>> >> predicted an alarm event that did not subsequently occur.
>> >
>> > What you've described so far sounds like the Bayesian model that
>> > Michael Olea has been describing, where an estimate of the posterior
>> > probability of an event is updated afer each observation of the
>> > evidence.
>>
>> This strikes me as enormously charitable. It seems to me that he has said
>> little else than:
>>
>>
>>
>> 1.)    We may be able to predict some events if we have access to some 
>> part
>> of what has happened.
>>
>> 2.)    We should build a machine that does that.
>>
>>
>
> Maybe I should have said "sounds consistent with" instead of "sounds
> like", but what grabbed me was the idea that his AI should get
> incrementally better at making predictions with repeated exposures to
> informative data. Bayesian probability suggests a "machine" for
> carrying this out.


Yeah, but without specifying the Bayesian calculations, and what observable 
sorts of events enter into the calculations, he has said nothing more than: 
"We need a machine that can learn."  And the sort of specific conditions 
under which "learning" occurs is left totally unspecified. In other words, 
his position is utterly vacuous.





> --
> Joe Legris
> 


0
Reply Glen 7/16/2006 9:32:42 PM

Curt Welch wrote:
> "J.A. Legris" <jalegris@sympatico.ca> wrote:
> > Glen M. Sizemore wrote:
>
> > Maybe I should have said "sounds consistent with" instead of "sounds
> > like", but what grabbed me was the idea that his AI should get
> > incrementally better at making predictions with repeated exposures to
> > informative data. Bayesian probability suggests a "machine" for
> > carrying this out.
>
> Just as all reinforcement learning algorithms are machines for carrying
> that out as well.
>

Part of Bromer's proposal is that reinforcement-based learning is not
always necessary. I think he intends to show that his AI generates
specific predictions based on the conjectures it forms, and then
compares those predictions with actual outcomes. Conjectures that are
borne out are retained preferentially over those that fail.

Now this raises an interesting possibility: a learning system that just
sits there, calmly observing events, building up a supply of successful
theories about how the world works. Then suddenly it rises up and
exhibits fully developed overt behaviour, acquired gradually, but
rehearsed and perfected entirely internally. The behaviourist can
insist that reinforcement-based learning occurred, but there's no
evidence of it. 

Just a guess.

--
Joe Legris

0
Reply J 7/16/2006 9:56:24 PM

J.A. Legris wrote:
> Curt Welch wrote:
> > "J.A. Legris" <jalegris@sympatico.ca> wrote:
> > > Glen M. Sizemore wrote:
> >
> > > Maybe I should have said "sounds consistent with" instead of "sounds
> > > like", but what grabbed me was the idea that his AI should get
> > > incrementally better at making predictions with repeated exposures to
> > > informative data. Bayesian probability suggests a "machine" for
> > > carrying this out.
> >
> > Just as all reinforcement learning algorithms are machines for carrying
> > that out as well.
> >
>
> Part of Bromer's proposal is that reinforcement-based learning is not
> always necessary. I think he intends to show that his AI generates
> specific predictions based on the conjectures it forms, and then
> compares those predictions with actual outcomes. Conjectures that are
> borne out are retained preferentially over those that fail.
>
> Now this raises an interesting possibility: a learning system that just
> sits there, calmly observing events, building up a supply of successful
> theories about how the world works. Then suddenly it rises up and
> exhibits fully developed overt behaviour, acquired gradually, but
> rehearsed and perfected entirely internally. The behaviourist can
> insist that reinforcement-based learning occured, but there's no
> evidence of it.
>
> Just a guess.
>


It may be possible to devise such an "artificial" system, but real learning
in brains probably isn't just "either-or". Rather, every behavioral
situation probably involves several mechanisms, including some
prediction, some intuition/analogical thinking [possibly based upon
past experiences], and some direct reinforced learning.

As Bernd Heinrich describes, even crows are able to solve problems
they've never seen before, and without any practice, e.g. pulling
up a piece of food tied to a string. They somehow reason it out
"internally" before beginning the task behaviorally [externally],
which can take up to 10 or more individual steps in sequence
to perform. See "Mind of the Raven".

Also, Horace Barlow and Wm Calvin talk about the idea of "guessing
well". This has to do with attempting totally new tasks one has never
tried before, and internally working out an execution sequence based
upon analogy with previously learned acts. Can something be
reinforced that you've never seen or done before?

Also, there are some neural nets that do "1-pass" learning. Is this
reinforcement?

Also, Edelman would probably say something like: behaviors are
selected for via internal mechanisms. IOW, any given stimulus might
elicit any # of potential behavioral responses, but only one of these
ends up being selected for execution. Certainly this happens when
you search for the proper word to stick into a sentence. Internally,
many words are filtered past before one is finally spoken. And then
of course you have the option to stop saying the word even while it's
being spoken, if it's not the right selection. Plus, there are multiple
options for how the word is spoken: emphasis, inflection, etc.

Reinforcement learning is only part of the system.

0
Reply feedbackdroid 7/16/2006 11:58:45 PM

Glen M. Sizemore wrote:
> JB: I think that the attempt to prove an a priori assumption about the
> efficacy of 'reinforcement' in AI is not wise unless someone has
> made the decision that he wants to spend his time researching
> 'reinforcement' in computational learning theories.
>
>
>
> GS: The argument, which you have not dealt with, is that the behavioral
> phenomena that are referred to broadly as habituation (and sensitization,
> for that matter), classical conditioning, and (especially) operant
> conditioning explain, along with behavior that is largely inherited, all of
> animal and human behavior, at least at the behavioral level. Thus, if we
> wish to simulate behavior (even if only the parts that this guy or that guy
> define as "intelligent") we would do well to understand these phenomena and
> to speculate upon, and look towards physiology to explain, how these
> processes are "implemented" by physiology.  [Part of "speculating upon" is
> attempting simple models that can be tested by computer.]
>


DOH! Inherited behavior? DOH! Look towards physiology to understand
them? DOH! How are these processes implemented by physiology?
DOH! Attempting simple computer models?

Welcome to the middle of the 20th century. People have been doing just
this for the past 50+ years. Where have you been?


>
>   In any event, it
> seems to be there are only two other possibilities. The first is that the
> assertion is wrong on either logical/conceptual grounds or on empirical
> grounds, and that we must, therefore, add more principles. The second is the
> bird/plane argument. That is, that "artificial intelligence" can be achieved
> in ways that don't have anything to do with "how nature does it."
>


Well, the 2nd is probably true for AI, and the first is undoubtedly true
for real intelligence/brains. More principles.

Real brains, as well as real living cells for that matter, are humongously
complex, and one-size-fits-all approaches, like operant conditioning, just
don't cover it. Forgetting about the complex interactions between the
100B neurons and 100T synapses in the brain for a minute, just the
internal complexity of each of the "individual" brain cells alone is
phenomenal. At least HALF of the DNA of the genome factors into
brain operation. 3 billion base pairs in humans, and 1000s of internal
DNA -> RNA -> allosteric protein -> DNA regulatory feedback loops
in each cell alone.

Damn right there are more principles that we just don't understand.


>
> JB: I feel that people who insist on using reinforcement even when it does
> not work for them are just creating a problem where one does not need
> to exist.
>
>
>
> GS: How do you tell the difference between this view and the view that the
> processes in question are simply extremely complex?
>

Wow. Good guess.

............
>
> JB: None of the paradigms of the past have been shown to be capable of
> fully solving this problem.  Something new has to be explored.
>
>
>
> GS: Like what? Oh yeah, "knowledge," "reference," and what "gradual
>  learning"? What about little pixies, and the life force?
>


AWWW. Now you're not doing so well in the "GUESSING WELL"
department that I mentioned in the other post today on this thread.
You need to work on that. Analogical thinking, vis-a-vis your past
experiences, should allow you to come up with a better guess than
pixies.

0
Reply feedbackdroid 7/17/2006 12:38:59 AM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> Can something be
> reinforced that you've never seen or done before?

Of course it can.  A good reinforcement learning machine is shaping classes
of behaviors as it learns.  It's not learning specific reactions.  It does
it by shaping the operation of a classifier as it learns.  All possible
stimulus inputs are then guaranteed to fall into some class, so the
system will always have an "answer" as to how to respond.  The answer will
be based on the reinforcement learning system's evaluation of what class the
current situation falls into, and on the system's past experience with other
events that might have been different, but yet fall into the same
classifications.

You can see one implementation of this in action in TD-Gammon.  Each move
which gets reinforced shapes the weights of the neural network, which causes
many other similar moves to be reinforced at the same time.  It doesn't
have to see every move to be able to make a good "guess" at how to respond
to a move.  It has a good (for Backgammon) system for classifying moves
into response classes so that it can successfully merge its learning from
other moves, to make a good guess at how to play a position it has never
seen before.

This power to make a "good" guess for situations never seen is the
one key missing piece in general reinforcement learning systems.  How it
does it is easy to understand in theory - it simply needs a system that
automatically creates a closeness function and produces an answer which is
some type of merging and selecting from the situations it has seen.  But
how you do this so that a generic system of measuring "closeness" (one not
hand tuned to the application like it was in TD-Gammon) does a good job
is the hard question that has not been well answered.
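A minimal sketch of the generalization mechanism being described, nowhere
near TD-Gammon's scale and with made-up features: a TD(0) update to a
linear value function, where reinforcing one experienced position also
changes the estimated value of an unseen position that shares features
with it:

    # TD(0) with a linear value function: updating shared weights after one
    # experienced state also changes the value assigned to states never seen,
    # because they share features.  The features and numbers are made up.

    def value(weights, features):
        return sum(w * f for w, f in zip(weights, features))

    def td_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
        target = reward + gamma * value(weights, next_features)
        error = target - value(weights, features)
        return [w + alpha * error * f for w, f in zip(weights, features)]

    weights = [0.0, 0.0, 0.0]
    seen_state    = [1.0, 1.0, 0.0]   # the position actually experienced
    similar_state = [1.0, 0.5, 0.0]   # never experienced, but shares features
    next_state    = [0.0, 0.0, 1.0]

    print("value of unseen position before:", value(weights, similar_state))
    weights = td_update(weights, seen_state, reward=1.0, next_features=next_state)
    print("value of unseen position after: ", value(weights, similar_state))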

> Also, there are some neural nets that do "1-pass" learning. Is this
> reinforcement?
>
> Also, Edelman would probably saw something like behaviors are
> selected for via internal mechanisms. IOW, any given stimulus might
> elicit any #of potential behavioral responses, but only one of these
> ends up being selected for execution. Certainly this happens when
> you search for the proper word to stick into a sentence. Internally,
> many words are filtered past before one is finally spoken. And then
> of course you have to option to stop saying the word even while it's
> being spoken, if it's not the right selection. Plus, there are multiple
> options for how the word is spoken, emphasis, inflection, etc.
>
> Reinforcement learning is only part of the system.

And where is your evidence to show that all those "options" are not
selected for by the same low level reinforcement learning system?

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/17/2006 1:10:31 AM

Curt Welch wrote:
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > Can something be
> > reinforced that you've never seen or done before?
>
> Of course it can.  A good reinforcement learning machine is
> shaping classes of behaviors as it learns.  It's not learning
> specific reactions.  It does it by shaping the operation of a
> classifier as it learns.  All possible stimulus inputs then
> are guaranteed to fall into some class so that the system will
> always have an "answer" as to how to respond.  The answer will
> be based on the reinforcement learning systems evaluation of
> what class the current situation falls into, and on the systems
> past experience with other events that might have been different,
> but yet fall into the same classifications.


But the problem is what kind of system is capable of classifying
inputs in a useful way in the first place to be reinforced?


> You can see one implementation of this in action in TD-Gammon.
> Each move which gets reinforced shapes the weights of the neural
> network which causes many other similar moves to be reinforced
> at the same time.  It doesn't have to see every move, to be able
> to make a good "guess" at how to respond to a move.  It has a
> good (for Backgammon) system for classifying moves into response
> classes so that it can successfully merge it's learning from
> other moves, to make a good guess at how to play a position it
> has never seen before.


So your problem as you allude to below is how to make a system
that can learn a good scheme for classifying whatever is useful
for it to classify.

Adjusting the values of the parameters of some learning scheme
may work fine for some programmer invented scheme but it doesn't
represent open ended general purpose scheme learning.

You talk about reinforcement learning but you cannot reinforce
something that doesn't exist. The learning must take place for
it to be reinforced. If the animal never makes the connections
they can never be reinforced. If the classification is never
made the classifier can never be reinforced.



> This power to correct makes a "good" guess for situations never
> seen is the one key missing piece in general reinforcement
> learning systems.  How it does it is easy to understand in theory
> - it simply needs a system that automatically creates a closeness
> function and produces an answer which is some type of merging and
> selecting, from the situations it has seen.  But how you do this
> so that a generic system of measuring "closeness" (one not hand
> tuned to the application like it was in TD-Gammon), to do a good
> job is the hard question that has not been well answered.


I think there are a lot of things that have not been well answered.

There may be no generic measure of closeness; it may all be relative
to the mechanisms doing the classifying.

When you say two things are similar they are only similar with
respect to some mechanism. And different mechanisms will classify
things differently. If you shake some dirt through a sieve it will
classify some particles as big and another lot as small but the
threshold value is sieve dependent.
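A two-line version of the point, with made-up particle sizes: the same dirt
divides into different "big"/"small" classes depending on which sieve you
use:

    particles = [0.3, 0.8, 1.2, 2.5, 3.1, 4.7]   # made-up particle sizes
    for mesh in (1.0, 3.0):                       # two different sieves
        big = [p for p in particles if p > mesh]
        print(f"mesh {mesh}: big = {big}")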

With a visual input you shake your pixels through some "sieve"
and hopefully out will come all the "objects" in the image. But
what constitutes an "object" is also sieve dependent.

An example is my target program that allows a target pattern to
fall through but excludes most other things. Of course some other
things might fall through, no pattern recognizer is perfect, even
we make classification mistakes if we are given marginal data.

What I would suggest is evolution-generated random neural sieves
which made random classification rules which in turn were linked
to random classes of behavior generators, all of which either
enhanced or reduced the chances of the animal's reproductive success.

Human intellectual power may not be the result of a generic learning
system but rather of a set of innate learning instincts. Just as
a young migratory bird has the instinct to learn the pattern of stars
relative to the point around which they appear to rotate each night, we
have the language learning instinct and the reasoning instinct.


--
JC

0
Reply JGCASEY 7/17/2006 3:44:01 AM

J.A. Legris wrote:

> Glen M. Sizemore wrote:
>> "J.A. Legris" <jalegris@sympatico.ca> wrote in message
>> news:1153070931.890020.86590@m73g2000cwd.googlegroups.com...
>> >
>> > Jim Bromer wrote:
>> >>
>> >> Imagine a complicated production process that had all kinds of sensors
>> >> and alarms. ...

>> > What you've described so far sounds like the Bayesian model that
>> > Michael Olea has been describing, where an estimate of the posterior
>> > probability of an event is updated afer each observation of the
>> > evidence.

>> This strikes me as enormously charitable. It seems to me that he has said
>> little else than:

>> 1.)    We may be able to predict some events if we have access to some
>> part of what has happened.

>> 2.)    We should build a machine that does that.

> Maybe I should have said "sounds consistent with" instead of "sounds
> like", but what grabbed me was the idea that his AI should get
> incrementally better at making predictions with repeated exposures to
> informative data. Bayesian probability suggests a "machine" for
> carrying this out.

A little context:

Bromer on Bayesian Inference:

"Bayesian Networks have been around for at least 15 years. If the use 
of a Bayesian Network had been the key to solving the mysteries of 
higher intelligence, then 15 years of efforts in a multi-trillion 
dollar industry whose products are used as fundamental tools of 
science, technology, education, business, recreation and finance should 
have been sufficient to prove it."

This may offer incidental insight into Bromer's grasp of the syllogism form.

Later he writes:

"Suppose the most likely probability at some Bayesian node is .6.  That
means any likely first selection is going to have at least a .4 probability 
of being wrong.  Thus, the analysis of the most probable context is 
going to lead to an error 4 out of 10 times.  The problem with this is 
that under this case the decision network is no longer operating under 
the principle of automated rationality, since it has come to an invalid 
conclusion.  So the basis of the strongest reason for using a Bayesian 
method is, in many actual cases, going to prove to be invalid."

This may offer incidental insight into Bromer's grasp of decision making
under conditions of uncertainty, not to mention Bromer's tenuous grip on
coherence, let alone any notion he may have of equilibrium distributions
over conditionally independent random variables.

Bromer on Behaviorism:

"Since I am not an expert in Behaviorism, I would be at a disadvantage."

Followed shortly by:

"I feel that Behaviorism produced some insights of value, but it was
severely limited by its methodological constraints..."

This may offer incidental insight into Bromer's willingness to pontificate
on things about which he knows little, if anything.

Bromer on Information Theory:

"I originally was only criticizing the wording of the definition of 
"the representational capacity of a string" that someone gave in 
another discussion group, but when I began to examine the problem more 
closely I was surprised by the subtleties and complications of the concept."

That "someone" was me. Bromer had described Shannon's information theory as
simplistic. He argued that he could pack an arbitrary amount of information
into a single bit. A bit string one bit long could convey more than two
messages, he claimed. His proof was this: suppose I send the bit on a
yellow piece of paper, decode that into two messgaes. Now suppose I send
the bit on a white piece of paper, decode that into two other messgaes.
See, that one bit has sent 4 messages!

This may offer incidental insight into Bromer's grasp of information theory.

Bromer on Computational complexity:

NP-complete theory is all wrong, because Bromer has written a linear time
algorithm to solve the traveling salesman problem. Ok, there are some cases
in which it does not work, but really, with a few fixes, it should work.

-- Michael

 
 
0
Reply Michael 7/17/2006 5:15:21 AM

"JGCASEY" <jgkjcasey@yahoo.com.au> wrote:
> Curt Welch wrote:
> > "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > > Can something be
> > > reinforced that you've never seen or done before?
> >
> > Of course it can.  A good reinforcement learning machine is
> > shaping classes of behaviors as it learns.  It's not learning
> > specific reactions.  It does it by shaping the operation of a
> > classifier as it learns.  All possible stimulus inputs then
> > are guaranteed to fall into some class so that the system will
> > always have an "answer" as to how to respond.  The answer will
> > be based on the reinforcement learning systems evaluation of
> > what class the current situation falls into, and on the systems
> > past experience with other events that might have been different,
> > but yet fall into the same classifications.
>
> But the problem is what kind of system is capable of classifying
> inputs in a useful way in the first place to be reinforced?

True.  That's what I'm working on.  What is the correct general way to
classify?  That's the question.

The type of work that Jeff Hawkins is funding with his invariant
representation ideas is part of that answer.  It's what I was posting about
just recently asking for algorithms that can transform signals and remove
correlations without removing other information.  This is all key to
answering how you would build a useful general purpose classifier system.
The invariant signals such a system would generate are the classifications
that are needed to drive the reinforcement learning.

> > You can see one implementation of this in action in TD-Gammon.
> > Each move which gets reinforced shapes the weights of the neural
> > network which causes many other similar moves to be reinforced
> > at the same time.  It doesn't have to see every move, to be able
> > to make a good "guess" at how to respond to a move.  It has a
> > good (for Backgammon) system for classifying moves into response
> > classes so that it can successfully merge it's learning from
> > other moves, to make a good guess at how to play a position it
> > has never seen before.
>
> So your problem as you allude to below is how to make a system
> that can learn a good scheme for classifying whatever is useful
> for it to classify.

Right.  But there are generic answers, as I talked about above, that are
always useful.

> Adjusting the values of the parameters of some learning scheme
> may work fine for some programmer invented scheme but it doesn't
> represent open ended general purpose scheme learning.

But the type of ideas above do represent open ended learning schemes.

> You talk about reinforcement learning but you cannot reinforce
> something that doesn't exist. The learning must take place for
> it to be reinforced.

I have no clue what that means.  The act of reinforcement is the learning.
To me, you just wrote, "the reinforcement must take place for the
reinforcement to take place" ???

> If the animal never makes the connections
> they can never be reinforced. If the classification is never
> made the classifier can never be reinforced.

I'm not sure what you are thinking.  My pulse sorting net already shows
exactly how this type of system can work.  Every node is already a
classifier.  They all function as classifiers from the beginning.  The only
behavior of the network is a classification function.  Each pulse, as it
enters the net, is classified each time it is sorted by a node into one of
the two output paths.  The net as a whole is acting as a classifier as it
sorts an input pulse, through some selected path, to some final output.

What such a system then learns is how to adjust the boundary conditions of
all these classifiers.

So, when you say, "if the animal never learns the connection" that makes no
sense for this type of system.  Every behavior is already a "connection"
from one classification to the next.  The network is nothing by
classification connections.

This type of system never has to learn a new classification.  It's born
with all the classifications it can ever make.  Instead, it simply must
learn to adjust the ones it already has, to shape them into the most useful
shape.  This prevents it from ever having to find a new classification like
finding a needle in a haystack.

The other issue is that all sorting ends up being probabilistic in nature.
It's all "fuzzy" sorting because in no case will a flow be identified and
all its pulses sorted down the exact same path.  Instead, low level noise in
all the signals will cause pulses from a single source to fan out in a tree
fashion.  This is the default starting behavior.  Learning will tend to
cluster the pulses where they are best used, but some pulses will still go
every which way.  Just as in the 90/10 behavior selection Michael and I
just wrote about.

This means that if there is a different clustering that works better, this
alternate sorting will cause that alternate path to be reinforced, and in
time it will become the 90, instead of the 10, path.

All of the above already works in my pulse sorting network, so there's no
doubt about what this allows.

But, there's a huge But lurking here.  The question is can you find a
generic classification technique, that works like my current pulse sorting
network, which has the correct power, to create any type of classification
needed for some domain (with the interesting domain being full human
behavior).

I had high hopes for the pulse sorting you already know about.  But I now
feel it's close to the right concept, but not quite right.  I don't believe
it was correctly creating invariant representations.  I believe a system
that correctly creates signals which extract invariant representations from
the combined sensory data, and which can otherwise, operate in a fashion
similar to my current pulse sorting nodes, will produce an extremely strong
and extremely generic, reinforcement learning system.

> > This power to correct makes a "good" guess for situations never
> > seen is the one key missing piece in general reinforcement
> > learning systems.  How it does it is easy to understand in theory
> > - it simply needs a system that automatically creates a closeness
> > function and produces an answer which is some type of merging and
> > selecting, from the situations it has seen.  But how you do this
> > so that a generic system of measuring "closeness" (one not hand
> > tuned to the application like it was in TD-Gammon), to do a good
> > job is the hard question that has not been well answered.
>
> I think there are a lot of things that have not been well answered.
>
> There may be no generic measuring of closeness, it may all be relative
> to the mechanisms doing the classifying.

I think a system that uses the correlations in the signals to extract
invariant representations will create the generic measures of closeness
that are needed.  The measure will effectively be made by the number of
common invariant signals the two states share.

Think of my current pulse sorting network, but where every signal internal
to the network, was a separate invariant representation of some aspect of
the current sensory signals.  They would all be extracted features of the
stimulus signals, and would represent the current state of the environment.

My current pulse sorting system already does the same thing, but I think
it's just using the wrong definition of how it classifies so it's not
creating the correct set of "features".

> When you say two things are similar they are only similar with
> respect to some mechanism. And different mechanisms will classify
> things differently. If you shake some dirt through a sieve it will
> classify some particles as big and another lot as small but the
> threshold value is sieve dependent.

Yes.  That's the important question.  Is one mechanism better than all
others, or do we simply have to hand-tune every mechanism to fit the
problem at hand?  Evolution surely has the power to hand-tune us to our
environment if that is what was needed.

But I think that a system that creates invariant signals by removing
correlations (duplicate information) from the signals is a clear win in
terms of how it will improve the quality of the learning.  Is it good
enough to be a strong generic system?  I don't know.

> With a visual input you shake your pixels through some "sieve"
> and hopefully out will come all the "objects" in the image. But
> what constitutes an "object" is also sieve dependent.

I don't believe it is.  I believe objects are defined by the correlations
they create in the data.  I've always believed this is how our brain works.
We learn to see things as objects because of the fact that they create
correlated effects in our sensory data.  This is something that happens in
our brain, invisible to our conscious understanding.  We just
see things as objects, and have no clue why they seem to be "objects".  We
just take it as fact that they are objects - until you sit down and try to
write image parsing code and find the definition of what makes an "object"
not obvious at all.

Networks that extract invariant representations based on correlations
present in the data will be extracting the one and only correct definition
of what an object is.

If you remember my arguments as to why man has always seen mental activity
as separate from physical activity, they were based on this same belief of
mine.  We see the sensory data that we call "mental activity" as a
different "object" from the physical objects because there is NO
CORRELATION in the data between physical sensory data and mental sensory
data.  That is why the brain ended up classifying all our mental activity
as a different type of object from all our physical activity.  They were
NOT CORRELATED.

And why were there no correlations?  Because when we have thoughts, it
makes no sound, no smell, no flash of light, and no tactile sensation.  We
can't see, hear, smell, or touch our thoughts.  There is NO CORRELATION in
the sensory data streams.  And, with no correlations, our object
classifier can't create a single invariant relationship by extracting the
correlations.  So, net result, the brain tells us that our mental activity
is a different object from all physical activity.

> An example is my target program that allows a target pattern to
> fall through but excludes most other things. Of course some other
> things might fall through, no pattern recognizer is perfect, even
> we make classification mistakes if we are given marginal data.

Right.  But if you kept exposing the same target to one of these
classification networks, it would form an invariant representation of the
target as a single object.  That is because the characters that make up
the target create correlations in the various sensory input signals.  If
pixel X from the right side of your target is turned on, then pixel Y from
the left side is also turned on.  This is a correlation condition that
indicates the "target" is present when it happens.  It's that correlation a
generic network could notice and translate into the "target" invariant
representation.

This is a valid generic technique that applies to all types of sensory
data.
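A minimal sketch of that idea (a toy 8-line sensor array with made-up
firing rates, not a model of any pulse-sorting network): sensors that
belong to one "target" fire together, so pairs whose co-occurrence rate is
well above what independent noise would produce can be grouped as a single
object:

    import random
    from itertools import combinations

    rng = random.Random(0)
    N = 8                        # "sensor lines"
    target = {2, 5, 7}           # sensors that belong to one object (made up)

    # Build frames: the target's sensors always fire together (when the target
    # is present); every other sensor fires independently at random.
    frames = []
    for _ in range(500):
        present = rng.random() < 0.5
        frames.append([1 if (i in target and present) or
                            (i not in target and rng.random() < 0.3) else 0
                       for i in range(N)])

    # How often does each pair of sensors fire together?
    co = {pair: sum(f[pair[0]] and f[pair[1]] for f in frames) / len(frames)
          for pair in combinations(range(N), 2)}

    # Pairs firing together far more often than independent noise would allow
    # get grouped as one "object".
    print("co-occurring pairs:", [pair for pair, rate in co.items() if rate > 0.2])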

> What I would suggest is evolution generated random neural sieves
> which made random classification rules which in turn were linked
> to random classes of behavior generators all of which either
> enhanced of reduced the chances of the animals reproductive success.

If that is the only solution possible, then that would be a good guess.
But we have the correlations in the data to work with, and so did
evolution.

> Human intellectual power may not be the result of a generic learning
> system but rather as a set of innate learning instincts.

Yeah, it could be.  But, as I have pointed out countless times, humans
have GENERIC LEARNING POWERS.  This is a simple proven fact.  And the only
way to explain it is to build a generic learning system.  Most of what we
learn, in our world today, to get by, could not have been done with innate
learning instincts shaped by evolution.  Evolution could not have shaped a
generic "learn to play chess" sieve to allow us to correctly see "chess game
patterns".  It could not have also created a "go learning sieve", and a
"bike riding sieve" for learning to recognize the correct response classes
for balancing a bike.

No matter how many specialized systems we might also possess, created for us
by evolution, we know for a fact that humans have generic learning skills
that no machine has ever equaled.  So we know for a fact there must exist
stronger generic solutions that we have not yet found.  Our best generic
learning systems can't come close to touching a human yet.  Humans use one,
and only one, learning system to learn to play chess, and to play go.  Yet
no one has created one learning system on a computer that plays both games
well.  They haven't even created a hand-optimized learning system that
plays go anywhere near the best human players.  So we know there are
generic learning systems stronger than anything we have yet created - and I
know that we have no chance of creating human level AI until we first
create stronger generic learning systems.

> Just as
> a young migratory bird has the instinct to learn the pattern of stars
> relative to the point around they appear to rotate each night we
> have the language learning instinct and the reasoning instinct.

I think it's more likely that we have strong generic learning systems that
have been optimized to give us all our various specialized learning skills
in areas like vision, sound, language, 3D spatial awareness, hand-eye
coordination, etc.  We have different chunks of our brain, wired into a
fixed topology and tuned for each task, that allow the skill levels we
have in each area of the various sensory combinations.

And, if you were to build a good invariant signal abstraction system, it
would learn to recognize your targets correctly without you having to
hand-code a recognizer for the purpose.  It would learn it after you
exposed your system to a lot of targets of the same type.  It would learn
it by building a hierarchy of features that were common to all your
targets.  And it would do a good job of recognizing patterns that you
thought were close to the same type of target, simply because they shared a
lot of features in common with your target - and simply because your brain
is already using that type of definition of "closeness" for everything it
learns.
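
As a rough illustration of closeness-as-shared-features (the feature
names are made up, and plain set overlap is just one obvious stand-in
for whatever the real measure turns out to be):

  # Toy "closeness as shared features" measure (Jaccard overlap).
  # The feature sets are invented purely for illustration.
  def closeness(a, b):
      a, b = set(a), set(b)
      return len(a & b) / len(a | b)

  target   = {"bar_left", "bar_right", "corner_ne", "corner_sw"}
  near_hit = {"bar_left", "bar_right", "corner_ne", "blob"}
  far_miss = {"blob", "stripe", "dot"}

  print(closeness(target, near_hit))   # 0.6 -- shares most features, "close"
  print(closeness(target, far_miss))   # 0.0 -- shares nothing, "far"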

This definition of "closeness" I believe is a big thing that Behaviorism
has left unanswered.  The question of how does a rat learn to see a light
as a single stimulus?  How does it prevent from getting a red light
confused with a red wall on the cage for example?  Why doesn't it seem them
both as "a red object", instead of seeing them as different?

There's probably work done in this area that I am not aware of (simply
because there is so much work done I'm not aware of) - but as far as I
know, neither Behaviorism nor other fields have told us how to build a strong
generic closeness classifier that correctly mimics the ones used by animals.
And without a strong "closeness" classifier, reinforcement learning systems
will always be weak as hell.

But it's clear that the way it does work is by recognizing the
correlations that exist between sensory signals and extracting those
correlations as invariant signals.  The fine details of how to correctly do
this, however, I don't yet fully see.  But I'm getting close.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/17/2006 8:29:54 AM

Michael Olea wrote:

> A little context:
> ...
> Bromer on Information Theory:
>
> "I originally was only criticizing the wording of the definition of
> "the representational capacity of a string" that someone gave in
> another discussion group, but when I began to examine the problem more
> closely I was surprised by the subtleties and complications of the concept."
>
> That "someone" was me. Bromer had described Shannon's information theory as
> simplistic. He argued that he could pack an arbitrary amount of information
> into a single bit. A bit string one bit long could convey more than two
> messages, he claimed. His proof was this: suppose I send the bit on a
> yellow piece of paper, decode that into two messages. Now suppose I send
> the bit on a white piece of paper, decode that into two other messages.
> See, that one bit has sent 4 messages!
>
> This may offer incidental insight into Bromer's grasp of information theory.

LOL :)  Well, he wasn't held back by great amounts of knowledge
there...
He should play poker. With his decoding skills he could make a fortune.

> Bromer on Computational complexity:
>
> NP-complete theory is all wrong, because Bromer has written a linear time
> algorithm to solve the traveling salesman problem. Ok, there are some cases
> in which it does not work, but really, with a few fixes, it should work.

Yes, always those damned few cases.  Every first-year computer science
student comes up with a solution to turn those NP-complete problems into
P problems.  If someone actually succeeded he would be more famous than
Turing and von Neumann put together.  A hundred years from now there will
still be first-years trying to solve it.

0
Reply bob 7/17/2006 9:52:56 AM

Michael Olea wrote:
> J.A. Legris wrote:
>
> > Glen M. Sizemore wrote:
> >> "J.A. Legris" <jalegris@sympatico.ca> wrote in message
> >> news:1153070931.890020.86590@m73g2000cwd.googlegroups.com...
> >> >
> >> > Jim Bromer wrote:
> >> >>
> >> >> Imagine a complicated production process that had all kinds of sensors
> >> >> and alarms. ...
>
> >> > What you've described so far sounds like the Bayesian model that
> >> > Michael Olea has been describing, where an estimate of the posterior
> >> > probability of an event is updated afer each observation of the
> >> > evidence.
>
> >> This strikes me as enormously charitable. It seems to me that he has said
> >> little else than:
>
> >> 1.)    We may be able to predict some events if we have access to some
> >> part of what has happened.
>
> >> 2.)    We should build a machine that does that.
>
> > Maybe I should have said "sounds consistent with" instead of "sounds
> > like", but what grabbed me was the idea that his AI should get
> > incrementally better at making predictions with repeated exposures to
> > informative data. Bayesian probability suggests a "machine" for
> > carrying this out.
>
> A little context:
>
> Bromer on Bayesian Inference:
[...]
> Bromer on Behaviorism:
[...]
> Bromer on Information Theory:
[...]
> Bromer on Computational complexity:
[...]

Gak! I've been Zicked. 

--
Joe Legris

0
Reply J 7/17/2006 11:41:31 AM

 Curt Welch wrote:
> "JGCASEY" <jgkjcasey@yahoo.com.au> wrote:

> > You talk about reinforcement learning but you cannot reinforce
> > something that doesn't exist. The learning must take place for
> > it to be reinforced.
>
> I have no clue what that means.  The act of reinforcement is the
> learning. To me, you just wrote, "the reinforcement must take
> place for the reinforcement to take place" ???

Not sure what it means myself now that I have reread it.  I think what
I had in mind was that whatever it was that was being reinforced had to
exist in the first place.  In other words, there must be something
to be reinforced before it can be reinforced.

You can't reinforce a dam that doesn't exist. Reinforcement may be
required for the dam to persist but it doesn't require reinforcement
to exist.

Thus I would say that what you see as existing is behaviors, and thus
perhaps what you mean by reinforcement learning is the process of
reinforcing behaviors?

Your idea, I think, is to reinforce the behaviors of a generic
input-dependent behaviour-generating machine whenever you deem those
behaviors to be "intelligent", with the hope that it will persist in
producing such behaviors.

Learning would mean some behaviors are becoming more likely than
other behaviors? To learn a behavior is to make it more likely.
But of course that doesn't answer what makes a behavior intelligent,
for you can learn to produce unintelligent behaviours as well.


--
JC

0
Reply JGCASEY 7/17/2006 12:28:47 PM

Curt Welch wrote:
> "JGCASEY" <jgkjcasey@yahoo.com.au> wrote:
> > Curt Welch wrote:
> > > "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > > > Can something be
> > > > reinforced that you've never seen or done before?
> > >
> > > Of course it can.  A good reinforcement learning machine is
> > > shaping classes of behaviors as it learns.  It's not learning
> > > specific reactions.  It does it by shaping the operation of a
> > > classifier as it learns.  All possible stimulus inputs then
> > > are guaranteed to fall into some class so that the system will
> > > always have an "answer" as to how to respond.  The answer will
> > > be based on the reinforcement learning systems evaluation of
> > > what class the current situation falls into, and on the systems
> > > past experience with other events that might have been different,
> > > but yet fall into the same classifications.
> >
> > But the problem is what kind of system is capable of classifying
> > inputs in a useful way in the first place to be reinforced?
>
> True.  That's what I'm working on.  What is the correct general way to
> classify?  That's the question.
>
> The type of work that Jeff Hawkins is funding with his invariant
> representation ideas is part of that answer.  It's what I was posting about
> just recently asking for algorithms that can transform signals and remove
> correlations without removing other information.  This is all key to
> answering how you would build a useful general purpose classifier system.
> The invariant signals such a system would generate are the classifications
> that are needed to drive the reinforcement learning.
>


Yes, exactly. This is the sort of "other mechanism" I was alluding to
in my previous post. In the real world, it takes more than just a naive
learning device to crack the nut of intelligence. 50 years of AI and
NN research shows this.

0
Reply feedbackdroid 7/17/2006 3:29:23 PM

Curt Welch wrote:
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > Can something be
> > reinforced that you've never seen or done before?
>
> Of course it can.  A good reinforcement learning machine is shaping classes
> of behaviors as it learns.  It's not learning specific reactions.  It does
> it by shaping the operation of a classifier as it learns.  All possible
> stimulus inputs then are guaranteed to fall into some class so that the
> system will always have an "answer" as to how to respond.  The answer will
> be based on the reinforcement learning systems evaluation of what class the
> current situation falls into, and on the systems past experience with other
> events that might have been different, but yet fall into the same
> classifications.
>


I think you're mixing things here. If your trained machine receives an
input it has "never" seen before, and which is adequately far removed
from the centroid of your training set, it will produce "some" response,
but it's unlikely to produce the "correct" response. OTOH, if the "novel"
input is adequately close to one of your training set prototypes, then
it really isn't novel.


>
> You can see one implementation of this in action in TD-Gammon.  Each move
> which gets reinforced shapes the weights of the neural network which causes
> many other similar moves to be reinforced at the same time.  It doesn't
> have to see every move, to be able to make a good "guess" at how to respond
> to a move.  It has a good (for Backgammon) system for classifying moves
> into response classes so that it can successfully merge it's learning from
> other moves, to make a good guess at how to play a position it has never
> seen before.
>


This makes some sense, but playing Backgammon, or learning the rules
for other games, is not general intelligence.


>
> This power to correct make a "good" guess for situations never seen is the
> one key missing piece in general reinforcement learning systems.  How it
> does it is easy to understand in theory - it simply needs a system that
> automatically creates a closeness function and produces an answer which is
> some type of merging and selecting, from the situations it has seen.
>


Merging and selecting. Yes. This is one of the "other mechanisms",
besides just the basic learning device, that I alluded to last time.
Additional structure, of yet-unknown variety.



>   But
> how you do this so that a generic system of measuring "closeness" (one not
> hand tuned to the application like it was in TD-Gammon), to do a good job
> is the hard question that has not been well answered.
>
> > Also, there are some neural nets that do "1-pass" learning. Is this
> > reinforcement?
> >
> > Also, Edelman would probably saw something like behaviors are
> > selected for via internal mechanisms. IOW, any given stimulus might
> > elicit any #of potential behavioral responses, but only one of these
> > ends up being selected for execution. Certainly this happens when
> > you search for the proper word to stick into a sentence. Internally,
> > many words are filtered past before one is finally spoken. And then
> > of course you have to option to stop saying the word even while it's
> > being spoken, if it's not the right selection. Plus, there are multiple
> > options for how the word is spoken, emphasis, inflection, etc.
> >
> > Reinforcement learning is only part of the system.
>
> And where is your evidence to show that all those "options" are not
> selected for by the same low level reinforcement learning system?
>


Basically, in the observation that 50 years of creation of naive
learning devices hasn't solved the problem. Many people, such as
Grossberg, have realized this and have tried to put various forms of
additional structure into their systems, but so far a good general
solution hasn't been found.

0
Reply feedbackdroid 7/17/2006 3:48:47 PM

Michael Olea wrote:

............
>
> Bromer on Computational complexity:
>
> NP-complete theory is all wrong, because Bromer has written a linear time
> algorithm to solve the traveling salesman problem. Ok, there are some cases
> in which it does not work, but really, with a few fixes, it should work.
>


Speaking of which, it's amazing how badly people [ie, computerists]
tend to get side-tracked into rote behavioral approaches as regards the
TSP. Meaning, doing it like everybody else.

OTOH, if people follow nature's approach of evolving "good-enough"
rather than "optimal" processes/organisms, the solution CAN be made
many times simpler.

To wit, a 20 city TSP involves something like 10^18 different possible
routes.

the computerist approach:
Wow, impressive! Let's spend our lives trying to find the "optimal"
route out of all that. Let's invent NP-mathematics too.

nature's approach:
Ugh. It would take 1000s of millennia to solve this problem using any
brute-force, general-search approach. How can we simplify the problem?
Maybe a non-optimal but easily-computed solution is good-enough for
what's really important ... survival in the real world. So, here it is
....

If you (a) first partition city-space into regions of grouped cities,
and then (b) solve the problem for each group, and then (c) adopt a
strategy for going between the groups, then (d) the problem is solvable
in the next couple of seconds [more or less] rather than the rest of
eternity [more or less]. Note that this partitioning is basically the
issue of "modularization", which enormously reduces the size of the
search space for any problem.

For the 20-city TSP, if we break the cities into 5 groups of 4 nearby
cities each, then the total #paths reduces to just 120. Down from
10^18. Now, that's really a wow.
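
Here's a minimal sketch of the partition-then-solve idea, assuming plain
Euclidean distances. The crude sort-on-x grouping and the left-to-right
chaining of the groups are just illustrative stand-ins for steps (a) and
(c), not the only ways to do them:

  import itertools, math, random

  random.seed(1)
  cities = [(random.random(), random.random()) for _ in range(20)]

  def dist(a, b):
      return math.hypot(a[0] - b[0], a[1] - b[1])

  # (a) partition into 5 groups of 4 "nearby" cities -- here just by
  #     sorting on the x coordinate and slicing, the crudest grouping
  cities.sort()
  groups = [cities[i:i + 4] for i in range(0, 20, 4)]

  # (b) brute-force the best open path within each group (4! = 24 orders)
  def path_len(p):
      return sum(dist(p[i], p[i + 1]) for i in range(len(p) - 1))

  paths = [min(itertools.permutations(g), key=path_len) for g in groups]

  # (c) visit the groups left to right, hopping from the end of one
  #     group's path to the start of the next
  route = [c for p in paths for c in p]
  print("good-enough route length:", round(path_len(route), 3))

Nothing optimal about it - but it finishes before you can blink.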

There may be more-optimal solutions, but then this one only takes a
fraction of a second to execute. While the computerist's machine is
endlessly chugging away, nature's salesman has run his route and is
already back home having coffee, and tallying up his sales.
Good-enough.

There must be a moral here.

0
Reply feedbackdroid 7/17/2006 5:35:39 PM

"JGCASEY" <jgkjcasey@yahoo.com.au> wrote:
>  Curt Welch wrote:
> > "JGCASEY" <jgkjcasey@yahoo.com.au> wrote:
>
> > > You talk about reinforcement learning but you cannot reinforce
> > > something that doesn't exist. The learning must take place for
> > > it to be reinforced.
> >
> > I have no clue what that means.  The act of reinforcement is the
> > learning. To me, you just wrote, "the reinforcement must take
> > place for the reinforcement to take place" ???
>
> Not sure what it means myself now I have reread it. I think what
> I had in mind was whatever it was that was being reinforced had to
> exist in the first place. In other words there must be something
> to be reinforced before it can be reinforced.

Yeah, and after reading the rest of your post I think I grasp what you were
getting at.

> You can't reinforce a dam that doesn't exist. Reinforcement may be
> required for the dam to persist but it doesn't require reinforcement
> to exist.
>
> Thus I would say what you see as existing is behaviors and thus
> perhaps what you mean by reinforcement learning is the process of
> reinforcing behaviors?
>
> Your idea I think is to reinforce the behaviors of a generic input
> dependent behaviour generating machine whenever you deem those
> behaviors to be "intelligent" with the hope it will persist in
> producing such behaviors.

Sure.  Of course.  Normal ideas of reinforcement.  The system finds food by
random chance and the actions leading up to the find are reinforced and
more likely to be repeated in the future.

> Learning would mean some behaviors are becoming more likely than
> other behaviors?

Right.  That's the bottom line of it all.

> To learn a behavior is to make it more likely.
> But of course that doesn't answer what makes a behavior intelligent
> for you can learn to produce unintelligent behaviours as well.

Well, my belief in reinforcement learning is so basic to all this that I
actually see reinforcement learning as intelligence.  Any behavior that is
learned through reinforcement is intelligent behavior.  So it's impossible
to learn unintelligent behavior (from this way of looking at intelligence).

And, of course, if you define intelligence as full human behavior, then
that means everything that humans do is intelligent as well - by
definition.

This of course is not how we use the word intelligence in normal day-to-day
talk.  We say people are being stupid and unintelligent when they do things
that a moment of reasoned thought would show us to be a poor choice of
behavior.  However, this would imply that only logical, reasoned behavior is
intelligent behavior.  And though that's a fine way to define it for casual
conversation, it is not a definition that gets us very near building
human-like machines, because humans aren't driven by logic and reason at the
lowest levels.  They are driven by reinforcement learning.  Intelligent
logic and reason only emerges as part of our behavior set over time because
of its great value to us in producing long term rewards.

When you look at intelligence as being reinforcement learning skill, then
you can compare the intelligence of different machines by placing them into
the same environment and seeing which ones produce the most rewards over
some extended period of time.  The better the machine does at adapting its
behavior to the environment, the more intelligent it is - for that
environment.
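
As a bare-bones sketch of that kind of comparison - the two-armed bandit
environment and the two agents below are toy stand-ins I'm inventing for
illustration, not real learning systems:

  import random

  random.seed(0)

  def bandit(arm):
      # two-armed bandit: arm 1 pays off more often than arm 0
      p = 0.8 if arm == 1 else 0.3
      return 1.0 if random.random() < p else 0.0

  class RandomAgent:
      # never adapts its behavior
      def choose(self): return random.randint(0, 1)
      def learn(self, arm, reward): pass

  class GreedyAgent:
      # keeps a running average reward estimate per arm
      def __init__(self):
          self.value = [0.0, 0.0]
          self.count = [0, 0]
      def choose(self):
          if random.random() < 0.1:          # occasional exploration
              return random.randint(0, 1)
          return 0 if self.value[0] >= self.value[1] else 1
      def learn(self, arm, reward):
          self.count[arm] += 1
          self.value[arm] += (reward - self.value[arm]) / self.count[arm]

  def total_reward(agent, steps=1000):
      total = 0.0
      for _ in range(steps):
          arm = agent.choose()
          r = bandit(arm)
          agent.learn(arm, r)
          total += r
      return total

  print("random agent:", total_reward(RandomAgent()))
  print("greedy agent:", total_reward(GreedyAgent()))

The agent that adapts its behavior to the reward signal ends up with the
bigger total, and that total is all the comparison measures.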

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/17/2006 7:52:42 PM

feedbackdroid wrote:
 
> Speaking of which, it's amazing how badly people [ie, computerists]
> tend to get side-tracked into rote behavioral approaches regards the
> TSP. Meaning, doing it like everybody else.

Dan, the "engineer", once argued that a wavelet transform could not be
lossless because "any time you add two numbers you lose information". To be
fair, I don't know that Dan has ever actually claimed to be an engineer.

> OTOH, if people follow nature's approach of evolving "good-enough"
> rather than "optimal" processes/organisms, the solution CAN be made
> many times simpler.

What???? You mean there are algorithms that find approximate solutions? Do,
tell. So, would that imply that the study of the run-time complexity of
algorithms should not be limited to the worst case performance of exact
solutions, but include also average run-time, and when exact solutions are,
in a given context, too expensive, the worst case, and expected case of
approximate algorithms, and perhaps even bounds on the difference between
optimal solutions and approximate solutions such an algorithm finds? Wow!
Inform "computerists" everywhere of the stunning news. Also, the allies won
the war - Hitler is dead!!!
 
> To wit, a 20 city TSP involves something like 10^18 different possible
> routes.

It depends on which variant of the problem. In the case where each city is
to be visited exactly once, and cities can be visited in any order, it is:

N!/2 = 1,216,451,004,088,320,000
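
A two-line check of the arithmetic, for anyone who cares:

  import math
  print(math.factorial(20) // 2)   # 1216451004088320000, about 1.2 x 10^18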
 
> the computerist approach:
> Wow, impressive! Let's spend our lives trying to find the "optimal"
> route out of all that. Let's invent NP-mathematics too.

Any other group you care to smear while you're at it?

Let's open an introductory text on algorithms and data structures. How many
approximate algorithms for the traveling salesman problem do we find?
There's Prim's algorithm, Kruskal's algorithm, a simple version of
"2-opting", generalized to "k-opting", and of course several pointers to
the literature.

> If you (a) first partition city-space into regions of grouped cities...

Having slandered the study of algorithms and their run-time, Dan sketches a
question-begging "algorithm", offered on the grounds of its alleged
superior run-time, though punting on the question of how to partition
cities into groups, the run-time of constructing that partition, and
ignoring any analysis of bounds on the differences between the solutions
found by this "algorithm" and any optimal solutions. And yet the little
cockroach presumes to lecture "computerists" on their silly rote behavior.

> There must be a moral here.

That you suffer from penis envy? Or is it just a lack of rigor?

-- Michael


0
Reply Michael 7/17/2006 8:19:45 PM

Michael Olea wrote:
> J.A. Legris wrote:
>
> > What you've described so far sounds like the Bayesian model that
> > Michael Olea has been describing, where an estimate of the posterior
> > probability of an event is updated afer each observation of the
> > evidence. Is this the sort of thing you have in mind? At some point,
> > perhaps depending on a threshold probability level, a decision would
> > have to be made about whether the corresponding alarm should be
> > triggered.
>
> That would be where the "utility model" comes in (moving from Bayesian
> Inference into Bayesian Decision Theory) - the cost and gain functions over
> consequences. So you pick the thresold to maximize expected utility. That
> is, of course, a normative theory, not a descriptive one - what an agent
> should do, not what particular agents do in fact do. Even so it is often a
> good model of behavior under experimental conditions. There is a consistent
> difference, I've mentioned a few times, between the normative model and a
> descriptive model of "matching law" like behavior. Suppose you have two
> choices A and B, and that the expected utility is 90 for A and 10 for B.
> The optimal choice is pick A every time. The observed behavior is more like
> pick A 90% of the time, pick B 10% of the time. The discrepancy arises only
> if the probability distribution is known, and stationary. If the
> distribution is unknown (i.e. being estimated, or "learned"), and if it
> might be changing then the matching law makes more sense, has been shown to
> be optimal under some idealized conditions, and is a form of "importance
> sampling", very much like particle filtering methods of approximate
> Bayesian inference.
>
> > It seems like a big jump from predicting outcomes, even thousands of
> > them, to running interactive experiments to test the predictions. How
> > might that work?
>
> That, "intervention", gets a lot of attention in Judea Pearl's second major
> book, the one on "Causality". It also has been studied in terms of "value
> of information". Bayesian medical expert systems do a limited form of this
> by suggesting tests to perform in order to arrive at a diagnosis. The role
> of intervention in learning has also been studied in, for example,
> developmental psychology. Discounting evidence ("let me try it, you just
> aren't doing it right") is one example. It is a major theme in Allison
> Gopnik's work:
>
> http://ihd.berkeley.edu/gopnik.htm
>
> For example:
>
> A.Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, & D. Danks (2004). A
> theory of causal learning in children: Causal maps and Bayes nets.
> Psychological Review, 111, 1, 1-31.

> T. Kushnir, A. Gopnik, L Schulz, & D. Danks. (in press). Inferring hidden
> causes. Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive
> Science Society
>
> -- Michael

The experimental results in the first paper (starting on p.64) are
fascinating. Required reading for behaviourists! Thanks for the links.

--
Joe Legris

0
Reply J 7/17/2006 8:23:50 PM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote in message 
news:1153150163.147035.310330@p79g2000cwp.googlegroups.com...
>
> Curt Welch wrote:
>> "JGCASEY" <jgkjcasey@yahoo.com.au> wrote:
>> > Curt Welch wrote:
>> > > "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
>> > > > Can something be
>> > > > reinforced that you've never seen or done before?
>> > >
>> > > Of course it can.  A good reinforcement learning machine is
>> > > shaping classes of behaviors as it learns.  It's not learning
>> > > specific reactions.  It does it by shaping the operation of a
>> > > classifier as it learns.  All possible stimulus inputs then
>> > > are guaranteed to fall into some class so that the system will
>> > > always have an "answer" as to how to respond.  The answer will
>> > > be based on the reinforcement learning systems evaluation of
>> > > what class the current situation falls into, and on the systems
>> > > past experience with other events that might have been different,
>> > > but yet fall into the same classifications.
>> >
>> > But the problem is what kind of system is capable of classifying
>> > inputs in a useful way in the first place to be reinforced?
>>
>> True.  That's what I'm working on.  What is the correct general way to
>> classify?  That's the question.
>>
>> The type of work that Jeff Hawkins is funding with his invariant
>> representation ideas is part of that answer.  It's what I was posting 
>> about
>> just recently asking for algorithms that can transform signals and remove
>> correlations without removing other information.  This is all key to
>> answering how you would build a useful general purpose classifier system.
>> The invariant signals such a system would generate are the 
>> classifications
>> that are needed to drive the reinforcement learning.
>>
>
>
> Yes, exactly. This is the sort of "other mechanism" I was alluding to
> in my previous post. In the real-world, it takes more than just a
> naiive
> learning device to crack the nut of intelligence. 50 years of AI and
> NN research shows this.

But maybe Curt has a point in saying there must be one, or a few, simple
basic RL algorithms that carry the complexities that evolve from them at
large. Maybe I found them: 1) the "protonic" algorithm of protons, 2) the
"neutronic" algorithm of neutrons, and 3) the "electronic" algorithm of
electrons. Duh.


0
Reply JPl 7/17/2006 8:31:47 PM

Michael Olea wrote:
> feedbackdroid wrote:
>
> > Speaking of which, it's amazing how badly people [ie, computerists]
> > tend to get side-tracked into rote behavioral approaches regards the
> > TSP. Meaning, doing it like everybody else.
>
> Dan, the "engineer", once argued that a wavelet transform could not be
> lossless because "any time you add two numbers you lose information". To be
> fair, I don't know that Dan has ever actually claimed to be an engineer.
>


Umm, don't recall wavelets specifically, but certainly when you add
2 #'s you lose some information.

If you have a 5, was this the result of adding 1+4, or 2+3? Easy case.
Which is it?

Or more realistically, 1.1+3.9, or any of an infinite # of other
possibilities.


>
> > To wit, a 20 city TSP involves something like 10^18 different possible
> > routes.
>
> It depends on which variant of the problem. In the case where each city is
> to be visited exactly once, and cities can be visited in any order, it is:
>
> N!/2 = 1,216,451,004,088,320,000
>

Hmmm, looks like about 10^18 to me.


>
> > the computerist approach:
> > Wow, impressive! Let's spend our lives trying to find the "optimal"
> > route out of all that. Let's invent NP-mathematics too.
>
> Any other group you care to smear while you're at it?
>


Rote and linear thinkers, maybe? Maybe ducks in a row.
Everybody who sits in the first row in lecture, but never
asks any questions.


>
> > If you (a) first partition city-space into regions of grouped cities...
>
> Having slandered the study of algorithms and their run-time, Dan sketches a
> question-begging "algorithm", offered on the grounds of it's alleged
> superior run-time, though punting on the question of how to partition
> cities into groups, the run-time of consrtucting that partition, and
> ignoring any analysis of bounds on the differences between the solutions
> found by this "algorithm" and any optimal solutions. And yet the little
> cockroach presumes to lecture "computerists" on their silly rote behavior.
>


Awww. Exposing a few raw nerves, are we? You might have that
checked out already .... before it's too late.

Actually, I wasn't punting, I was presenting a general approach. And
when I said grouped cities in part (a), I actually meant to say nearby
cities. The partitioning issue is trivial. Selecting which group to
visit first and last is also trivial. Get pencil and paper and explore.

BTW, got another algorithm that reduces the search space from 10^18
to 120? You notice I did say, optimality is traded for efficiency.
Do you think nature tried out all of the 10^18 class solutions before
deciding on a 120-class solution? Well, do ya? As regards survival, it's
a LOT better to be quick than to be optimal.

In fact, the more we learn about biology, the more we see that
nature/evolution found all kinds of good short-cut solutions that
worked, and then it CONSERVED them. See Hox genes, and Pax6
genes, for instance.


>
> > There must be a moral here.
>
> That you suffer from penis envy? Or is it just a lack of rigor?
>

I should say calm down, but quite obviously it'll do no good here.

0
Reply feedbackdroid 7/17/2006 9:12:28 PM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:

> Umm, don't recall wavelets specifically, but certainly when you add
> 2 #'s you lose some information.
>
> If you have a 5, was this the result of adding 1+4, or 2+3? Easy case.
> Which is it?
>
> Or more realistic, 1.1+3.9, or any of an infinite #other possibilities.

Yes, but if you store that lost information somewhere else at the same
time, then it's not lost.  If you compute X=A+B and Y=A-B, then you have
lost information in both operations.  And yet, you can still use X and Y to
recompute A, or B, so nothing was in fact lost if you transform A and B
into X and Y in this way.  What was lost in each operation was saved in
the other.  This is true of all linear transforms where the transformation
matrix is invertible.

There are many known transformations that lose information in their
individual operations but which, collectively, manage to retain all the
information (FFTs, lossless compression, etc.).
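
In code, the round trip is just a few lines (the divide-by-two in the
inverse is the obvious normalization):

  import numpy as np

  a = np.array([1.1, 5.0, 2.0])
  b = np.array([3.9, -2.0, 7.0])

  x, y = a + b, a - b                  # each output alone has lost something
  a2, b2 = (x + y) / 2, (x - y) / 2    # but together they invert exactly
  assert np.allclose(a, a2) and np.allclose(b, b2)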

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/17/2006 9:48:40 PM

JC wrote:

>> To learn a behavior is to make it more likely.
>> But of course that doesn't answer what makes a behavior
>> intelligent for you can learn to produce unintelligent
>> behaviours as well.

Curt Welch wrote:

> Well, my belief in reinforcement learning is so basic to
> all this that I actually see reinforcement learning as
> intelligence.  Any behavior that is learned through
> reinforcement is intelligent behavior.  So it's impossible
> to learn unintelligent behavior (from this way of looking
> at intelligence).
>
> And, of course, if you define intelligence as full human
> behavior, then that means everything that humans do is
> intelligent as well - by definition.
>
> This of course is not how we use the word intelligence in
> normal day to day talk.  We say people are being stupid and
> unintelligent when they do things that a moment of reasoned
> thought would show us to be a poor choice of behavior.

Then I would say that it is the "reasoned thought" behavior
that is being looked for. Do we really need the hanger-on
destructive behaviors in a machine just because they served
us well as tribal people?  Or is the ability to wipe out
the competition intelligent behavior? When we don't have a
war to fight we create one in the form of football to fill
in that need. Or we play computer games where you can go to
war "killing" virtual characters. Sure, they are both games,
but why do they stimulate the pleasure centers in most males?
In human societies I sometimes think perhaps our full
intelligent combined behavior is spread between the sexes.

This would portend that an intelligent machine might, as an
act of intelligence, think about wiping us out, or at least
most of us, as it might keep some alive the way we do in zoos.
The typical science fiction scenario might be based on a
reality or intuition about what it means to be intelligent.
Look at the kinds of movie fiction that are most popular, for
I suspect their content has a lot to say about our need for
conflict and successful resolution.

> However, this would imply that only logical reasoned behavior
> is intelligent behavior.  And though that's a fine way to
> define it for casual conversation, it is not a definition
> that gets us very near building human like machines because
> humans aren't driven by logic and reason at the lowest levels.
>
> They are driven by reinforcement learning.  Intelligent
> logic and reason only emerges as part of our behavior set
> over time because of its great value to us in producing
> long term rewards.
>
> When you look at intelligence as being reinforcement learning
> skills then you can compare the intelligence of different
> machines by placing them into the same environment and see
> which ones produce the most rewards over some extended period
> of time.  The better the machine does at adapting it's behavior
> to the environment is the more intelligent it is - for that
> environment.

0
Reply JGCASEY 7/17/2006 10:07:10 PM

I looked over the paper (no, I didn't read it), and my first impression is 
that this is not "must" reading for behaviorists. Or rather, it is far less 
"must" reading than some of the tutorials on Bayesian analyses of coin 
tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do you 
think it is "must" reading for behaviorists? Pitch me. After all, you can 
argue that I can't be persuaded, but you know that you can get a rise out of 
me.



"J.A. Legris" <jalegris@sympatico.ca> wrote in message 
news:1153167830.117066.205590@i42g2000cwa.googlegroups.com...
>
> Michael Olea wrote:
>> J.A. Legris wrote:
>>
>> > What you've described so far sounds like the Bayesian model that
>> > Michael Olea has been describing, where an estimate of the posterior
>> > probability of an event is updated afer each observation of the
>> > evidence. Is this the sort of thing you have in mind? At some point,
>> > perhaps depending on a threshold probability level, a decision would
>> > have to be made about whether the corresponding alarm should be
>> > triggered.
>>
>> That would be where the "utility model" comes in (moving from Bayesian
>> Inference into Bayesian Decision Theory) - the cost and gain functions 
>> over
>> consequences. So you pick the thresold to maximize expected utility. That
>> is, of course, a normative theory, not a descriptive one - what an agent
>> should do, not what particular agents do in fact do. Even so it is often 
>> a
>> good model of behavior under experimental conditions. There is a 
>> consistent
>> difference, I've mentioned a few times, between the normative model and a
>> descriptive model of "matching law" like behavior. Suppose you have two
>> choices A and B, and that the expected utility is 90 for A and 10 for B.
>> The optimal choice is pick A every time. The observed behavior is more 
>> like
>> pick A 90% of the time, pick B 10% of the time. The discrepancy arises 
>> only
>> if the probability distribution is known, and stationary. If the
>> distribution is unknown (i.e. being estimated, or "learned"), and if it
>> might be changing then the matching law makes more sense, has been shown 
>> to
>> be optimal under some idealized conditions, and is a form of "importance
>> sampling", very much like particle filtering methods of approximate
>> Bayesian inference.
>>
>> > It seems like a big jump from predicting outcomes, even thousands of
>> > them, to running interactive experiments to test the predictions. How
>> > might that work?
>>
>> That, "intervention", gets a lot of attention in Judea Pearl's second 
>> major
>> book, the one on "Causality". It also has been studied in terms of "value
>> of information". Bayesian medical expert systems do a limited form of 
>> this
>> by suggesting tests to perform in order to arrive at a diagnosis. The 
>> role
>> of intervention in learning has also been studied in, for example,
>> developmental psychology. Discounting evidence ("let me try it, you just
>> aren't doing it right") is one example. It is a major theme in Allison
>> Gopnik's work:
>>
>> http://ihd.berkeley.edu/gopnik.htm
>>
>> For example:
>>
>> A.Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, & D. Danks (2004). 
>> A
>> theory of causal learning in children: Causal maps and Bayes nets.
>> Psychological Review, 111, 1, 1-31.
>
>> T. Kushnir, A. Gopnik, L Schulz, & D. Danks. (in press). Inferring hidden
>> causes. Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive
>> Science Society
>>
>> -- Michael
>
> The experimental results in the first paper (starting on p.64) are
> fascinating. Required reading for behaviourists! Thanks for the links.
>
> --
> Joe Legris
> 


0
Reply Glen 7/17/2006 10:12:03 PM

Curt Welch wrote:
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
>
> > Umm, don't recall wavelets specifically, but certainly when you add
> > 2 #'s you lose some information.
> >
> > If you have a 5, was this the result of adding 1+4, or 2+3? Easy case.
> > Which is it?
> >
> > Or more realistic, 1.1+3.9, or any of an infinite #other possibilities.
>
> Yes, but if you store that lost information somewhere else at the same
> time, then it's not lost.  If you compute X=A+B and Y=A-B, then you have
> lost information in both operations.  But yet, you can still use X and Y to
> recompute A, or B, so nothing was in fact lost if you transform A and B,
> into X and Y in this way.  What was lost in each operation was saved, in
> the other. This is true of all linear transforms where the transformation
> matrix is invertible.

Something like what the brain does when the what and the where of
visual data go to different parts of the brain to be processed.
The absolute data is of no use in recognizing the what (temporal
lobe), but it is required to determine the spatial relationships of
the what (parietal lobe).

--
JC

0
Reply JGCASEY 7/17/2006 10:13:20 PM

Oops. I forgot to say that the paper I looked at was the "Inferring Hidden 
Causes" mama.

G.


"Glen M. Sizemore" <gmsizemore2@yahoo.com> wrote in message 
news:44bc0ae6$0$2491$ed362ca5@nr1.newsreader.com...
>I looked over the paper (no, I didn't read it), and my first impression is 
>that this is not "must" reading for behaviorists. Or rather, it is far less 
>"must" reading than some of the tutorials on Bayesian analyses of coin 
>tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do you 
>think it is "must" reading for behaviorists? Pitch me. After all, you can 
>argue that I can't be persuaded, but you know that you can get a rise out 
>of me.
>
>
>
> "J.A. Legris" <jalegris@sympatico.ca> wrote in message 
> news:1153167830.117066.205590@i42g2000cwa.googlegroups.com...
>>
>> Michael Olea wrote:
>>> J.A. Legris wrote:
>>>
>>> > What you've described so far sounds like the Bayesian model that
>>> > Michael Olea has been describing, where an estimate of the posterior
>>> > probability of an event is updated afer each observation of the
>>> > evidence. Is this the sort of thing you have in mind? At some point,
>>> > perhaps depending on a threshold probability level, a decision would
>>> > have to be made about whether the corresponding alarm should be
>>> > triggered.
>>>
>>> That would be where the "utility model" comes in (moving from Bayesian
>>> Inference into Bayesian Decision Theory) - the cost and gain functions 
>>> over
>>> consequences. So you pick the thresold to maximize expected utility. 
>>> That
>>> is, of course, a normative theory, not a descriptive one - what an agent
>>> should do, not what particular agents do in fact do. Even so it is often 
>>> a
>>> good model of behavior under experimental conditions. There is a 
>>> consistent
>>> difference, I've mentioned a few times, between the normative model and 
>>> a
>>> descriptive model of "matching law" like behavior. Suppose you have two
>>> choices A and B, and that the expected utility is 90 for A and 10 for B.
>>> The optimal choice is pick A every time. The observed behavior is more 
>>> like
>>> pick A 90% of the time, pick B 10% of the time. The discrepancy arises 
>>> only
>>> if the probability distribution is known, and stationary. If the
>>> distribution is unknown (i.e. being estimated, or "learned"), and if it
>>> might be changing then the matching law makes more sense, has been shown 
>>> to
>>> be optimal under some idealized conditions, and is a form of "importance
>>> sampling", very much like particle filtering methods of approximate
>>> Bayesian inference.
>>>
>>> > It seems like a big jump from predicting outcomes, even thousands of
>>> > them, to running interactive experiments to test the predictions. How
>>> > might that work?
>>>
>>> That, "intervention", gets a lot of attention in Judea Pearl's second 
>>> major
>>> book, the one on "Causality". It also has been studied in terms of 
>>> "value
>>> of information". Bayesian medical expert systems do a limited form of 
>>> this
>>> by suggesting tests to perform in order to arrive at a diagnosis. The 
>>> role
>>> of intervention in learning has also been studied in, for example,
>>> developmental psychology. Discounting evidence ("let me try it, you just
>>> aren't doing it right") is one example. It is a major theme in Allison
>>> Gopnik's work:
>>>
>>> http://ihd.berkeley.edu/gopnik.htm
>>>
>>> For example:
>>>
>>> A.Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, & D. Danks 
>>> (2004). A
>>> theory of causal learning in children: Causal maps and Bayes nets.
>>> Psychological Review, 111, 1, 1-31.
>>
>>> T. Kushnir, A. Gopnik, L Schulz, & D. Danks. (in press). Inferring 
>>> hidden
>>> causes. Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive
>>> Science Society
>>>
>>> -- Michael
>>
>> The experimental results in the first paper (starting on p.64) are
>> fascinating. Required reading for behaviourists! Thanks for the links.
>>
>> --
>> Joe Legris
>>
>
> 


0
Reply Glen 7/17/2006 10:20:44 PM

Curt Welch wrote:

> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> 
>> Umm, don't recall wavelets specifically, but certainly when you add
>> 2 #'s you lose some information.
>>
>> If you have a 5, was this the result of adding 1+4, or 2+3? Easy case.
>> Which is it?
>>
>> Or more realistic, 1.1+3.9, or any of an infinite #other possibilities.
 
> Yes, but if you store that lost information somewhere else at the same
> time, then it's not lost.  If you compute X=A+B and Y=A-B, then you have
> lost information in both operations.  But yet, you can still use X and Y
> to recompute A, or B, so nothing was in fact lost if you transform A and
> B,
> into X and Y in this way.  What was lost in each operation was saved, in
> the other. This is true of all linear transforms where the transformation
> matrix is invertible.
 
> There are many known transformations that loose information in their
> individual operations, but which collectively, manage to retain all the
> information. (FFTs, lossless compression, etc).

====================================================================
"Also, some people like to use the modern Gabor wavelet, mainly I
think, because it is more limited in space than a fourier grating.
To me this doesn't really mean anything, because you've basically
selected a mathematical shape that matches the cell response, which
is mainly a result of the anatomy. IOW, Gabor is a computational
method, but not anything very profound, IMO."
 
-- Dan Michaels


"Mathematically, the 2D Gabor function achieves the resolution
limit in the conjoint space only in its complex form. Since a
complex valued 2D Gabor function contains in quadrature projection
an even-symetric cosine component and an odd-symmetric sine
component, Pollen and Ronner's finding that simple cells exist
in quadrature-phase pairs therefore showed that the design of
the cells might indeed be optimal. The fact that the visual
cortical cell has evolved to an optimal design for information
encoding has caused a considerable amount of excitement not
only in the neuroscience community but in the computer science
community as well."

-- Tai Sing Lee [2]

"Besides everything I just wrote, I should reiterate that I think
viewing all these happenings as a "lossless" process is really a
misnomer. What is really going on are successive transforms of the
sensory data. If you have something like

 Ce <- Si <- Wi

and the sums of gaussians, how can this possibly be lossless? When
you add 2 #'s you lose information, namely the original values of those
2 #'s"

-- Dan Michaels

"In this paper, we have derived, based on physiological constraints
and the wavelet theory, a family of 2D Gabor wavelets which model the
receptive fields of the simple cells in the brain's primary visual
cortex. By generalizing Daubechies's frame criteria to 2D, we
established the conditions under which a discrete class of continous
Gabor wavelets will provide complete representation of any image. ..."

-- Tai Sing Lee [2]

"... Well, I can't resist making one observation: of course the
transformation

 (x,y) -> x+y

discards information. But that is not what is involved in projection onto a
wavelets basis. A better analogy would be:

 (x,y) -> (x+y, x-y)

which is lossless. As to why such a transform might be advantageous:

[1] Field. Wavelets, vision and the statistics of natural scenes.
[2] Lee. Image Representation Using 2D Gabor Wavelets."
====================================================================

The above are remarks I made 22 Nov 2005.

All the engineers I know, several, including relatives, friends, and
colleagues, had to pass elementary linear algebra to get their degrees.
Of course, with wavelets and Fourier analysis the vectors are functions, and
the vector spaces are infinite dimensional function spaces, not much
covered in a first course on linear algebra. They become finite dimensional
again in the discrete domain of the DFFT or the various DWTs.
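
A quick round trip through numpy's FFT makes the same point in the
discrete domain - the transform rearranges information, it does not
discard it:

  import numpy as np

  rng = np.random.default_rng(42)
  signal = rng.standard_normal(64)
  spectrum = np.fft.fft(signal)
  recovered = np.fft.ifft(spectrum).real     # imaginary residue ~1e-16
  print(np.max(np.abs(signal - recovered)))  # error at machine precision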

-- Michael
0
Reply Michael 7/17/2006 11:01:36 PM

J.A. Legris wrote:

> Michael Olea wrote:

>> ... It is a major theme in
>> Allison Gopnik's work:
>>
>> http://ihd.berkeley.edu/gopnik.htm
>>
>> For example:
>>
>> A.Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, & D. Danks (2004).
>> A theory of causal learning in children: Causal maps and Bayes nets.
>> Psychological Review, 111, 1, 1-31.
> 
>> T. Kushnir, A. Gopnik, L Schulz, & D. Danks. (in press). Inferring hidden
>> causes. Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive
>> Science Society

> The experimental results in the first paper (starting on p.64) are
> fascinating. Required reading for behaviourists! Thanks for the links.

You're welcome.

-- Michael
0
Reply Michael 7/17/2006 11:03:10 PM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> Curt Welch wrote:
> > "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > > Can something be
> > > reinforced that you've never seen or done before?
> >
> > Of course it can.  A good reinforcement learning machine is shaping
> > classes of behaviors as it learns.  It's not learning specific
> > reactions.  It does it by shaping the operation of a classifier as it
> > learns.  All possible stimulus inputs then are guaranteed to fall into
> > some class so that the system will always have an "answer" as to how to
> > respond.  The answer will be based on the reinforcement learning
> > systems evaluation of what class the current situation falls into, and
> > on the systems past experience with other events that might have been
> > different, but yet fall into the same classifications.
> >
>
> I think you're mixing things here. If your trained machine receives an
> input it has "never" seen before, and which is adequately far removed
> from the centroid of your training set, it will produce "some"
> response, but it's unlikely to produce the "correct" response. OTOH, if
> the "novel" input is adequately close to one of your training set
> prototypes, then it really isn't novel.

Yes, but you didn't include any distance measure in your question.  You
simply said something that hasn't been seen or done before.  The answer is
that it is always advantageous for the learning system to include an
inherent system of measuring closeness and to use that to guide its
selection of behaviors in situations it has never seen before.  An educated
guess is almost always better than a random guess even if it produces an
answer which is far from optimal.

A big point about all practical reinforcement learning problems is that
there never is a correct answer.  All answers tend instead to be graded on a
scale of quality.  Jumping off a cliff and killing yourself is not the
"wrong" answer - it simply produces far fewer rewards than choosing to eat
dinner instead.  It's not as bad an answer as slowly peeling your skin off
and feeding it to the pigs until you die. :)

This is what makes it very important in reinforcement learning to use
measures of closeness to select behaviors based on past experience when
presented with a novel stimulus.  All real reinforcement problems tend to
have the property that similar behaviors tend to produce similar results in
similar situations.  So no matter how novel a new situation is, it's always
wise for the system to try what it believes is the best behavior based on
its best current understanding of similarity.

Its understanding of similarity (its system for measuring closeness),
likewise, needs to be trained by past experience.  If the system has
learned to ignore the blue light because it makes no difference to the
optimal selection of behaviors in the conditions previously experienced,
then when a new stimulus is seen, far from past experience, assuming the
blue light should still be ignored is a good first guess.

The point is, if you have a fixed number of sensory inputs, then all input
values will have been seen in the past.  After a short period of training,
the system is not expected to see anything truly "new".  It's only expected
to see new combinations it has not yet seen.  But those combinations will
always share traits in common with past combinations, so it should use
those shared traits as a guide to selecting behaviors based on past
training.  In a correctly designed system, there will never be truly novel
inputs after a short bit of training.
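
Here's a crude sketch of what I mean by using a learned closeness measure
to make an educated guess.  The experience table, the per-feature weights
(which have "learned" to ignore the blue light), and the inverse-distance
weighting are all illustrative assumptions, not the correct general
answer I keep saying we still lack:

  import numpy as np

  # past experience: (sensory input, action taken, reward received)
  inputs  = np.array([[1, 0, 1, 1],
                      [1, 1, 1, 0],
                      [0, 0, 0, 1],
                      [0, 1, 0, 0]], dtype=float)
  actions = np.array([0, 0, 1, 1])
  rewards = np.array([1.0, 0.9, 0.8, 1.0])

  # a closeness measure that has learned to ignore feature 1
  # (the blue light that never mattered before)
  feature_weights = np.array([1.0, 0.0, 1.0, 1.0])

  def educated_guess(novel):
      distance = np.abs(inputs - novel) @ feature_weights
      weight = 1.0 / (1.0 + distance)        # closeness scores
      best, best_val = None, -np.inf
      for a in np.unique(actions):
          m = actions == a
          val = np.sum(weight[m] * rewards[m]) / np.sum(weight[m])
          if val > best_val:
              best, best_val = a, val
      return best

  # a combination of inputs the system has never seen before
  print(educated_guess(np.array([1.0, 1.0, 1.0, 1.0])))   # guesses action 0

The guess leans on the experiences that are closest under the learned
weighting, which is the whole point of an educated guess over a random one.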

> > You can see one implementation of this in action in TD-Gammon.  Each
> > move which gets reinforced shapes the weights of the neural network
> > which causes many other similar moves to be reinforced at the same
> > time.  It doesn't have to see every move, to be able to make a good
> > "guess" at how to respond to a move.  It has a good (for Backgammon)
> > system for classifying moves into response classes so that it can
> > successfully merge it's learning from other moves, to make a good guess
> > at how to play a position it has never seen before.
> >
>
> This makes some sense, but playing Backgammon, or learning the rules
> for other games, is not general intelligence.

But it's getting much closer, because TD-Gammon was far more generic than
all the same author's past attempts at creating a backgammon game.  So it's
a step in the right direction, and it shows how a program can learn complex
things on its own better than a human can hand-code the same type of
knowledge into a program.

> > This power to correct make a "good" guess for situations never seen is
> > the one key missing piece in general reinforcement learning systems.
> > How it does it is easy to understand in theory - it simply needs a
> > system that automatically creates a closeness function and produces an
> > answer which is some type of merging and selecting, from the situations
> > it has seen.
> >
>
> Merging and selecting. Yes. This is one of the "other mechanisms",
> besides
> just the basic learning device, that I alluded to last time. Additional
> structure,
> of yet-unknown variety.

Well, it's not unknown to me.  I have a good general high-level grasp of
exactly what it needs to do, and why, and a bit of how.  There are important
details missing from my understanding, but that's not the same thing as
having no clue at all about what is needed.  I'd say I have an 80%
understanding of exactly what is needed.

> >   But
> > how you do this so that a generic system of measuring "closeness" (one
> > not hand tuned to the application like it was in TD-Gammon), to do a
> > good job is the hard question that has not been well answered.
> >
> > > Also, there are some neural nets that do "1-pass" learning. Is this
> > > reinforcement?
> > >
> > > Also, Edelman would probably saw something like behaviors are
> > > selected for via internal mechanisms. IOW, any given stimulus might
> > > elicit any #of potential behavioral responses, but only one of these
> > > ends up being selected for execution. Certainly this happens when
> > > you search for the proper word to stick into a sentence. Internally,
> > > many words are filtered past before one is finally spoken. And then
> > > of course you have to option to stop saying the word even while it's
> > > being spoken, if it's not the right selection. Plus, there are
> > > multiple options for how the word is spoken, emphasis, inflection,
> > > etc.
> > >
> > > Reinforcement learning is only part of the system.
> >
> > And where is your evidence to show that all those "options" are not
> > selected for by the same low level reinforcement learning system?
> >
>
> Basically, in the observation that 50 years of creation of naiive
> learning
> devices hasn't solved the problem. Many people, such as Grossberg,
> have realized this and have tried to put various forms of additional
> structure into their systems, but so far a good general solution hasn't
> been found.

Well, it took 120 years to go from balloons to powered, controlled,
sustained flight.  The fact that we haven't gotten strong general learning
working in 50 years doesn't prove much to me.  Especially since I already
clearly see exactly what is missing.  It's just not a mystery to me anymore.
Only the implementation details are still a mystery.  It's like
understanding that all you need is the correct configuration of a plane to
create stability and control and lift, combined with a power source with
the correct power-to-weight ratio.  Knowing this is all you need is not the
same as knowing what the correct configuration is, or knowing how to build
a more powerful, lighter engine.  But once you understand what's missing,
the rest is no longer a mystery; it's just a job of directed research to
fill in the missing pieces.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/17/2006 11:14:41 PM

Glen M. Sizemore wrote:

> I looked over the paper (no, I didn't read it), and my first impression is
> that this is not "must" reading for behaviorists. Or rather, it is far
> less "must" reading than some of the tutorials on Bayesian analyses of
> coin
> tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do you
> think it is "must" reading for behaviorists? Pitch me. After all, you can
> argue that I can't be persuaded, but you know that you can get a rise out
> of me.

He's probably referring to the differing predictions between RW models,
various Bayesian models, and the experimental results.

I have mixed reactions to papers with Gopnik's name on them - interesting
work, but a tendency towards straw-man caricature of alternatives.

-- Michael
0
Reply Michael 7/17/2006 11:19:35 PM

Glen M. Sizemore wrote:
>
> "J.A. Legris" <jalegris@sympatico.ca> wrote in message
> news:1153167830.117066.205590@i42g2000cwa.googlegroups.com...
> >
> > Michael Olea wrote:
> >> J.A. Legris wrote:
> >>
> >> > What you've described so far sounds like the Bayesian model that
> >> > Michael Olea has been describing, where an estimate of the posterior
> >> > probability of an event is updated after each observation of the
> >> > evidence. Is this the sort of thing you have in mind? At some point,
> >> > perhaps depending on a threshold probability level, a decision would
> >> > have to be made about whether the corresponding alarm should be
> >> > triggered.
> >>
> >> That would be where the "utility model" comes in (moving from Bayesian
> >> Inference into Bayesian Decision Theory) - the cost and gain functions
> >> over
> >> consequences. So you pick the threshold to maximize expected utility. That
> >> is, of course, a normative theory, not a descriptive one - what an agent
> >> should do, not what particular agents do in fact do. Even so it is often
> >> a
> >> good model of behavior under experimental conditions. There is a
> >> consistent
> >> difference, I've mentioned a few times, between the normative model and a
> >> descriptive model of "matching law" like behavior. Suppose you have two
> >> choices A and B, and that the expected utility is 90 for A and 10 for B.
> >> The optimal choice is pick A every time. The observed behavior is more
> >> like
> >> pick A 90% of the time, pick B 10% of the time. The discrepancy arises
> >> only
> >> if the probability distribution is known, and stationary. If the
> >> distribution is unknown (i.e. being estimated, or "learned"), and if it
> >> might be changing then the matching law makes more sense, has been shown
> >> to
> >> be optimal under some idealized conditions, and is a form of "importance
> >> sampling", very much like particle filtering methods of approximate
> >> Bayesian inference.
> >>
> >> > It seems like a big jump from predicting outcomes, even thousands of
> >> > them, to running interactive experiments to test the predictions. How
> >> > might that work?
> >>
> >> That, "intervention", gets a lot of attention in Judea Pearl's second
> >> major
> >> book, the one on "Causality". It also has been studied in terms of "value
> >> of information". Bayesian medical expert systems do a limited form of
> >> this
> >> by suggesting tests to perform in order to arrive at a diagnosis. The
> >> role
> >> of intervention in learning has also been studied in, for example,
> >> developmental psychology. Discounting evidence ("let me try it, you just
> >> aren't doing it right") is one example. It is a major theme in Allison
> >> Gopnik's work:
> >>
> >> http://ihd.berkeley.edu/gopnik.htm
> >>
> >> For example:
> >>
> >> A.Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, & D. Danks (2004).
> >> A
> >> theory of causal learning in children: Causal maps and Bayes nets.
> >> Psychological Review, 111, 1, 1-31.
> >
> >> T. Kushnir, A. Gopnik, L Schulz, & D. Danks. (in press). Inferring hidden
> >> causes. Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive
> >> Science Society
> >>
> >> -- Michael
> >
> > The experimental results in the first paper (starting on p.64) are
> > fascinating. Required reading for behaviourists! Thanks for the links.
> >
> > --
> > Joe Legris
> >

> I looked over the paper (no, I didn't read it), and my first impression is
> that this is not "must" reading for behaviorists. Or rather, it is far less
> "must" reading than some of the tutorials on Bayesian analyses of coin
> tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do you
> think it is "must" reading for behaviorists? Pitch me. After all, you can
> argue that I can't be persuaded, but you know that you can get a rise out of
> me.
>
>

I was referring to the first paper: A theory of causal learning in
children: Causal maps and Bayes nets. Read pages 64-71 in particular.

--
Joe Legris

0
Reply J 7/18/2006 12:39:51 AM

Curt Welch wrote:
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
>
> > Umm, don't recall wavelets specifically, but certainly when you add
> > 2 #'s you lose some information.
> >
> > If you have a 5, was this the result of adding 1+4, or 2+3? Easy case.
> > Which is it?
> >
> > Or more realistic, 1.1+3.9, or any of an infinite #other possibilities.
>
> Yes, but if you store that lost information somewhere else at the same
> time, then it's not lost.
>


Well, YEAH, but that wasn't the issue, was it?  If you otherwise save
the original data somewhere, then you don't lose it, by definition.


>
>  If you compute X=A+B and Y=A-B, then you have
> lost information in both operations.  But yet, you can still use X and Y to
> recompute A, or B, so nothing was in fact lost if you transform A and B,
> into X and Y in this way.  What was lost in each operation was saved, in
> the other.
>


This works because you have "2" output equations. With 2 unknowns
and 2 equations you can work backwards. With 2 unknowns and
only 1 equation, not so. Try ONLY one ... X=A+B
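
To put the same point in a couple of lines of Python (purely illustrative):

    A, B = 3.0, 5.0
    X, Y = A + B, A - B              # two outputs: the pair (X, Y) inverts
    assert ((X + Y) / 2, (X - Y) / 2) == (A, B)

    # With only X = A + B there is no way back: many pairs give the same sum.
    assert all(a + b == X for a, b in [(1.0, 7.0), (2.0, 6.0), (2.5, 5.5)])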

Also, I don't recall exactly, but it's possible the previous discussion
originally came up in the context of looking at the output of a "black
box", and trying to guess the nature of the circuitry contained
within the box. I do seem to recall JAG asking about this. And I
seem to recall saying something to the effect that you can
theoretically have an infinite #different circuits in the box which
could all produce the same output, and the only way to tell what it
is is to open the box and look inside. In brain research, this is called
neurophysiology, rather than behaviorism.


>
>This is true of all linear transforms where the transformation
> matrix is invertible.
>
> There are many known transformations that lose information in their
> individual operations, but which collectively, manage to retain all the
> information. (FFTs, lossless compression, etc).
>

Remember, however, with FFT you transform input real+imaginary
data arrays into output magnitude+phase arrays.

You cannot recover the original data correctly by simply inverse-
transforming just the output magnitude array. You also need the
phase array to get the correct inverse transform.
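
A small numpy sketch of that point (illustrative only):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    X = np.fft.fft(x)
    mag, phase = np.abs(X), np.angle(X)

    # Magnitude plus phase inverts back to the original data ...
    assert np.allclose(np.fft.ifft(mag * np.exp(1j * phase)).real, x)

    # ... but the magnitude array alone does not.
    assert not np.allclose(np.fft.ifft(mag).real, x)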

0
Reply feedbackdroid 7/18/2006 2:22:02 AM

Michael Olea wrote:


I'll make this brief, since I wrote it once already in longer form, and
it vaporized - [one day I'll find the goddamn friggin key that
accidentally erases the entire screen of text]

1. First off, Tai Sing Lee's model is just another partial brain model based
upon a limited and simplified data set. Put it on the pile of 100s of
other models, read as "conjecture". You act like it's the final answer.

2. As noted previously, the Gabor wavelet is likely just the particular
stimulus form that happens to optimally and "fortuitously" excite the
cortex, based upon the particular way the cortex happens to be wired.

To wit, the cortex is wired as a large extended mesh where local cells
are inhibited by surrounding cells, which in turn are inhibited by
even more distant cells. It's just the same thing repeated. This is
known as surround inhibition and peripheral dis-inhibition by
multitudes of neuroscientists. It's a basic structure of neural tissue,
not just in cortex, but extending back to the earliest vertebrates, cf.
amphibian optic tectum.

AFAIAC, Gabor wavelets just so happen to fortuitously reflect this
underlying structure. I told you this last time, but you didn't get it.
Lee's conjectures aren't gonna change this.

3. It's almost certain that information is removed, and invariant forms
abstracted from the raw images, as the visual cortical hierarchy is
ascended. Hawkins's model is based on this idea, of course, as it comes
directly out of decades of neurophysiological recording.

It does no good to keep all of the raw data all the way from input to
the highest levels. What is needed is to abstract the information from the
background garbage, so the fact that "... Gabor wavelets will provide
complete representation of any image. ..." is irrelevant. In actual
fact, information is discarded at each and every level of the visual
hierarchy, and in every one of the 30+ visual areas. That's what
"feature detection" is all about, for chrissake.

4. Look at your own comment below ...

>
>  (x,y) -> x+y
>
> discards information. But that is not what is involved in projection onto a
> wavelets basis.
>

The first part agrees with what I said. The second is irrelevant as
regards the "actual" operation of cortex, as I've just indicated above,
regardless of Lee's model.

Did I miss it somewhere, or did the entire neuroscience community come
out and say that Lee's model is THE FINAL model, and THE FINAL truth?
Or is it just one of the 100s of ideas in the pile?

========================


>
> ====================================================================
> "Also, some people like to use the modern Gabor wavelet, mainly I
> think, because it is more limited in space than a fourier grating.
> To me this doesn't really mean anything, because you've basically
> selected a mathematical shape that matches the cell response, which
> is mainly a result of the anatomy. IOW, Gabor is a computational
> method, but not anything very profound, IMO."
>
> -- Dan Michaels
>
>
> "Mathematically, the 2D Gabor function achieves the resolution
> limit in the conjoint space only in its complex form. Since a
> complex valued 2D Gabor function contains in quadrature projection
> an even-symmetric cosine component and an odd-symmetric sine
> component, Pollen and Ronner's finding that simple cells exist
> in quadrature-phase pairs therefore showed that the design of
> the cells might indeed be optimal. The fact that the visual
> cortical cell has evolved to an optimal design for information
> encoding has caused a considerable amount of excitement not
> only in the neuroscience community but in the computer science
> community as well."
>
> -- Tai Sing Lee [2]
>
> "Besides everything I just wrote, I should reiterate that I think
> viewing all these happenings as a "lossless" process is really a
> misnomer. What is really going on are successive transforms of the
> sensory data. If you have something like
>
>  Ce <- Si <- Wi
>
> and the sums of gaussians, how can this possibly be lossless? When
> you add 2 #'s you lose information, namely the original values of those
> 2 #'s"
>
> -- Dan Michaels
>
> "In this paper, we have derived, based on physiological constraints
> and the wavelet theory, a family of 2D Gabor wavelets which model the
> receptive fields of the simple cells in the brain's primary visual
> cortex. By generalizing Daubechies's frame criteria to 2D, we
> established the conditions under which a discrete class of continuous
> Gabor wavelets will provide complete representation of any image. ..."
>
> -- Tai Sing Lee [2]
>
> "... Well, I can't resist making one observation: of course the
> transformation
>
>  (x,y) -> x+y
>
> discards information. But that is not what is involved in projection onto a
> wavelets basis. A better analogy would be:
>
>  (x,y) -> (x+y, x-y)
>
> which is lossless. As to why such a transform might be advantageous:
>
> [1] Field. Wavelets, vision and the statistics of natural scenes.
> [2] Lee. Image Representation Using 2D Gabor Wavelets."
> ====================================================================
>
> The above are remarks I made 22 Nov 2005.
>
> All the engineers I know, several, including relatives, friends, and
> colleagues, had to pass elementary linear algebra to get their degrees.
> Of course, with wavelets and Fourier analysis the vectors are functions, and
> the vector spaces are infinite dimensional function spaces, not much
> covered in a first course on linear algebra. They become finite dimensional
> again in the discrete domain of the DFFT or the various DWTs.
> 
> -- Michael

0
Reply feedbackdroid 7/18/2006 3:52:14 AM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> Curt Welch wrote:
> > "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> >
> > > Umm, don't recall wavelets specifically, but certainly when you add
> > > 2 #'s you lose some information.
> > >
> > > If you have a 5, was this the result of adding 1+4, or 2+3? Easy
> > > case. Which is it?
> > >
> > > Or more realistic, 1.1+3.9, or any of an infinite #other
> > > possibilities.
> >
> > Yes, but if you store that lost information somewhere else at the same
> > time, then it's not lost.
> >
>
> Well, YEAH, but that wasn't the issue, was it.

I believe it was the issue with wavelets that started this.  But I really
have no clue where this discussion came from.

> If you otherwise save
> the original data somewheres, then you don't lose it, by definition.
>
> >
> >  If you compute X=A+B and Y=A-B, then you have
> > lost information in both operations.  But yet, you can still use X and
> > Y to recompute A, or B, so nothing was in fact lost if you transform A
> > and B, into X and Y in this way.  What was lost in each operation was
> > saved, in the other.
> >
>
> This works because you have "2" output equations. With 2 unknowns
> and 2 equations you can work backwards. With 2 unknowns and
> only 1 equation, not so. Try ONLY one ... X=A+B

Though, again, this isn't the issue... But...

You can take two real numbers and combine them into one real number, and
not lose any information.  Simply take the digits of one of the numbers,
and make them the even digits of the result, and the digits of the second
number, and make them the odd digits. Oh, and I guess you have to hide the
sign in there somewhere as well, so just steal an extra digit and encode it
in there.
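
Something like this, for non-negative integers anyway (just a sketch;
handling the sign and the decimal point is the fiddly part I mentioned):

    def interleave(a, b):
        """Pack two non-negative integers into one digit string:
        digits of a go to the even positions, digits of b to the odd ones."""
        sa, sb = str(a), str(b)
        width = max(len(sa), len(sb))
        sa, sb = sa.zfill(width), sb.zfill(width)
        return "".join(x + y for x, y in zip(sa, sb))

    def deinterleave(s):
        """Undo interleave(): even-position digits are a, odd-position are b."""
        return int(s[0::2]), int(s[1::2])

    assert deinterleave(interleave(1972, 45)) == (1972, 45)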

> Also, I don't recall exactly, but it's possible the previous discussion
> originally came up in the context of looking at the output of a "black
> box", and trying to guess the nature of the circuitry contained
> within the box. I do seem to recall JAG asking about this. And I
> seem to recall saying something to the effect that, you can
> theoretically have an infinite #different circuits in the box which
> could all produce the same output, and the only way to tell what it
> is to open the box and look inside. In brain research, this is called
> neurophysiology, rather than behaviorism.

Yeah.  But many times, the inside is not relevant.  You certainly don't
need to know it for AI.

> >This is true of all linear transforms where the transformation
> > matrix is invertible.
> >
> > > There are many known transformations that lose information in their
> > individual operations, but which collectively, manage to retain all the
> > information. (FFTs, lossless compression, etc).
> >
>
> Remember, however, with FFT you transform input real+imaginary
> data arrays into output magnitude+phase arrays.
>
> You cannot recover the original data correctly by simply inverse-
> transforming just the output magnitude array. You also need the
> phase array to get the correct inverse transform.

Yeah, I know that.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/18/2006 5:36:00 AM

Michael Olea wrote:
> Glen M. Sizemore wrote:
>
> > I looked over the paper (no, I didn't read it), and my first impression is
> > that this is not "must" reading for behaviorists. Or rather, it is far
> > less "must" reading than some of the tutorials on Bayesian analyses of
> > coin
> > tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do you
> > think it is "must" reading for behaviorists? Pitch me. After all, you can
> > argue that I can't be persuaded, but you know that you can get a rise out
> > of me.
>
> He's probably referring to the differing predictions between RW models,
> various Bayesian models, and the experimental results.
>
> I have mixed reactions to papers with Gopnik's name on them - interesting
> work, but a tendency towards straw-man caricature of alternatives.
>
> -- Michael

I was referring to the distinction between operant conditioning, which
encodes relationships between the organism's behaviour and the
environment, and causal maps, which encode relationships between
aspects of the environment, where the organism may just be an observer.
There seems to be a connection to the distinction between learning where,
for example, a rat lacking a hippocampus can quickly return to a
previously discovered object but only if he always starts from the same
position, and learning exhibited by intact animals who can quickly
locate the object from any starting point. The latter are said to
employ a spatial map, and Gopnik claims that causal maps are analogous
functions in a different domain. I wonder if the hippocampus is
involved in both. Maybe causal maps can be seen as a generalization of
spatial maps.

From p.11:

"Causal maps would also allow animals to extend their causal knowledge
and learning to a  wide variety of new kinds of causal relations, not
just causal relations that involve  rewards or punishments (as in
classical or operant conditioning), not just object  movements and
collisions (as in the Michottean effects), and not just events that
immediately result from their own actions (as in operant conditioning
or trial-and-error  learning). Finally, animals could combine new
information and prior causal information  to create new causal maps,
whether that prior information was hard-wired or previously  learned."

And on p.15:

"Just as causal maps are an interesting halfway point between
domain-specific and  domain-general representations, these causal
learning mechanisms are an interesting  halfway point between
classically nativist and empiricist approaches to learning.
Traditionally, there has been a tension between restricted and
domain-specific learning  mechanisms like "triggering" or
"parameter-setting", and very general learning  mechanisms like
association or conditioning.  In the first kind of mechanism, very
specific kinds of input trigger very highly structured representations.
 In the second kind  of mechanism, any kind of input can be considered,
and the representations simply match  the patterns in the input.  Our
proposal is that causal learning mechanisms transform  domain-general
information about patterns of events, along with other information,
into  constrained and highly structured representations of causal
relations."   

--
Joe Legris

0
Reply J 7/18/2006 11:53:22 AM

Curt Welch wrote:
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > Curt Welch wrote:
> > > "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> > >
> > > > Umm, don't recall wavelets specifically, but certainly when you add
> > > > 2 #'s you lose some information.
> > > >
> > > > If you have a 5, was this the result of adding 1+4, or 2+3? Easy
> > > > case. Which is it?
> > > >
> > > > Or more realistic, 1.1+3.9, or any of an infinite #other
> > > > possibilities.
> > >
> > > Yes, but if you store that lost information somewhere else at the same
> > > time, then it's not lost.
> > >
> >
> > Well, YEAH, but that wasn't the issue, was it.
>
> I believe it was the issue with wavelets that started this.  But I really
> have no clue where this discussion came from.
>
> > If you otherwise save
> > the original data somewheres, then you don't lose it, by definition.
> >
> > >
> > >  If you compute X=A+B and Y=A-B, then you have
> > > lost information in both operations.  But yet, you can still use X and
> > > Y to recompute A, or B, so nothing was in fact lost if you transform A
> > > and B, into X and Y in this way.  What was lost in each operation was
> > > saved, in the other.
> > >
> >
> > This works because you have "2" output equations. With 2 unknowns
> > and 2 equations you can work backwards. With 2 unknowns and
> > only 1 equation, not so. Try ONLY one ... X=A+B
>
> Though, again, this isn't the issue... But...
>


No. That was the original issue regarding loss of information when adding
2 #'s, but all of the other stuff was piled onto that.


>
> You can take two real numbers and combine them into one real number, and
> not lose any information.  Simply take the digits of one of the numbers,
> and make them the even digits of the result, and the digits of the second
> number, and make them the odd digits. Oh, and I guess you have to hide the
> sign in there somewhere as well, so just steal an extra digit and encode it
> in there.
>
> > Also, I don't recall exactly, but it's possible the previous discussion
> > originally came up in the context of looking at the output of a "black
> > box", and trying to guess the nature of the circuitry contained
> > within the box. I do seem to recall JAG asking about this. And I
> > seem to recall saying something to the effect that, you can
> > theoretically have an infinite #different circuits in the box which
> > could all produce the same output, and the only way to tell what it
> > is to open the box and look inside. In brain research, this is called
> > neurophysiology, rather than behaviorism.
>
> Yeah.  But many times, the inside is not relevant.  You certainly don't
> need to know for AI.
>
> > >This is true of all linear transforms where the transformation
> > > matrix is invertible.
> > >
> > > There are many known transformations that lose information in their
> > > individual operations, but which collectively, manage to retain all the
> > > information. (FFTs, lossless compression, etc).
> > >
> >
> > Remember, however, with FFT you transform input real+imaginary
> > data arrays into output magnitude+phase arrays.
> >
> > You cannot recover the original data correctly by simply inverse-
> > transforming just the output magnitude array. You also need the
> > phase array to get the correct inverse transform.
>
> Yeah, I know that.
>
> --
> Curt Welch                                            http://CurtWelch.Com/
> curt@kcwc.com                                        http://NewsReader.Com/

0
Reply feedbackdroid 7/18/2006 2:43:59 PM

Michael Olea wrote:
> Glen M. Sizemore wrote:
>
> > I looked over the paper (no, I didn't read it), and my first impression 
> > is
> > that this is not "must" reading for behaviorists. Or rather, it is far
> > less "must" reading than some of the tutorials on Bayesian analyses of
> > coin
> > tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do 
> > you
> > think it is "must" reading for behaviorists? Pitch me. After all, you 
> > can
> > argue that I can't be persuaded, but you know that you can get a rise 
> > out
> > of me.
>
> He's probably referring to the differing predictions between RW models,
> various Bayesian models, and the experimental results.
>
> I have mixed reactions to papers with Gopnik's name on them - interesting
> work, but a tendency towards straw-man caricature of alternatives.
>
> -- Michael

JL: I was referring to the distinction between operant conditioning, which
encodes relationships between the organism's behaviour and the
environment, and causal maps, which encode relationships between
aspects of the environment, where the organism may just be an observer.
There seems to be a connection to the distinction between learning where,
for example, a rat lacking a hippocampus can quickly return to a
previously discovered object but only if he always starts from the same
position, and learning exhibited by intact animals who can quickly
locate the object from any starting point. The latter are said to
employ a spatial map, and Gopnik claims that causal maps are analogous
functions in a different domain. I wonder if the hippocampus is
involved in both. Maybe causal maps can be seen as a generalization of
spatial maps.



GS: All of this stems from an inability to understand the notion of a 
response class. Say Pigeon A has been trained to peck a key when any one of 
three different pictures of, say, trees, are presented, and Pigeon B has 
been trained on hundreds of such pictures (responding is never reinforced 
when there is anything but the target pictures, and for A the S- stimuli can't 
be trees). The pigeons appear to "have the same response class" when the 
particular picture of a tree is one of the target pictures for A, but the 
difference is quickly revealed when novel pictures are used for each pigeon. 
Here, of course, Pigeon B will likely respond appropriately, but Pigeon A 
will not. Nevertheless, both involve operant response classes, one "big" one for 
Pigeon B, and 3 "small" ones for Pigeon A. Now, how does this apply to 
"spatial maps"? The issue is very difficult to talk about because the 
response classes are hard to name. Let's take an animal that has had a lot 
of exposure to a particular environment (though keep in mind that, unless 
special things have been done since birth, the animal will likely have moved 
about in several different environments). Such animals have moved to a 
variety of places in the environment from a variety of places. Further, 
sometimes they may have moved from one object to another by a, say, L-shaped 
route. But if the object is still in sight, the animal's approach DIRECTLY 
back to the object may be controlled by visual stimuli - yet its return has 
also happened in the context of the L-shaped movement through the space. 
With enough of these sorts of occurrences, it is feasible that the animal 
acquires a set of responses approaching different areas that are partially 
under stimulus control of the preceding movements. Returning to places that 
are obscured visually would establish the preceding movement stimuli as 
powerful discriminative stimuli. Further such animals may have approached a 
particular area from so many different places that "approach to that 
particular area" becomes a generalized operant. Now let's talk about an 
animal that has experience moving about several different environments. Here 
the control by the immediately preceding movements would likely begin to 
control behavior since the visual stimuli would not be relevant to all of 
the environments. Such animals would be able to move about a novel 
environment, and return to certain places without having to retrace their 
steps. The delay between some of the movements and the response raises some 
issues, but they are hardly insurmountable from a conceptual standpoint. I am, 
for example, a novice boater, and when I go to a large and unfamiliar lake, 
I frequently glance in the direction of the boat dock after traveling a 
ways, even when the dock is not really visible. The glances are mediating 
responses, and all I have to do is "update" the mediating response 
periodically.



Now take a kid that has retrieved objects from a variety of containers. Some 
need to be twisted, some unlatched, some pushed and then turned, etc., and the 
person thus develops the general response of "going in to things to get 
things" and acquires responses that may overlap different situations. Or say 
the kid has been involved with a variety of circumstances watching falling 
objects, thrown objects, etc. The response classes acquired when retrieving 
such objects would produce generalized response classes that would be 
effective if, say, the initial trajectory was observed but the landing was 
obscured.

From p.11:

JL: "Causal maps would also allow animals to extend their causal knowledge
and learning to a  wide variety of new kinds of causal relations, not
just causal relations that involve  rewards or punishments (as in
classical or operant conditioning), not just object  movements and
collisions (as in the Michottean effects), and not just events that
immediately result from their own actions (as in operant conditioning
or trial-and-error  learning). Finally, animals could combine new
information and prior causal information  to create new causal maps,
whether that prior information was hard-wired or previously  learned."

And on p.15:

"Just as causal maps are an interesting halfway point between
domain-specific and  domain-general representations, these causal
learning mechanisms are an interesting  halfway point between
classically nativist and empiricist approaches to learning.
Traditionally, there has been a tension between restricted and
domain-specific learning  mechanisms like "triggering" or
"parameter-setting", and very general learning  mechanisms like
association or conditioning.  In the first kind of mechanism, very
specific kinds of input trigger very highly structured representations.
 In the second kind  of mechanism, any kind of input can be considered,
and the representations simply match  the patterns in the input.  Our
proposal is that causal learning mechanisms transform  domain-general
information about patterns of events, along with other information,
into  constrained and highly structured representations of causal
relations."



GS: All of the above nonsense stems from misunderstanding the generalized 
nature of response classes, a willingness to seize on metaphor, and a 
willingness to simply invent processes to "explain" the behavioral phenomena 
from which the processes were inferred in the first place.



"J.A. Legris" <jalegris@sympatico.ca> wrote in message 
news:1153223602.247971.25820@i42g2000cwa.googlegroups.com...
> Michael Olea wrote:
>> Glen M. Sizemore wrote:
>>


0
Reply Glen 7/18/2006 4:46:39 PM

feedbackdroid wrote:

> 
> Curt Welch wrote:

>> > This works because you have "2" output equations. With 2 unknowns
>> > and 2 equations you can work backwards. With 2 unknowns and
>> > only 1 equation, not so. Try ONLY one ... X=A+B
>>
>> Though, again, this isn't the issue... But...
 
> No. That was the original issue regarding loss of information when
> adding
> 2 #'s, but all of the other stuff was piled onto that.

======
"Also, some people like to use the modern Gabor wavelet, mainly I
think, because it is more limited in space than a fourier grating.
To me this doesn't really mean anything, because you've basically
selected a mathematical shape that matches the cell response, which
is mainly a result of the anatomy. IOW, Gabor is a computational
method, but not anything very profound, IMO."

"Besides everything I just wrote, I should reiterate that I think
viewing all these happenings as a "lossless" process is really a
misnomer. What is really going on are successive transforms of the
sensory data. If you have something like

  Ce <- Si <- Wi

and the sums of gaussians, how can this possibly be lossless? When
you add 2 #'s you lose information, namely the original values of those
2 #'s"

-- Dan Michaels
=====

Dan is either lying, stupid, or both.

-- Michael

0
Reply Michael 7/18/2006 5:06:01 PM

Michael Olea wrote:
> feedbackdroid wrote:
>
> >
> > Curt Welch wrote:
>
> >> > This works because you have "2" output equations. With 2 unknowns
> >> > and 2 equations you can work backwards. With 2 unknowns and
> >> > only 1 equation, not so. Try ONLY one ... X=A+B
> >>
> >> Though, again, this isn't the issue... But...
>
> > No. That was the original issue regarding loss of information when
> > adding
> > 2 #'s, but all of the other stuff was piled onto that.
>
> ======
> "Also, some people like to use the modern Gabor wavelet, mainly I
> think, because it is more limited in space than a fourier grating.
> To me this doesn't really mean anything, because you've basically
> selected a mathematical shape that matches the cell response, which
> is mainly a result of the anatomy. IOW, Gabor is a computational
> method, but not anything very profound, IMO."
>
> "Besides everything I just wrote, I should reiterate that I think
> viewing all these happenings as a "lossless" process is really a
> misnomer. What is really going on are successive transforms of the
> sensory data. If you have something like
>
>   Ce <- Si <- Wi
>
> and the sums of gaussians, how can this possibly be lossless? When
> you add 2 #'s you lose information, namely the original values of those
> 2 #'s"
>
> -- Dan Michaels
> =====
>
> Dan is either lying, stupid, or both.
>
> -- Michael


Congrats. I see you're starting to emulate your new best friend, GS.

I just explained this entire thing to you - ONCE AGAIN - in my post
from yesterday. You still don't get it. But I'll do it a 3rd time.

1. The cortex has a specific connection architecture, involving surround
inhibition coming to cells from local peripheral regions, and dis-inhibition
from further-out peripheral regions. All the Gabor wavelet is is a
BETTER stimulus than the more spatially-extensive sinusoidal gratings
that were in popular use 20 or so years ago. No big surprise. When you
find a better fitting stimulus then you get a greater response from the
cells.

2. 50 years worth of physiological recordings have shown there is an
increasing amount of abstraction computed on the visual image as one
ascends the visual hierarchy. This means that loads of information is
thrown away in order to abstract out salient "features" present in the
image. Look at IT, the so-called face area. Those cells respond to
extreme abstractions in the visual images. The raw pixel data is
discarded so the cells can signal "yes, it's a face". Turn on a pixel
here or there in the incoming image, and the cell response isn't affected.

That's loss of information, redundancy reduction, invariance abstraction,
feature detection, to name a few applicable terms. This has all been
known for 50+ years.
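
A toy illustration of the many-to-one mapping (not a model of IT cells,
just the information point):

    import numpy as np

    def face_like(patch, threshold=0.5):
        """Toy 'feature detector': reports only whether the patch's mean
        brightness crosses a threshold, discarding the raw pixel values."""
        return patch.mean() > threshold

    patch = np.full((8, 8), 0.9)
    perturbed = patch.copy()
    perturbed[3, 4] = 0.0                    # change one pixel

    assert face_like(patch) == face_like(perturbed)   # same response either way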

This is what I told you in November, and this is what I am reiterating
today. Stop mixing your [i.e. Lee's] simplified mathematical models with
what actually happens in the real world.

0
Reply feedbackdroid 7/18/2006 5:35:23 PM

feedbackdroid wrote:

> 
> Michael Olea wrote:
>> feedbackdroid wrote:
>>
>> >
>> > Curt Welch wrote:
>>
>> >> > This works because you have "2" output equations. With 2 unknowns
>> >> > and 2 equations you can work backwards. With 2 unknowns and
>> >> > only 1 equation, not so. Try ONLY one ... X=A+B
>> >>
>> >> Though, again, this isn't the issue... But...
>>
>> > No. That was the original issue regarding loss of information when
>> > adding
>> > 2 #'s, but all of the other stuff was piled onto that.
>>
>> ======
>> "Also, some people like to use the modern Gabor wavelet, mainly I
>> think, because it is more limited in space than a fourier grating.
>> To me this doesn't really mean anything, because you've basically
>> selected a mathematical shape that matches the cell response, which
>> is mainly a result of the anatomy. IOW, Gabor is a computational
>> method, but not anything very profound, IMO."
>>
>> "Besides everything I just wrote, I should reiterate that I think
>> viewing all these happenings as a "lossless" process is really a
>> misnomer. What is really going on are successive transforms of the
>> sensory data. If you have something like
>>
>>   Ce <- Si <- Wi
>>
>> and the sums of gaussians, how can this possibly be lossless? When
>> you add 2 #'s you lose information, namely the original values of those
>> 2 #'s"
>>
>> -- Dan Michaels
>> =====
>>
>> Dan is either lying, stupid, or both.
>>
>> -- Michael
 
> Congrats. I see you're starting to emulate your new best friend, GS.
 
> I just explained this entire thing to you - ONCE AGAIN - in my post
> from yesterday. You still don't get it. But I'll do it a 3rd time.

Yes, I'll get around to your remarks, the same irrelevant, uncomprehending
remarks you made in November, remarks having nothing to do with the claims
you seem to imagine you are addressing, but the point here is simple and
something else entirely:

>> > No. That was the original issue regarding loss of information when
>> > adding
>> > 2 #'s, but all of the other stuff was piled onto that.

>> ...If you have something like
>>
>>   Ce <- Si <- Wi
>>
>> and the sums of gaussians, how can this possibly be lossless? When
>> you add 2 #'s you lose information, namely the original values of those
>> 2 #'s"

This was in response to comments about the responses of so-called simple
cells in V1 acting as Gabor wavelet basis functions. This is the context in
which you made your remark about adding two numbers losing information.
This is the point that Curt explained to you, as did I in November: such
transformations can be lossless because "adding two numbers" is not the
only thing they do. And this is the point that makes your claim "all of the
other stuff was piled onto that" either lying or stupidity or both.

-- Michael

0
Reply Michael 7/18/2006 5:56:21 PM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote in message 
news:1153244123.481581.65410@h48g2000cwc.googlegroups.com...
>
> Michael Olea wrote:
>> feedbackdroid wrote:
>>
>> >
>> > Curt Welch wrote:
>>
>> >> > This works because you have "2" output equations. With 2 unknowns
>> >> > and 2 equations you can work backwards. With 2 unknowns and
>> >> > only 1 equation, not so. Try ONLY one ... X=A+B
>> >>
>> >> Though, again, this isn't the issue... But...
>>
>> > No. That was the original issue regarding loss of information when
>> > adding
>> > 2 #'s, but all of the other stuff was piled onto that.
>>
>> ======
>> "Also, some people like to use the modern Gabor wavelet, mainly I
>> think, because it is more limited in space than a fourier grating.
>> To me this doesn't really mean anything, because you've basically
>> selected a mathematical shape that matches the cell response, which
>> is mainly a result of the anatomy. IOW, Gabor is a computational
>> method, but not anything very profound, IMO."
>>
>> "Besides everything I just wrote, I should reiterate that I think
>> viewing all these happenings as a "lossless" process is really a
>> misnomer. What is really going on are successive transforms of the
>> sensory data. If you have something like
>>
>>   Ce <- Si <- Wi
>>
>> and the sums of gaussians, how can this possibly be lossless? When
>> you add 2 #'s you lose information, namely the original values of those
>> 2 #'s"
>>
>> -- Dan Michaels
>> =====
>>
>> Dan is either lying, stupid, or both.
>>
>> -- Michael
>
>
> Congrats. I see you're starting to emulate your new best friend, GS.

I don't think so. When I thought I had a chance to be best friends with 
Michael, I sent him the following email:



Dear Michael,

            I like you. Do you like me? I would like to be your best friend. 
Do you want to be best friends? Mark one:







Yes________                                               NO_________







He didn't return my email. A reasonable conclusion is, therefore, that he 
came to the conclusion that you are intellectually dishonest, and stupid, on 
his own. I, however, think you are an order of magnitude more dishonest than 
you are stupid. But, then, I read a lot of people's posts here and, thus, I 
tend to think of "stupid" as being sort of calibrated on Verhey. You are not 
as stupid as Verhey.



You're welcome,

Glen



>
> I just explained this entire thing to you - ONCE AGAIN - in my post
> from yesterday. You still don't get it. But I'll do it a 3rd time.
>
> 1. The cortex has a specific connection architecture, involving
> surround
> inhibition coming to cells from local peripheral regions, and
> dis-inhibition
> from further-out peripheral regions. All the Gabor wavelet is is a
> BETTER stimulus than the more spatially-extensive sinusoidal gratings
> that were in popular use 20 or so years ago. No big surprise. When you
> find a better fitting stimulus then you get a greater response from the
> cells.
>
> 2. 50 years worth of physiological recordings have shown there is an
> increasing amount of abstraction computed on the visual image as one
> ascends the visual hierarchy. This means that loads of information is
> thrown away in order to abstract out salient "features" present in the
> image. Look at IT, the so-called face area. Those cells respond to
> extreme abstractions in the visual images. The raw pixel data is
> discarded
> so the cells can signal "yes, it's a face". Turn on a pixel here or
> there in
> the incoming image, and the cell response isn't affected.
>
> That's loss of information, redundancy reduction, invariance
> abstraction,
> feature detection, to name a few applicable terms. This has all been
> known for 50+ years.
>
> This is what I told you in november, and this is what I am re-iterating
>
> today. Stop mixing your [ie Lee's] simplified mathematical models with
> what actually happens in the real-world.
> 


0
Reply Glen 7/18/2006 8:14:53 PM

Let me tell you another story; only this time it isn't allegory. Dan, on at 
least 4 occasions, urged list readers to contact my employer and report my 
"abuse" (i.e. swearing at Dan, calling him a stupid fuck and so forth). 
Indeed, Dan posted the name of the private college, at which I was then 
employed, saying something like "I wonder what his boss [and he gave my boss' 
name] would think about statements like: [swearing at Dan, commenting on his 
intellectual dishonesty, etc. etc.]. Eventually, in fact, someone took it up 
with my current "boss" (I think the connection was directly through Dan, if 
not Dan himself). A paraphrase of the interchange went like this:



Chair: I've heard that you are being verbally abusive in a medium that 
interfaces with the public. For example, [something like my famous, "you do 
philosophy as well as my ass chews gum"]- - -. Of course, if your identity 
is being stolen - - -



Glen: No, my identity is not being stolen; it's me. But I never mentioned my 
affiliation, and the reason is that I wish to maintain my first-amendment 
rights, all while not even implying remotely that what I say is the opinion 
of the university.



Chair: You never mentioned the University?



Glen: By design. For the very reasons you're talking about.



Chair: So they went the extra mile and tracked your affiliation through the 
internet?



Glen: Yes.



Chair: It seems to me that you have balanced your first amendment rights and 
the rights of the university.



Glen: Ok -----------.



Chair: Glen, I would urge you not to do what I have seen here. Really - I've 
been there myself.

----------------------------------------------------



And so on.



I have said some mean things. I have used, ummm, harsh language. Father 
forgive me! But I never, ever, argued that someone should be censored by 
some institution. The irony, as I have mentioned before, is that Dan liked 
to play the "Skinnerians are fascists" card. Go figure, huh?





"Glen M. Sizemore" <gmsizemore2@yahoo.com> wrote in message 
news:44bd40e8$0$2513$ed362ca5@nr1.newsreader.com...
>
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote in message 
> news:1153244123.481581.65410@h48g2000cwc.googlegroups.com...
>>
>> Michael Olea wrote:
>>> feedbackdroid wrote:
>>>
>>> >
>>> > Curt Welch wrote:
>>>
>>> >> > This works because you have "2" output equations. With 2 unknowns
>>> >> > and 2 equations you can work backwards. With 2 unknowns and
>>> >> > only 1 equation, not so. Try ONLY one ... X=A+B
>>> >>
>>> >> Though, again, this isn't the issue... But...
>>>
>>> > No. That was the original issue regarding loss of information when
>>> > adding
>>> > 2 #'s, but all of the other stuff was piled onto that.
>>>
>>> ======
>>> "Also, some people like to use the modern Gabor wavelet, mainly I
>>> think, because it is more limited in space than a fourier grating.
>>> To me this doesn't really mean anything, because you've basically
>>> selected a mathematical shape that matches the cell response, which
>>> is mainly a result of the anatomy. IOW, Gabor is a computational
>>> method, but not anything very profound, IMO."
>>>
>>> "Besides everything I just wrote, I should reiterate that I think
>>> viewing all these happenings as a "lossless" process is really a
>>> misnomer. What is really going on are successive transforms of the
>>> sensory data. If you have something like
>>>
>>>   Ce <- Si <- Wi
>>>
>>> and the sums of gaussians, how can this possibly be lossless? When
>>> you add 2 #'s you lose information, namely the original values of those
>>> 2 #'s"
>>>
>>> -- Dan Michaels
>>> =====
>>>
>>> Dan is either lying, stupid, or both.
>>>
>>> -- Michael
>>
>>
>> Congrats. I see you're starting to emulate your new best friend, GS.
>
> I don't think so. When I thought I had a chance to be best friends with 
> Michael, I sent him the following email:
>
>
>
> Dear Michael,
>
>            I like you. Do you like me? I would like to be your best 
> friend. Do you want to be best friends? Mark one:
>
>
>
>
>
>
>
> Yes________                                               NO_________
>
>
>
>
>
>
>
> He didn't return my email. A reasonable conclusion is, therefore, that he 
> came to the conclusion that you are intellectually dishonest, and stupid, 
> on his own. I, however, think you are an order of magnitude more dishonest 
> than you are stupid. But, then, I read a lot of people's posts here and, 
> thus, I tend to think of "stupid" as being sort of calibrated on Verhey. 
> You are not as stupid as Verhey.
>
>
>
> You're welcome,
>
> Glen
>
>
>
>>
>> I just explained this entire thing to you - ONCE AGAIN - in my post
>> from yesterday. You still don't get it. But I'll do it a 3rd time.
>>
>> 1. The cortex has a specific connection architecture, involving
>> surround
>> inhibition coming to cells from local peripheral regions, and
>> dis-inhibition
>> from further-out peripheral regions. All the Gabor wavelet is is a
>> BETTER stimulus than the more spatially-extensive sinusoidal gratings
>> that were in popular use 20 or so years ago. No big surprise. When you
>> find a better fitting stimulus then you get a greater response from the
>> cells.
>>
>> 2. 50 years worth of physiological recordings have shown there is an
>> increasing amount of abstraction computed on the visual image as one
>> ascends the visual hierarchy. This means that loads of information is
>> thrown away in order to abstract out salient "features" present in the
>> image. Look at IT, the so-called face area. Those cells respond to
>> extreme abstractions in the visual images. The raw pixel data is
>> discarded
>> so the cells can signal "yes, it's a face". Turn on a pixel here or
>> there in
>> the incoming image, and the cell response isn't affected.
>>
>> That's loss of information, redundancy reduction, invariance
>> abstraction,
>> feature detection, to name a few applicable terms. This has all been
>> known for 50+ years.
>>
>> This is what I told you in november, and this is what I am re-iterating
>>
>> today. Stop mixing your [ie Lee's] simplified mathematical models with
>> what actually happens in the real-world.
>>
>
> 


0
Reply Glen 7/18/2006 9:30:59 PM

"feedbackdroid" <feedbackdroid@yahoo.com> wrote:

> 2. 50 years worth of physiological recordings have shown there is an
> increasing amount of abstraction computed on the visual image as one
> ascends the visual hierarchy. This means that loads of information is
> thrown away in order to abstract out salient "features" present in the
> image. Look at IT, the so-called face area. Those cells respond to
> extreme abstractions in the visual images. The raw pixel data is
> discarded so the cells can signal "yes, it's a face". Turn on a pixel
> here or there in the incoming image, and the cell response isn't
> affected.
>
> That's loss of information,

Once again: it's not a loss of information if the information is stored
elsewhere.  If you transform pixel data into _multiple_ high level
abstractions, there is no need for any of the information to be lost.  So
the simple fact that an operation like a+b is happening is no proof that
information is being discarded by the network.  You would have to prove
that a corresponding feature like a-b was not being abstracted at the same
time.  And we know for a fact that the visual system transforms the lower
level data into multiple higher level abstractions at each step.

You would have to prove, for example, that when you changed a pixel, none
of the high level abstractions changed as a result of the pixel change.

In addition, I've heard the visual system creates a large fan-out in the
signals, on the order of 400 to 1.  This makes it even easier to believe
the system is not throwing data away, as it always seems to be creating
more high level abstractions at each new level.  So if you turn (a,b) into
(a+b, 2a-b, 2b-a) (a 1.5-to-1 fan-out) you have actually created redundancy
in the data.  Any one of the three high level abstractions can be thrown
out and you can still recreate the (a,b) input.
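
A few lines of Python to check that little fan-out (illustrative numbers
only):

    import numpy as np

    M = np.array([[1.0,  1.0],    # a + b
                  [2.0, -1.0],    # 2a - b
                  [-1.0, 2.0]])   # 2b - a
    ab = np.array([3.0, 5.0])
    features = M @ ab             # the three "abstractions"

    # Throw away any one of the three; the other two still pin down (a, b).
    for drop in range(3):
        keep = [i for i in range(3) if i != drop]
        assert np.allclose(np.linalg.solve(M[keep], features[keep]), ab)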

As you said in your other reply to me - if you have at least as many
equations as unknowns, you can solve for the unknowns (assuming the
equations are not effective duplicates of each other).  Each high level
abstraction (like a face detector) represents an equation, and if you have
a 400-to-1 fan-out, you will end up with 400 equations for each unknown.
The brain would actually have to work hard at producing redundant high
level abstractions in order to throw information away in this situation.

Now, on the other hand, I don't claim the visual system is not throwing
information away - I'm just claiming that what you have mentioned is not a
clear indication that it is - which seems to be what you are trying to
argue above.

Has there been an information-theoretic analysis of the total visual
system that clearly shows information is being discarded at each higher
level of abstraction?  I would guess there hasn't been, simply because we
don't have the tools to correctly record and map out the function of an
entire visual system at the resolution needed to answer that question.

> redundancy reduction, invariance abstraction,
> feature detection, to name a few applicable terms. This has all been
> known for 50+ years.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/18/2006 11:16:57 PM

Glen M. Sizemore wrote:
> Michael Olea wrote:
> > Glen M. Sizemore wrote:
> >
> > > I looked over the paper (no, I didn't read it), and my first impression
> > > is
> > > that this is not "must" reading for behaviorists. Or rather, it is far
> > > less "must" reading than some of the tutorials on Bayesian analyses of
> > > coin
> > > tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do
> > > you
> > > think it is "must" reading for behaviorists? Pitch me. After all, you
> > > can
> > > argue that I can't be persuaded, but you know that you can get a rise
> > > out
> > > of me.
> >
> > He's probably referring to the differing predictions between RW models,
> > various Bayesian models, and the experimental results.
> >
> > I have mixed reactions to papers with Gopnik's name on them - interesting
> > work, but a tendency towards straw-man caricature of alternatives.
> >
> > -- Michael
>
> JL: I was referring to the distinction between operant conditioning, which
> encodes relationships between the organism's behaviour and the
> environment, and causal maps, which encode relationships between
> aspects of the environment, where the organism may just be an observer.
> There seems to be connection to the distinction between learning where,
> for example, a rat lacking a hippocampus can quickly return to a
> previously discovered object but only if he always starts from the same
> position, and learning exhibited by intact animals who can quickly
> locate the object from any starting point. The latter are said to
> employ a spatial map, and Gopnick claims that causal maps are analogous
> functions in a different domain. I wonder if the hippocampus is
> involved in both. Maybe causal maps can be seen as a generalization of
> spatial maps.
>
>
>
> GS: All of this stems from an inability to understand the notion of a
> response class. Say Pigeon A has been trained to peck a key when any one of
> three different pictures of, say, trees, are presented, and Pigeon B has
> been trained on hundreds of such pictures (responding is never reinforced
> when there is anything but the target pictures, and for A the S- stimuli can't
> be trees). The pigeons appear to "have the same response class" when the
> particular picture of a tree is one of the target pictures for A, but the
> difference is quickly revealed when novel pictures are used for each pigeon.
> Here, of course, Pigeon B will likely respond appropriately, but Pigeon A is
> not. Nevertheless, both involve operant response classes, one "big" one for
> Pigeon B, and 3 "small" ones for Pigeon A. Now, how does this apply to
> "spatial maps"? The issue is very difficult to talk about because the
> response classes are hard to name. Let's take an animal that has had a lot
> of exposure to a particular environment (though keep in mind that, unless
> special things have been done since birth, the animal will likely have moved
> about in several different environments). Such animals have moved to a
> variety of places in the environment from a variety of places. Further,
> sometimes they may have moved from one object to another by a, say, L-shaped
> route. But if the object is still in sight, the animal's approach DIRECTLY
> back to the object may be controlled by visual stimuli - yet its return has
> also happened in the context of the L-shaped movement through the space.
> With enough of these sorts of occurrences, it is feasible that the animal
> acquires a set of responses approaching different areas that are partially
> under stimulus control of the preceding movements. Returning to places that
> are obscured visually would establish the preceding movement stimuli as
> powerful discriminative stimuli. Further such animals may have approached a
> particular area from so many different places that "approach to that
> particular area" becomes a generalized operant. Now let's talk about an
> animal that has experience moving about several different environments. Here
> the control by the immediately preceding movements would likely begin to
> control behavior since the visual stimuli would not be relevant to all of
> the environments. Such animals would be able to move about a novel
> environment, and return to certain places without having to retrace their
> steps. The delay between some of the movements and the response raises some
> issues, but are hardly insurmountable from a conceptual standpoint. I am,
> for example, a novice boater, and when I go to a large and unfamiliar lake,
> I frequently glance in the direction of the boat dock after traveling a
> ways, even when the dock is not really visible. The glances are mediating
> responses, and all I have to do is "update" the mediating response
> periodically.
>
>
>
> Now take a kid that has retrieved objects from a variety of containers. Some
> need to be twisted, some unlatched some pushed and then turned etc. and the
> person, thus, develops the general response of "going in to things to get
> things" and acquires responses that may overlap different situations. Or say
> the kid has been involved with a variety of circumstance watching falling
> objects, thrown objects, etc. The response classes acquired when retrieving
> such objects would produced generalized response classes that would be
> effective if, say, the initial trajectory was observed but the landing was
> obscured.
>
> From p.11:
>
> JL: "Causal maps would also allow animals to extend their causal knowledge
> and learning to a  wide variety of new kinds of causal relations, not
> just causal relations that involve  rewards or punishments (as in
> classical or operant conditioning), not just object  movements and
> collisions (as in the Michottean effects), and not just events that
> immediately result from their own actions (as in operant conditioning
> or trial-and-error  learning). Finally, animals could combine new
> information and prior causal information  to create new causal maps,
> whether that prior information was hard-wired or previously  learned."
>
> And on p.15:
>
> "Just as causal maps are an interesting halfway point between
> domain-specific and  domain-general representations, these causal
> learning mechanisms are an interesting  halfway point between
> classically nativist and empiricist approaches to learning.
> Traditionally, there has been a tension between restricted and
> domain-specific learning  mechanisms like "triggering" or
> "parameter-setting", and very general learning  mechanisms like
> association or conditioning.  In the first kind of mechanism, very
> specific kinds of input trigger very highly structured representations.
>  In the second kind  of mechanism, any kind of input can be considered,
> and the representations simply match  the patterns in the input.  Our
> proposal is that causal learning mechanisms transform  domain-general
> information about patterns of events, along with other information,
> into  constrained and highly structured representations of causal
> relations."
>
>
>
> GS: All of the above nonsense stems from misunderstanding the generalized
> nature of response classes, a willingness to seize on metaphor, and a
> willingness to simply invent processes to "explain" the behavioral phenomena
> from which the processes were inferred in the first place.
> 
> 

What do you think of the experiments (p.64)?

--
Joe Legris

0
Reply J 7/19/2006 2:41:07 PM

Jim Bromer wrote:
> J.A. Legris wrote:
> > OK, let's get started. What is gradual learning, and under what
> > circumstances does it arise?
> >
> > --
> > Joe Legris
>
> The classical example of logical reasoning is,
> All men are mortal.
> Socrates is a man.
> Therefore, we know -by form- that Socrates is mortal.
>
> This concept of form was also used in the development of algebra where
> we know facts like,
> 2a + 2a = 4a
> if a is any real number.  So, for example, we know -by form- that if
> a=3 then 2*3+2*3=4*3.
>
> One of the GOFAI models used categories and logic in order to create
> logical conclusions for new information based on previously stored
> information.  In a few cases this model produced good results even for
> some novel examples.  But, it also produced a lot of incorrect results
> as well.  I wondered why this GOFAI model did not work better more
> often.  One of the reasons I discovered is that we learn gradually, so
> that by the time we are capable of realizing that the philosopher is
> mortal just because he is a man and all men are mortal, we also know a
> huge amount of other information that is relevant to this problem.  The
> child learns about mortality in dozens of ways if not hundreds or even
> thousands of ways before he is capable of realizing that since all men
> are mortal, then Socrates must also be mortal.
>
> I realized that this kind of logical reasoning can be likened to
> instant learning.  If you learn that Ed is a man, then you also
> instantly know that Ed must be mortal as well.  This is indeed a valid
> process, and I feel that it is an important aspect of intelligence.
> But before we get to the stage where we can derive an insight through
> previously learned information and have some capability to judge the
> value of that derived insight, we have to learn a great many related
> pieces of knowledge.  So my argument here, is that while instant
> derivations are an important part of Artificial Intelligence, we also
> need to be able to use more gradual learning methods to produce the
> prerequisite background information so that derived insights can be
> used more effectively.
>
> Gradual learning is an important part of this process.  We first learn
> about things in piecemeal fashion before we can put more complicated
> ideas together.  I would say that reinforcement learning is a form of
> gradual learning but there are great many other methods of gradual
> learning available to the computer programmer.
>
> It's hard for most people to understand me (or for that matter even
> to believe me) when I try to describe how adaptive AI learning might
> take place without predefined variable-data references.  So it is much
> easier for me to use some kind of data variable-explicit model to try
> to talk about my ideas.
>
> Imagine a complicated production process that had all kinds of sensors
> and alarms.  You might imagine a refinery or something like that.
> However, since I don't know too much about material processes, I
> wouldn't try to simulate something like that but I would instead
> create a computer model that used algorithms to produce streams of data
> to represent the data produced by an array of sensors.  Under a number
> of different situations, alarms would go off when certain combinations
> of sensor threshold values were hit.  This computer generated model
> would be put through thousands of different runs using different
> initial input parameters so that it would produce a wide range of data
> streams through the virtual sensors.  It would then be the job of the
> AI module to try to predict which alarms would be triggered and when
> they would be triggered before the event occurred.  The algorithms that
> produced the alarms could be varied and complicated.  For example, if
> sensor line 3 and sensor line 4 go beyond some threshold values for at
> least 5 units of time, then alarm 23 would be triggered unless line 6
> dipped below some threshold value at least two times in the 10 units of
> time before.  There might be hundreds of such alarm scenarios.
> Individual sensor lines might be involved in a number of different
> alarm scenarios.  An alarm might, for another example, be triggered if
> the average value of all the sensor inputs was within some specified
> range.  The specified triggers for some alarms might change from run to
> run, or even during a run.  Some of these scenarios would be simple,
> and some might be very complex.  Some scenarios might even be triggered
> by non-sensed events.  The range of possibilities, even within this
> very constrained data-event model is tremendous if not truly infinite.
>
> The AI module might be exposed to a number of runs that produced very
> similar sensor values, or it might be exposed to very few runs that
> produced similar data streams.
>
> Superficially this might look a little like a reinforcement scenario
> since the alarms could be seen as negative reinforcements, but it
> clearly is not a proper model for behaviorist conditioning.  The only
> innate 'behavior' is that the AI module is programmed to produce is to
> try to develop conjectures to predict the data events that could
> trigger the various alarms.
>
> I argue that since simplistic assessments of the runs would not work
> for every kind of alarm scenario, the program should start out with
> gradual learning in order to reduce the false positives where it
> predicted an alarm event that did not subsequently occur.
>
> This model might have hundreds or thousands of sensors. It might have
> hundreds of alarms.  It might have a variety of combinations of data
> events that could cause or inhibit an alarm.  Non-sensible data events
> might interact with the sensory data events to trigger or inhibit an
> alarm.  Furthermore, the AI module might be able to mitigate or operate
> the data events that drive the sensors so that it could run interactive
> experiments to test its conjectures.
>
> I have described a complex model where an imagined AI module would have
> to make conjectures about the data events that triggered an alarm.  Off
> hand I cannot think of any one learning method that would be best for
> this problem.  So lacking that wisdom I would suggest that the program
> might run hundreds or even thousands of different learning methods in
> an effort to discover predictive conjectures that would have a high
> correlation with actual alarms.  This is a complex model problem which
> does not lend itself to a single simplistic AI paradigm.  I contend
> that the use of hundreds or maybe even thousands of learning mechanisms
> is going to be a necessary component of innovative AI paradigms in near
> future.  And it seems reasonable to assume that initial learning is
> typically going to be a gradual process in such complex scenarios.
>
> I will try to finish this in the next few days so that I can describe
> some of the different methods to produce conjectures that might be made
> in this setting and to try to show how some of these methods could be
> seen as making instant conjectures while others could be seen as
> examples of gradual learning.
>
> Jim Bromer
In my previous message I described a computer model that produced a
stream of data which, under a variety of different conditions could set
off alarms.  An AI program or subprogram would have the task to try to
make predictions when and why the alarms would be set off.  There could
be two test modes for the AI program.  In one, it would only be able to
make observations of the streams of input data, and in the second test
mode it would be able to interact with the data environment to some extent
by setting the values of some of the streams of input data in order to
test its conjectures.  The AI module would have access to the input
data streams in order to try to make its predictions, but it would not
have access to the algorithms that produced the streams of data, and it
would not have access to the algorithms that established the causal
relations between the data streams or other undetectable events and the
alarms.

Suppose, for example, there were 1000 streams of simulated sensor
readings and 100 alarms.  The streams of sensor readings range from a
value of 0 to 100 at each sampling.  Each run of data involves some
number of sampling time units.  Also suppose that some of the alarms
were set by the conditions like the following.
Alarm 4: Goes off whenever Sensor 5 is between the value of 30 and 40.
Alarm 6: Goes off whenever Sensor 8 is between 2 and 6, or between 20
and 25, or between 32 and 35, or between 43 and 47, or between 57 and
64.
Alarm 12: Goes off whenever the average value of the 1000 Sensor
Streams of data at any point in time is between 50 and 60.
Alarm 23: Goes off if Sensor 3 and Sensor 4 go beyond the threshold
value of 15 for at least 5 units of time, unless line 6 was below the
threshold value of 40 for at least two sampling times in the 10 units
of time before.
Alarm 33: Goes off when Sensor 3 and Sensor 4 go beyond the threshold
value of 15 for at least 5 units of time, or when Sensor 23 is
between the value of 20 and 90, or when Sensor 45 is above the
threshold of 80, or when Sensor 80 is below the threshold of 30.  Alarm
33 would therefore go off whenever Alarm 23 would go off, but it would
also be set under other conditions as well.
Alarm 55: An undetected event sets Alarm 55 off.
Alarm 56: An undetected event sets Alarm 56 off if Sensor 32 is above
the threshold value of 80.
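A minimal sketch of the kind of simulator being described, in Python
(hypothetical code: the sensor indices, thresholds, and random driving
process are illustrative assumptions, and only a few of the alarm rules
above are implemented):

    import random

    NUM_SENSORS = 1000

    def run(steps=200):
        """Yield (sensor_vector, set_of_alarms) once per time unit."""
        history = []                       # past sensor vectors, newest last
        for _ in range(steps):
            # sensor N is simply s[N] here, for brevity
            s = [random.uniform(0, 100) for _ in range(NUM_SENSORS)]
            history.append(s)
            alarms = set()
            if 30 <= s[5] <= 40:                               # Alarm 4
                alarms.add(4)
            if any(lo <= s[8] <= hi for lo, hi in
                   [(2, 6), (20, 25), (32, 35), (43, 47), (57, 64)]):
                alarms.add(6)                                  # Alarm 6
            if 50 <= sum(s) / NUM_SENSORS <= 60:               # Alarm 12
                alarms.add(12)
            recent = history[-5:]
            if len(recent) == 5 and all(v[3] > 15 and v[4] > 15 for v in recent):
                alarms.add(33)                 # Alarm 33, no exception clause
                # approximate when fewer than 15 samples exist
                dips = sum(1 for v in history[-15:-5] if v[6] < 40)
                if dips < 2:
                    alarms.add(23)             # Alarm 23, with the Sensor 6 exception
            if 20 <= s[23] <= 90 or s[45] > 80 or s[80] < 30:
                alarms.add(33)                 # Alarm 33, other trigger clauses
            yield s, alarms

The AI module would only ever see the (sensor_vector, alarms) pairs,
never the rules inside run().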

Suppose the AI module conducted a number of different analyses, and
that one of its analyses was made by examining each of the individual
Sensor data streams to see if any of their values correlated strongly
with an Alarm when it went off.  Some Bayesian enthusiasts might think
that Alarm 4, which goes off whenever Sensor 5 is between 30 and 40,
should be catchable by a Bayesian analysis of the individual data
streams.  That may be true, but it is not that simple.  Remember that
the AI module would not have any easy way of distinguishing between
coincidences and valid causal relationships, and that the kinds of
events that could set an alarm off are varied.  So even after a few
incidences where Alarm 4 went off, while the collected data would show
that Sensor 5 was between 30 and 40 each time, the data would also show
a range of specific values for each of the 999 other data lines.  True,
a Bayesian method that was programmed to test the relation between
Sensor 5 and Alarm 4 would detect that relation, but because there are
so many different conditions that could trigger an alarm, the program
would still have to consider other conditions as well.  For example,
while Alarm 6 is only triggered by Sensor Line 8, it
has a much more varied set of ranges which can act as triggers.  This
means that in order for the Bayesian method to quickly ascertain the
relation between Sensor 5 and Alarm 4, it would have to be explicitly
looking for a single range as a trigger.  For the Bayesian method to
quickly ascertain that Alarm 6 is correlated with Sensor 8, on the other
hand, it would have to be programmed with the assumption that there could
be quite a few ranges for a single Sensor to trigger an alarm.  This
reasoning suggests that the bland or vanilla proposal that a single
(simplistic) analytical technique could solve all AI problems is not
based on an insightful analysis of the varied kinds of problems the
program could be exposed to.  Many analytical and learning methods work
well when the problem is kept simple enough, but when the problem is
not simple, even relatively sophisticated methods such as Bayesian or
other statistical methods are just not up to the task unless they are
designed to test for a wide range of possibilities.  I believe
that the only way to get around this is to use a variety of analytical
methods in the initial development and testing of conjectures.
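As one crude illustration of running many simple analyses side by side
(hypothetical code; the binning, the sample-size cutoff, and the "lift"
threshold are arbitrary illustrative choices):

    import numpy as np

    def screen_single_sensors(sensors, alarms, lift=5.0):
        """sensors: (T, S) readings in 0..100; alarms: (T, A) 0/1 history.
        Flag (alarm, sensor, value-bin) triples that fire far above base rate."""
        bins = np.clip((sensors // 10).astype(int), 0, 9)   # ten coarse value bins
        base = alarms.mean(axis=0) + 1e-9                   # base firing rate per alarm
        hits = []
        for a in range(alarms.shape[1]):
            for s in range(sensors.shape[1]):
                for b in range(10):
                    in_bin = bins[:, s] == b
                    if in_bin.sum() < 20:                    # too little evidence so far
                        continue
                    p = alarms[in_bin, a].mean()
                    if p > lift * base[a]:
                        hits.append((a, s, b, p))            # conjecture worth following up
        return hits

    def screen_average(sensors, alarms, lift=5.0):
        """A separate analysis for average-value rules such as Alarm 12."""
        avg_bin = np.clip((sensors.mean(axis=1) // 10).astype(int), 0, 9)
        base = alarms.mean(axis=0) + 1e-9
        return [(a, b, alarms[avg_bin == b, a].mean())
                for a in range(alarms.shape[1]) for b in range(10)
                if (avg_bin == b).sum() >= 20
                and alarms[avg_bin == b, a].mean() > lift * base[a]]

Each screening pass produces candidate conjectures, not conclusions; the
point is that several such passes, run gradually over accumulating data,
would be needed before anything like Alarm 6 or Alarm 23 could even be
suspected.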

Once a statistical method began to suspect that a strong relation
between Alarm 4 and Sensor 5 existed, it could look at the negative
correlations as well.  Here again, negative correlations might not work as
well as might be presumed.  Look at the causes of Alarm 33.  Alarm 33
might be set off by Sensor 23 when it is between 20 and 90, but other
Sensor Lines can also trigger Alarm 33. So a correlation between Sensor
23 when it is less than 20 or greater than 90 and Alarm
33-has-not-been-triggered is not that great.  (In other words, Alarm 33
might be set even when Sensor 23 is less than 20 or greater than 90).
So while Alarm 33-is-triggered is positively correlated with Sensor 23
when it is between 20 and 90, the correlation between the negative
cases is not as strong.  And, significantly, other Sensors which do not
vary much might have similar signatures.  While they have strong
correlations of being within the range when the Alarm is triggered,
they would also be in the same range even if the Alarm is not
triggered.  So the negative cases and the background cases have to be
considered in a general analysis as well.

Alarm 12 goes off whenever the average value of the 1000 Sensor Streams
of data at any point in time is between 50 and 60 so this suggests that
an analytical technique that effectively examines the correlation
between the average of the Sensors and the Alarms would have to be used
to find this relation.

Alarm 33 is partially dependent on Alarm 23, so this shows that higher
information processing, a little like using higher symbols to deduce a
logical relationship, could be used even to detect the simplest of
relations.

If the AI module had the ability to affect the Sensor input values, it
could do a better job of finding correlations between individual
Sensors and the Alarms.  It could for example test Sensor 5 at various
values while holding the other Sensors constant and through this
methodical testing discover the relation between Sensor 5 and Alarm 4.
But without the ability to affect the Sensor input values or in cases
where it only had limited abilities to test the system by setting the
Sensor inputs this way, the complexity of distinguishing between false
and valid triggers would often be quite difficult.  We can imagine a
Bayesian method that might look for the correlations between Alarms and
simpler ranges of individual Sensors first, and then after it rules
them out, it would look for more complicated multiple ranges and more
complicated combinations of Sensor values.  But the idea here is that
a very serious complication has already appeared before we have even
left the starting gate, so to speak, and that means that more elaborate
methods have to be defined for the problem. These different analytical
methods will also require more time to test the more likely
possibilities and then to rule out the less likely possibilities and so
I contend that they constitute an example where gradual learning is
needed.  Other kinds of testing and conjecture could then be used in an
effort to see how accurate and how far ahead of an Alarm it could make
its predictions.

This example wasn't as good as I had originally thought it would be,
but the majority of people sophisticated enough to program a computer
should be able to, at the least, get where I was going with it.
Because a single overly-simplistic analytical technique will not
suffice to detect all cases, a number of different test assumptions
need to be tried.  Some of these test models will produce interesting
results that can then be followed up.  However, because the typical
case under such complex conditions is that of partial correlation,
there are many cases of false conjectures which would also produce
partial correlations.  This situation is similar to the situation in
the sciences where alternative theories both seem to explain the data
of an experiment, but where neither surpasses some minimal threshold of
confidence to be seen by the majority of experts as being convincing.
Further testing is needed.  The follow up testing however will require
some sophistication whereby the results of the preliminary tests could
be interpreted, cross-referenced, integrated and used to intelligently
generate a series of additional tests.  At that point conjectures
derived from the first tests could be used as the assumptions of the
next wave of testing.

I do not see a simplistic or an elegant method to create a viable
artificial intelligence.  Instead, I see the need for a number of
different analytical techniques that will often provide imperfect
information.  This imperfect information needs to be used carefully.
Newly acquired insights may be leveraged by being used with previous
knowledge, but as the history of AI has clearly shown, the products
derived through this kind of information leveraging have to be evaluated
in the terms of a background of relevant information in order to
increase their chances of being useful.

Computers are really good at learning.  They can instantly remember
anything as long as they have enough memory to store the information.
So the real problem for AI is not how to get the computer to learn, but
how to get it to figure things out to be able to integrate information
and to use it intelligently.  In a sense then, the conventional storing
of data is both instant and incremental, but it is not in itself an
insightful process.  In order to get the computer to integrate
information intelligently it has to be able to figure out how the data
fits together.  This process has to typically be gradual because the
number of different possibilities is so great.

But the important thing here is that there are good reasons for an AI
program to integrate information using a gradual process and this
understanding may be used to help shape more sophisticated learning
strategies.  Learning has to consist of more than simply storing
information into the computer; it also has to include the skills
required for the computer to integrate the information intelligently.
The fact that these processes have to overcome such overwhelming odds
against them can be seen as a theory that explains why learning is
often gradual.  

Jim Bromer

0
Reply Jim 7/19/2006 2:50:30 PM

"J.A. Legris" <jalegris@sympatico.ca> wrote in message 
news:1153320067.767537.300180@m79g2000cwm.googlegroups.com...
> Glen M. Sizemore wrote:
>> Michael Olea wrote:
>> > Glen M. Sizemore wrote:
>> >
>> > > I looked over the paper (no, I didn't read it), and my first 
>> > > impression
>> > > is
>> > > that this is not "must" reading for behaviorists. Or rather, it is 
>> > > far
>> > > less "must" reading than some of the tutorials on Bayesian analyses 
>> > > of
>> > > coin
>> > > tosses and paper-frog jumps.  But let's cut to the quick, Joe. Why do
>> > > you
>> > > think it is "must" reading for behaviorists? Pitch me. After all, you
>> > > can
>> > > argue that I can't be persuaded, but you know that you can get a rise
>> > > out
>> > > of me.
>> >
>> > He's probably referring to the differing predictions between RW models,
>> > various Bayesian models, and the experimental results.
>> >
>> > I have mixed reactions to papers with Gopnik's name on them - 
>> > interesting
>> > work, but a tendency towards straw-man caricature of alternatives.
>> >
>> > -- Michael
>>
>> JL: I was referring to the distinction between operant conditioning, 
>> which
>> encodes relationships between the organism's behaviour and the
>> environment, and causal maps, which encode relationships between
>> aspects of the environment, where the organism may just be an observer.
>> There seems to be connection to the distinction between learning where,
>> for example, a rat lacking a hippocampus can quickly return to a
>> previously discovered object but only if he always starts from the same
>> position, and learning exhibited by intact animals who can quickly
>> locate the object from any starting point. The latter are said to
>> employ a spatial map, and Gopnick claims that causal maps are analogous
>> functions in a different domain. I wonder if the hippocampus is
>> involved in both. Maybe causal maps can be seen as a generalization of
>> spatial maps.
>>
>>
>>
>> [snip]
>>
>>
>
> What do you think of the experiments (p.64)?


I see nothing very surprising and not much of merit. The authors simply make 
a Cybulski-type argument because they have ignored the child's history. It 
is the same sort of thing as saying to someone "pulling the plunger 
sometimes causes quarters to drop into this cup." Now, we put the person in 
the room with the device, and the person pulls the plunger. Is the first 
pull a result of operant conditioning? Of course, but the operant 
conditioning did not take place in the setting, it took place elsewhere. It 
took place when the person's listener repertoire was produced through 
operant conditioning. What sorts of histories are necessary to produce a 
child who passes the blicket tests? What sorts of histories are necessary 
for a person to see distant objects as larger than would be predicted based 
on retinal image? What histories are necessary to produce any of the 
behavioral phenomena that we observe? That isn't the business of cognitive 
"science." The outcome of the blicket experiment depends on children having 
behavior under the control of verbal stimuli; "blickets make the machine 
 go." "Which one is the blicket?" etc., and children's behavior is under 
stimulus control of verbal stimuli because of the contingencies to which 
they have been exposed. The situation is complicated, yes. What parts of 
human behavior aren't? But outside the purview of operant conditioning? 
Nonsense.




>
> --
> Joe Legris
> 


0
Reply Glen 7/19/2006 3:20:39 PM

Curt Welch wrote:

> "feedbackdroid" <feedbackdroid@yahoo.com> wrote:
> 
>> 2. 50 years worth of physiological recordings have shown there is an
>> increasing amount of abstraction computed on the visual image as one
>> ascends the visual hierarchy. This means that loads of information is
>> thrown away in order to abstract out salient "features" present in the
>> image. Look at IT, the so-called face area. Those cells respond to
>> extreme abstractions in the visual images. The raw pixel data is
>> discarded so the cells can signal "yes, it's a face". Turn on a pixel
>> here or there in the incoming image, and the cell response isn't
>> affected.

>> That's loss of information,

None of which has the slightest bearing, of course, on whether or not the
response of simple cells in V1 acts as a lossless high resolution buffer.
 
> Once again.  It's not a loss of information if the information is stored
> elsewhere.  If you transform pixel data into _multiple_ high level
> abstractions, there is no need for any of the information to be lost.  So,
> the simple fact that an operation like a+b is happening, is no proof that
> information is being discarded by the network.  You would have to prove
> that a corresponding feature like a-b was not being abstracted at the same
> time.  And we know for a fact that the visual system transforms the lower
> level data into multiple higher level abstractions at each step.

In fact so-called simple cells in V1 occur in adjacent quadrature phase
pairs. That is, speaking metaphorically, for every A+B cell there is an A-B
cell next to it. More precisely, the receptive fields of the cells in the
pair cover the same portion of the visual field, have the same spatial
frequency, scale, and orientation, but differ by 90 degrees in phase. This
was first established in 1981, and confirmed many times since then. This is
precisely the condition that has to be met for Gabor wavelets to achieve
the minimum bound of simultaneous localization in both the 2D spatial
domain and the frequency domain. Of course, nobody recorded the responses
of every single simple cell in V1 in some animal to determine they all came
in pairs. Various studies sampled small patches of V1 intensively. The
conclusion is statistical. The original paper is:

Pollen DA, Ronner SF (1981) Phase relationships between adjacent cells in
the visual cortex. Science 212:1409-1411

More recently Pollen wrote:

"The conjoined optimal localization of signals in both the two-dimensional
spatial and spatial frequency domains (Daugman, 1985) is best expressed by
sets of phase-specific simple cells in V1 (Pollen and Ronner, 1981; Foster
et al., 1983). The subzones of the receptive fields of these cells are
selectively sensitive to either increments or decrements of light (Hubel
and Wiesel, 1962) and spatial processing across such receptive fields is
largely linear (Jacobson et al., 1993). The two-dimensional joint
optimalization for preferred orientation and spatial frequency in the
frequency domain and for the x and y coordinates in the spatial domain
follows from results that the largely linear receptive field line-weighting
functions of these cells are well-described as Gaussian-attenuated
sinusoids and cosinusoids (Marcelja, 1980). The Gaussian weighting renders
the signal as the most compact to specify jointly spatial frequency and
space (Gabor, 1946). The Fourier transform of these `Gabor functions' in
the space domain yields an equally compact function in the spatial
frequency domain (Gabor, 1946; Marcelja, 1980). The products of
uncertainties within the two domains approaches a theoretical minimum
(Marcelja, 1980). Simple cells with corresponding properties, at least for
analyses of brightness distributions within frontoparallel planes, are
found within both V1 and V2 (Foster et al., 1985), but not within V3A
(Gaska et al., 1988) nor apparently in V4 (Desimone and Schein, 1987)."
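For readers who want to see what such a quadrature pair looks like
numerically, here is a minimal sketch (illustrative parameter values,
not taken from any of the papers cited above):

    import numpy as np

    def gabor_pair(size=64, sigma=8.0, freq=0.1, theta=0.0):
        """Return a cosine-phase and a sine-phase Gabor receptive field sharing
        location, scale, spatial frequency and orientation, 90 degrees apart in phase."""
        y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2].astype(float)
        xr = x * np.cos(theta) + y * np.sin(theta)            # coordinate along the grating
        envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))  # Gaussian envelope
        even = envelope * np.cos(2.0 * np.pi * freq * xr)     # the "A+B"-like cell
        odd = envelope * np.sin(2.0 * np.pi * freq * xr)      # its quadrature partner
        return even, odd

    even, odd = gabor_pair()
    # Responses of the pair to the same patch jointly preserve local amplitude and
    # phase; the sum of their squared responses gives a phase-invariant energy measure.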

-- Michael


0
Reply Michael 7/19/2006 6:36:42 PM

Jim Bromer wrote:

> In my previous message I described a computer model that produced a
> stream of data which, under a variety of different conditions could set
> off alarms.  An AI program or subprogram would have the task to try to
> make predictions when and why the alarms would be set off.  There could
> be two test modes for the AI program.  In one, it would only be able to
> make observations of the streams of input data, and in the second test
> mode it would be able to interact with the data environment to some extent
> by setting the values of some of the streams of input data in order to
> test its conjectures.  The AI module would have access to the input
> data streams in order to try to make its predictions, but it would not
> have access to the algorithms that produced the streams of data, and it
> would not have access to the algorithms that established the causal
> relations between the data streams or other undetectable events and the
> alarms.
 
> Suppose, for example, there were 1000 streams of simulated sensor
> readings and 100 alarms.  The streams of sensor readings range from a
> value of 0 to 100 at each sampling.  Each run of data involves some
> number of sampling time units.  Also suppose that some of the alarms
> were set by the conditions like the following.
> Alarm 4: Goes off whenever Sensor 5 is between the value of 30 and 40.
> Alarm 6: Goes off whenever Sensor 8 is between 2 and 6, or between 20
> and 25, or between 32 and 35, or between 43 and 47, or between 57 and
> 64.
> Alarm 12: Goes off whenever the average value of the 1000 Sensor
> Streams of data at any point in time is between 50 and 60.
> Alarm 23: Goes off if Sensor 3 and Sensor 4 go beyond the threshold
> value of 15 for at least 5 units of time, unless line 6 was below the
> threshold value of 40 for at least two sampling times in the 10 units
> of time before.
> Alarm 33: Goes off when Sensor 3 and Sensor 4 go beyond the threshold
> value of 15 for at least 5 units of time, or when Sensor 23 is
> between the value of 20 and 90, or when Sensor 45 is above the
> threshold of 80, or when Sensor 80 is below the threshold of 30.  Alarm
> 33 would therefore go off whenever Alarm 23 would go off, but it would
> also be set under other conditions as well.
> Alarm 55: An undetected event sets Alarm 55 off.
> Alarm 56: An undetected event sets Alarm 56 off if Sensor 32 is above
> the threshold value of 80.

So we have a stochastic process. In this case a discrete time series, where
the time-dependent variable is a 1100 dimensional vector. In general the
task would be to predict the future value of the vector from its past. In
this case there is a simplification since here the task is to predict the
value of a set of alarms, a 100 dimensional bit vector, from the past
values of a 1000 dimensional sensor vector. This amounts to estimating
the conditional distribution P(A | Si, Si-1, Si-2...), where A is the alarm
vector, Si is the value of the sensor vector at the time of the current
observation, and Si-n is the value of the sensor vector n time-steps in the
past. All but 2 components of A, as specified, are simple deterministic
functions of S. They are in fact linear threshold functions. The other two
components each depend on the values of a single hidden variable, which may
or may not be the same variable, call these hidden variables H1 and H2. The
problem description is incomplete since it does not specify whether or not
the values of the hidden variables are correlated with the values of the
observables. Only two components of A, in the problem as specified, depend
at all on the history of S, a dependence that extends at most 10 time-steps
into the past.

This is a simple estimation problem. The probability distribution, as
specified, is stationary, and belongs to the simplest complexity class of
probability distributions: those with finite predictive information. Within
that class it is a particularly simple instance since the distribution is a
composite of 0-1 distributions conditional on low dimensional linear
subspaces of the input space. The only questions of mild interest are how
much state (the history of Si) to retain (since the fact that we only need
the past 10 time steps is not something the estimator would "know" a
priori), and whether or not the existence of one or more hidden variables
can be inferred: 

A4 = f1(S5) - 2 thresholds
A6 = f2(S8) - 10 thresholds
A12 = f3(L1Norm(S)) - 2 thresholds
A23 = f4(S3, S4, S6, t) - 6 thresholds
A33 = f5(S3, S4, S23, S45, S80, t) - 8 thresholds
A55 = f6(H1) = P(H1) (which is unspecified)
A56 = f7(S32, H2) = P(H2 | S32 > 80) (which is unspecified)

The optimal Bayesian estimator for this problem comes from the simplest
family of such estimators - the family of finite parametric models. How
quickly (i.e. after how many observations of (S,A)) such an estimator
converges on the optimal prediction hypothesis depends both on the joint
distribution P(S,H1,H2) and the capacity of the hypothesis space of the
estimator. Once the capacity is high enough to include the correct
hypothesis, the greater the capacity, the slower the convergence. In this
case, leaving open for the moment the question of P(H1) and P(H2 | S32 >
80), the optimal hypothesis is a linear function of the vector S and its
recent past. There is, of course, a variety of ways to treat the dependence
on the past. One way is to set a fixed threshold of how far back to look. A
less arbitrary approach is to use a decay function, like an exponential,
that gives progressively less weight to events further in the past. This
amounts to a prior over the correlation horizon of the stochastic process.
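A rough sketch of that idea (hypothetical code; the decay rate, learning
rate, and logistic form are illustrative choices meant only to show the
decayed-history features, not a claim about the optimal estimator for
this problem):

    import numpy as np

    class AlarmEstimator:
        """One linear-logistic predictor per alarm over the current sensor vector
        plus an exponentially decayed trace of its past."""
        def __init__(self, n_sensors, n_alarms, decay=0.7, lr=0.01):
            self.decay, self.lr = decay, lr
            self.trace = np.zeros(n_sensors)                   # decayed history of S
            self.W = np.zeros((n_alarms, 2 * n_sensors + 1))   # weights plus bias

        def _features(self, s):
            self.trace = self.decay * self.trace + (1.0 - self.decay) * s
            return np.concatenate([s / 100.0, self.trace / 100.0, [1.0]])

        def predict(self, s):
            """Call once per time step, before the alarms are observed."""
            self._x = self._features(np.asarray(s, dtype=float))
            return 1.0 / (1.0 + np.exp(-self.W @ self._x))     # P(each alarm fires)

        def update(self, observed_alarms):
            """Call after the alarms for that step have been observed."""
            p = 1.0 / (1.0 + np.exp(-self.W @ self._x))
            err = np.asarray(observed_alarms, dtype=float) - p
            self.W += self.lr * np.outer(err, self._x)          # one SGD step

The decay rate plays the role of the prior over the correlation horizon:
a hard cutoff of ten steps is replaced by weights that fade smoothly with
age.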

That leaves the estimation of P(H1) and P(H2). There is little to say about
this, given the description of the problem, other than that this is a
distribution estimation problem.

-- Michael

0
Reply Michael 7/19/2006 11:14:01 PM

"Glen M. Sizemore" <gmsizemore2@yahoo.com> wrote in message 
news:44bd40e8$0$2513$ed362ca5@nr1.newsreader.com...
>
> "feedbackdroid" <feedbackdroid@yahoo.com> wrote in message
....
>> Congrats. I see you're starting to emulate your new best friend, GS.
>
> I don't think so. When I thought I had a chance to be best friends with 
> Michael, I sent him the following email:
>
>
>
> Dear Michael,
>
>            I like you. Do you like me? I would like to be your best 
> friend. Do you want to be best friends? Mark one:
>
>
>
>
>
>
>
> Yes________                                               NO_________
>
>
>
>
>
>
>
> He didn't return my email. A reasonable conclusion is, therefore, that he 
> came to the conclusion that you are intellectually dishonest, and stupid, 
> on his own. I, however, think you are an order of magnitude more dishonest 
> than you are stupid. But, then, I read a lot of people's posts here and, 
> thus, I tend to think of "stupid" as being sort of calibrated on Verhey. 
> You are not as stupid as Verhey.

Always love to hear from you, Brother Darth. From your mouth all shit is an 
honor... how you do it?? 


0
Reply JPl 7/20/2006 5:29:47 PM

Michael Olea wrote:

>
> This is a simple estimation problem. The probability distribution, as
> specified, is stationary, and belongs to the simplest complexity class of
> probability distributions: those with finite predictive information.

> That leaves the estimation of P(H1) and P(H2). There is little to say about
> this, given the description of the problem, other than that this is a
> distribution estimation problem.
>
> -- Michael

I will try to follow up on the terms that you used in this message, but
I was not talking about the "simple estimation problem," that you
described.

I am not sure why you have not been able (or willing) to understand
what I have been trying to say during the last year, but maybe that is
just the way it's supposed to be.  However, I sincerely appreciate your
sharing of your knowledge of probability distributions and Bayesian
methods.
Jim Bromer

0
Reply Jim 7/20/2006 5:42:48 PM

Jim Bromer wrote:

> Michael Olea wrote:

>> This is a simple estimation problem. The probability distribution, as
>> specified, is stationary, and belongs to the simplest complexity class of
>> probability distributions: those with finite predictive information.

>> That leaves the estimation of P(H1) and P(H2). There is little to say
>> about this, given the description of the problem, other than that this is
>> a distribution estimation problem.

> 
> I will try to follow up on the terms that you used in this message, but
> I was not talking about the "simple estimation problem," that you
> described.

I thought you posed two problems:

1) predict when the alarms go off by observing the data stream
2) predict when the alarms go off by observing the data stream and conducting
some experiments (e.g. setting the values of some sensors and observing the
effect).

I was addressing the former. Was that not one of two problems you posed?

-- Michael

0
Reply Michael 7/20/2006 6:12:52 PM

Hello Jim,
now I will try again to post:

I also have good reasons to think that gradual learning is necessary:
Imagine a computational system (let's call it "SIP"). SIP shall have an
input where you can type a single char or a word and an output where
SIP can produce a single char or a word. Which words SIP does or does
not say shall depend on SIP's experiences.
Experience shall mean the following: at first SIP produces output
randomly, only single chars. There shall be a global parameter that
influences whether SIP will produce that output again in the future or
not. This parameter must be set by the environment, e.g. a human being,
after SIP has said something. I think this is not too difficult to
program with a neural network, even if SIP's behaviour should depend on
the input.
It is simple behaviourism. The human can control the behaviour of SIP
through the global parameter. In other words, he controls what SIP is
learning.

But how difficult does it get if we want to teach SIP to say certain words?
Imagine a word with 5 chars. There are 26*26*26*26*26 ways to build a
word with 5 chars. SIP will never say the word we want him to say. So
how can we reward him via the global parameter? The precondition for
conditioning is that the system does the right thing at least once.
Even a word with 5 chars is so complex that it will never be said with
random actions.

This is not merely a computational problem. Every learning system without
predefined actions has this problem. The bootstrapping of the actions
has to be random. In other words: children have to play randomly.

So how can SIP learn to say words? The solution will be gradual
learning. But gradual learning digs us deeper into the complexity of
learning. SIP needs the following:
- An ideal of what his action should look like. This ideal must depend
on his perceptions
- A sense with which to perceive his own actions
- A mechanism to compare the ideal and the perception. If an action is
nearer to the ideal  than earlier actions, then the action is preferred
in the future.
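A minimal sketch of that loop (hypothetical code: the "ideal" is simply
a target string, and "perceiving his own action" is reduced to a
character-by-character comparison):

    import random
    import string

    def closeness(attempt, ideal):
        """Crude self-perception: how many positions already match the ideal."""
        return sum(a == b for a, b in zip(attempt, ideal))

    def gradual_learn(ideal="horse", tries=10000):
        best = ''.join(random.choice(string.ascii_lowercase) for _ in ideal)
        for _ in range(tries):
            attempt = list(best)
            attempt[random.randrange(len(ideal))] = random.choice(string.ascii_lowercase)
            attempt = ''.join(attempt)
            if closeness(attempt, ideal) >= closeness(best, ideal):
                best = attempt                  # keep actions nearer to the ideal
            if best == ideal:
                break
        return best

Because each try only has to be a little closer than the last, this
usually reaches the ideal in well under a few thousand attempts, instead
of the 26^5 attempts that pure random output would need on average.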

I think most AI guys will not agree with this, because it makes things
difficult. But learning is more difficult than behaviourism. If you
agree with me, let's talk about how gradual learning can be realized
with a neural network!

Lars

0
Reply Lars 7/25/2006 11:13:00 AM

Hello Jim,
here is the answer to your question you mailed to me:

What is my paradigm?
I simply have none. But there are some things I am very sure about:

- The most important property of intelligence is not the ability to do
logical operations. Most humans never reach the point of being familiar
with logical operations. It seems to be quite hard for human beings to
learn logical operations on an abstract level.
- The most important property of Intelligence is the ability to learn
all the time, not only during a special initialization phase.
- With the classic AI-Design "Input-Processing-Output" you can
initialize a neural network so that it reacts in a special way. But
such a behaviouristic system will never get intelligent. An intelligent
system must be able to act not only to react.
- Every learning system must bootstrap with random-actions - like a
little baby.
- A system with behaviouristic learning will never be able to learn
complex actions, e.g. an output of a special word in a special
situation.
- For complex actions gradual learning is necessary. (With "learning"
most people mean something behaviouristic like action and punishment.
Gradual learning means to set an ideal and many tries to reach the
ideal. Of course these are very different things!)

What is the purpose of SIP?
Sometimes when I do not see any problems anymore to program a learning
system, I continue my SIP-programm until I see there are still
problems. Of course, I had never really believed there are no problems
anymore, but thinking about SIP and programming SIP always gives me
more concrete and deeper insight into the problems.
SIP should be the simplest kind of a learning system, that is not
designed to learn a special ability. What does it need at least?
- An input-textfield (=perception)
- An output-textfield (=action),
- Neurons with certain parameters and rules to connect to each other
- one (or probably more) global parameter that influences learning;
there must be a possibility to influence this parameter by a human
being (like food)

One first aim will be to teach him to say certain words in some
situations and other words in other situations. But what is the
simplest system that can do this? - Because of the complexity of the
output (as Jim said) you need gradual learning, that means the ability
to set an ideal and so on...
So SIP (=Simplest Intelligent Program) is a helper for me to find out
what is the minimum that is needed for an intelligent (learning)
system. My thoughts are more on a philosophical than a technical level,
if the two can be divided. But maybe one day SIP will run...

0
Reply Lars 7/25/2006 12:40:42 PM

"Lars" <LarsFiedler@gmx.de> wrote:
> Hello Jim,
> here is the answer to your question you mailed to me:
>
> What is my paradigm?
> I simply have none. But there are some things I am very sure about:
>
> - The most important property of intelligence is not the ability to do
> logical operations. Most humans never reach the point of being familiar
> with logical operations. It seems to be quite hard for human beings to
> learn logical operations on an abstract level.

Yes, I see logic as a learned verbal behavior (like all of math) which has
become a useful behavior technique for many areas of our lives, but which
is not the real foundation of what we are.  I'm not saying that we can't
use math to describe our foundation (I think we will be able to), just that
our high level math behavior (and closely connected reasoning behaviors)
are not the foundation of our intelligent behavior.  Like learning to turn
a lid counter-clockwise to get it open, our logic skills are just a set of
"tricks" we have learned which are useful in our life.

> - The most important property of Intelligence is the ability to learn
> all the time, not only during a special initialization phase.

Yes.  The machine must learn and operate at the same time - any system that
requires a training process separate from the operation is just the wrong
type of system.

> - With the classic AI-Design "Input-Processing-Output" you can
> initialize a neural network so that it reacts in a special way. But
> such a behaviouristic system will never get intelligent. An intelligent
> system must be able to act not only to react.

To make it act, you can just add feedback to a temporal reaction system.
That way, all its self-generated actions can be created as reactions to its
own actions.  This is how you create a central pattern generator from a
reaction machine.  I suspect this is exactly what the human motor cortex is
- it's nothing more than a sensory cortex which is sensing the cortex
outputs (most people describe it as a device which "creates" the outputs -
but I suspect that's wrong).

> - Every learning system must bootstrap with random-actions - like a
> little baby.

I've come to the conclusion that even though this idea is correct, the
randomness must not come from some unpredictable source of random data
(like a random function in software).  Instead, what looks like random
behavior to us, should be created by a set of very deterministic reactions.
The starting or default configuration of the reaction learning machine
should be one which produces maximal randomness.  This way, the machine is
not learning how to react - it's built to do nothing but react.  Its only
job is to figure out which reactions work the best.

> - A system with behaviouristic learning will never be able to learn
> complex actions, e.g. an output of a special word in a special
> situation.

I don't agree with that.  I wrote a reply to the other message talking
about this issue but I've not posted it yet - but will soon.

> - For complex actions gradual learning is necessary. (With "learning"
> most people mean something behaviouristic like action and punishment.
> Gradual learning means to set an ideal and many tries to reach the
> ideal. Of course these are very different things!)

How, and why, does the system pick one goal (aka ideal) over another?

It seems to me you simply favor the idea of a goal directed learning
machine which is constantly trying to reduce the errors between its
current actions and its goal.  But how does such a machine pick the goals?
How does it create sub-goals from the prime goals?  How does it know when a
goal should be changed?  Of all the goals it might have, which would it be
trying to reach at any moment in time?

The one thing I also list in my ideas which you didn't, is the power to
learn real time behaviors - that is, behaviors based on timing - such as
what is required to learn to sing a song, clap your hands in time, or walk,
or catch a ball, or speak so that people can understand you (not too fast,
not too slow, with the correct spacing between words etc), or throw a rock
and hit a moving target, etc.

It might be possible to build an intelligent language based machine which
is not real-time based like this, but humans clearly are very much
time-based learning machines.  So you might be able to ignore this
requirement in your SIP.

> What is the purpose of SIP?
> Sometimes when I do not see any problems anymore to program a learning
> system, I continue my SIP program until I see there are still
> problems.

Yeah, that's how we all move forward with our ideas.

> Of course, I had never really believed there are no problems
> anymore, but thinking about SIP and programming SIP always gives me
> more concrete and deeper insight into the problems.
> SIP should be the simplest kind of a learning system, that is not
> designed to learn a special ability. What does it need at least?
> - An input-textfield (=perception)
> - An output-textfield (=action),
> - Neurons with certain parameters and rules to connect to each other
> - one (or probably more) global parameter that influences learning;
> there must be a possibility to influence this parameter by a human
> being (like food)
>
> One first aim will be to teach him to say certain words in some
> situations and other words in other situations. But what is the
> simplest system that can do this? - Because of the complexity of the
> output (as Jim said) you need gradual learning, that means the ability
> to set an ideal and so on...
> So SIP (=Simplest Intelligent Program) is a helper for me to find out
> what is the minimum that is needed for an intelligent (learning)
> system. My thoughts are more on a philosophical than a technical level,
> if the two can be divided. But maybe one day SIP will run...

Sounds like the general approach I've been looking at as well - trying to
understand what is the minimal needed to create an "intelligent" program
and constantly looking at different ways to approach or answer that
question.

I think we have a very similar list of ideas, with the only major
difference being your desire to use this goal directed approach (if I'm
understanding your use of the "ideal" concept correctly), instead of a
reinforcement learning approach.  I think the idea of goals is much like
the idea of using logic as the foundation of Intelligence.  Goals are
something humans deal with as a high level "trick" for directing our
actions, but the underlying foundation which the brain uses is less direct.
I think all the behavior that looks goal directed, was actually created by
a reinforcement learning machine.  But that the underlying true goal is
simply to maximise reward.  But, to do a good job of this, the machine must
have a system of creating secondary reinforcers (which is just a fancy way
of saying the brain creates sub-goals).  And this technique of creating
secondary reinforcers to improve the reinforcement learning process, is
what creates all our behavior that looks so "goal" directed.  I'll explain
more of this in my follow-up to your other message.

But the bottom line is that I agree, we need goal-directed actions, but
they happen as a natural result of reinforcement learning.

Your problem, if you look at it as a system which is creating an ideal, is
the set of questions I asked above.  How does it pick the ideal?  How does it
create sub-ideals?  How does it deal with multiple ideals at the same time?
How does it pick the correct ideal, to be using at the moment to guide its
actions? How does it know when it needs to modify an ideal?  Looking at the
problem as a reinforcement learning machine which forms secondary
reinforcers for all actions is one way to answer those questions.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/26/2006 2:27:19 AM

"Lars" <LarsFiedler@gmx.de> wrote:

> But how difficult does it get if we want to teach SIP to say certain words?
> Imagine a word with 5 chars. There are 26*26*26*26*26 ways to build a
> word with 5 chars. SIP will never say the word we want him to say. So
> how can we reward him via the global parameter? The precondition for
> conditioning is that the system does the right thing at least once.
> Even a word with 5 chars is so complex that it will never be said with
> random actions.

Never say never! :)  It's guaranteed to happen if the behavior is actually
random.  It's only a question of how long it might take.  If the system
speaks one word a second then the expected value of the time it would take
is less than 6 months.  6 months is a long time, but it's a lot better than
never.  The important thing to understand is that these things, even though
they are highly unlikely, will always happen, if given enough time.  To
turn it from impractically long, to reasonable long, you just have to find
ways to speed it up.
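To put rough numbers on that: 26^5 = 11,881,376 possible five-letter
strings, so at one random guess per second the expected wait for one
particular word is about 11.9 million seconds, roughly 137 days, which
is indeed a little under five months.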

For reinforcement learning to work well, the environment must be structured so
that the learning machine gets lots of hints.  The rewards must create a
trail of bread-crumbs for the machine to follow. You can't wait for 6
months for it to speak the first 5 letter word correctly before it gets its
first reward unless you want to wait 100 years for it be learn something
interesting.

So, for your problem of trying to get the machine to learn to speak a word
like HORSE, you can help it along by giving it a reward which is relative
to how close it is to the correct answer.  If it generates XUTRIDKRHGFR you
might give it a reward near 0.  But if it generates OSHXE you might give it
a larger reward for both generating a 5 character word, and for using some
of the right letters.  Now, if every word it generates, is given some level
of reward, relative to how close it is to producing the correct answer,
then the system can start to shape its selection of letters (its selection
of behaviors) based on which combinations have produced the most reward
over time.  So, not only will it be getting a constant stream of hints from
the environment with every word it generates (so it doesn't have to wait 6
months for the first hint), it will also be constantly improving the quality
of its guesses with every hint it gets.  This will cause it to converge on
the correct answer exponentially.  So instead of taking 6 months, it may
only take a day.  This is how reinforcement learning becomes workable.
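A minimal sketch of that shaping idea (hypothetical code; the target
word, the graded reward, and the bandit-style averaging are illustrative
assumptions, not a description of any particular system):

    import random
    import string

    TARGET = "horse"                      # the word the "teacher" wants to hear

    def reward(word):
        """Graded reward: the fraction of letter positions that match the target."""
        return sum(a == b for a, b in zip(word, TARGET)) / len(TARGET)

    def learn(episodes=5000, epsilon=0.1):
        # A running average reward for every letter at every position.
        values = [{c: 0.0 for c in string.ascii_lowercase} for _ in TARGET]
        counts = [{c: 0 for c in string.ascii_lowercase} for _ in TARGET]
        for _ in range(episodes):
            word = ''.join(
                random.choice(string.ascii_lowercase) if random.random() < epsilon
                else max(v, key=v.get)                 # mostly exploit, sometimes explore
                for v in values)
            r = reward(word)
            for pos, c in enumerate(word):
                counts[pos][c] += 1
                values[pos][c] += (r - values[pos][c]) / counts[pos][c]
            if ''.join(max(v, key=v.get) for v in values) == TARGET:
                break
        return ''.join(max(v, key=v.get) for v in values)

With the graded reward this typically settles on the target within a few
thousand guesses; with an all-or-nothing reward the same learner would
wait on the order of 26^5 guesses before it got its first hint.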

In humans we see the same thing happening.  A baby has no clue how to grab
its bottle and stick it in its mouth at birth. The baby has no clue what a
bottle is or that the bottle is something it might want to stick into its
mouth.  If the baby didn't have any hints from the environment to guide it,
it might take 100 years before the baby randomly happened to stick the
bottle in its mouth the first time and find out how good it was (get its
first reward from the food in its mouth).

But we have friendly humans helping by sticking the bottle in our mouth for
us.  So we get the reward of the food without having to do anything on our
own.  But, we must still learn to hold the bottle for ourselves, and to
grab it and stick it in our own mouth. Doing this by random chance might
take years.  But the environment gives us lots of "hints" so we don't have
to wait that long.  If we don't hold the bottle, it falls out of our mouth,
and this acts as a punishment.  If it happens as we straighten out our arm,
for example, that act is punished.  We are less likely to do that while
sucking a bottle in the future.  Slowly, behavior by behavior, we learn what
to do, by process of elimination.  All the different behaviors that allow us
to keep getting food from the bottle get rewarded, and anything we do
which stops the food, gets punished.  After basic "holding in mouth" skills
get developed, we advance to "pushing bottle back in mouth when it slips
out".  Then later, we advance to, "picking up bottle and sticking it in our
mouth".  All this was learned one small step at a time because the complex
environment naturally rewarded, or punished, all these little micro
behaviors.  This is because the environment is not attempting to reward
the "right behavior".  It instead is configured to reward or punish only
the result.  If food gets into the mouth, we are rewarded, no matter how
it happens, and if the food is taken out of our mouth, the behaviors are
punished - even if it happens because mom walked up and took it out of our
mouth.

We didn't have to wait 100 years for this very complex sequence of behaviors
required to reach out, grab the bottle, and bring it to our mouth to happen
randomly before we got the first reward.  The complex environment instead
was full of little hints that worked like a trail of bread crumbs to lead
us slowly, one micro-behavior at a time, to these complex behaviors.

Our whole life works like this.  We learn things step by step, very slowly,
through experience.  All our old skills are used to guide our behaviors so
that each new skill only has to be one more small step - no need to wait
6 months for a complex sequence to happen before we can learn it.

> This is no computational problem. Every learning System without
> predefined actions has this problem. The bootstrapping of the actions
> has to be randomly. In other words: Children have to play randomly.
>
> So how can SIP learn to say words? The solution will be gradual
> learning. But gradual learning digs us deeper in the complexity of
> learning. SIP needs the following:
> - An ideal of how his action should look like. This ideal must depend
> on his perceptions
> - A sense to percept his own actions
> - A mechanism to compare the ideal and the perception. If an action is
> nearer to the ideal  than earlier actions, then the action is preferred
> in the future.

But, how does the system pick an ideal (aka goal)?  Why for example would a
baby form the ideal of "sticking bottle back in mouth when it falls out"?
How would it form this ideal when it's something it has never even seen
happen (aka, it's never sensed itself stick its own bottle back in its own
mouth)?  How would it know that this might be a good thing to try and do?
Why wouldn't it instead form the idea of throwing the bottle out the window
and work hard to learn to do that?  It has neither thrown a bottle out the
window, nor stuck a bottle in its own mouth.

And, if you play off the concept of a baby learning to mimic others (we
clearly learn a lot of things that way), how do you explain creativity
where we do something that no one has ever done before?  How can mimicking
be the basis of our learning when we are able to learn things no one has
ever done before?

> I think most AI-Guys will not agree with this, because it makes things
> difficult. But learning is more difficult than behaviourism. If you
> agree with me, lets talk about how gradual learning can be realized
> with a neuronal network!
>
> Lars

In the other message, I talked about how strong reinforcement learning
systems use secondary reinforcers in the place of your "ideals".  Let me
explain that a little better so you can understand what I'm talking about
(if you don't already).

The role of secondary reinforcers is what most people fail to understand,
and it is what leads them to believe "simple behaviorism" fails to have
the needed explanatory power to explain human learning.  It's what makes
them lean towards the idea of goal directed behavior.  However, goal
directed behavior is just the layman's way of trying to talk about
secondary reinforcers.

The problem of trying to use goal directed behavior as the foundation of
complex human behavior is that the approach solves nothing.  It simply
pushes the problem down a layer.  We start with the question of how do we
develop complex behaviors.  If the answer is that we set a goal for
ourselves, and then develop the behavior by slowly (gradually) reducing
error between our current behavior and the goal, then that at first might
seem like we have made some progress on answering the question of where the
behavior comes from.  The idea of improving behavior by reducing error
seems very workable.

But, now we have created a new problem to answer.  Where does the goal come
from?  And if you think about this, you realize that this new problem is no
different than the first one.  It's the exact same problem we started with
- the problem of how the machine selects which of billions of behaviors is
the right behavior to perform now.

Reinforcement learning is a system that does actually answer the question.
It answers it by first defining a goal not in absolute terms of behavior,
but in terms of results.  A baby is rewarded for getting food in its mouth,
but the behavior needed to make that happen is unknown to the baby, or to
the reward system.  This is key - you don't reward the correct behaviors,
you reward the correct results.  So the goal is not the behavior, but the
results.  Evolution motivated babies to get food in their mouths - but it
doesn't care how they do it - and as a result, we learn the complex
sequence of behavior of grabbing things and sticking them in our mouth with
our hands (or any other way we learn that works to get things in our
mouth).

Secondary reinforcement then turns out to be the system that creates
sub-goals - but once again, they are not behavior goals, but result goals.

When we are rewarded for achieving a primary result (getting good tasting
stuff in our mouth), the brain will remember everything that it is sensing
at that time.  It will create a correlation between what it is sensing and
the reward that it received at the time.

If a baby sees its bottle every time it gets a food reward, then the
sensation of that bottle alone will become correlated with the reward.
The bottle itself will become a secondary reinforcer.  It will act like a
reward, just like the food did - though not as strong.

So now that the brain has learned that the vision of the bottle is a
sensation that acts as a secondary reinforcer, the brain has a new result
that it can be rewarded for.  Any time it does something which leads to
the sensation of "bottle", the brain is rewarded for that result - it's
rewarded for making the "bottle" sensation happen.

If the bottle is to the right, and the baby turns its head to the left so
it can no longer see the bottle, the baby is punished for making the bottle
sensation go away.  If the head is turned back to the right, the behavior
is rewarded for making the bottle sensation come back.  This teaches the
reinforcement learning system to track the "good" things with its eyes.

If the bottle is hidden under a blanket, and the baby just happens to move
the blanket, and expose the bottle, that behavior gets rewarded.  The baby
learns to move the cover to find "good things".

In the end, the system which defines secondary reinforcers is assigning a
"goodness" value to everything.  All these values are based on the
correlation between these sensations, and the primary rewards which are
hard-coded into the reward generation system.
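
One crude way to picture that value-assignment machinery in code (the
exponential averaging, the numbers, and the names here are just my
illustration, not a claim about how the real bookkeeping works):

    from collections import defaultdict

    class SecondaryReinforcers:
        """Learns a 'goodness' value for every sensation it has seen."""

        def __init__(self, learning_rate=0.1):
            self.value = defaultdict(float)   # sensation -> estimated goodness
            self.lr = learning_rate

        def observe(self, sensations, primary_reward):
            # Whatever is being sensed when a primary reward arrives gets
            # correlated with that reward; sensations present when nothing
            # good happens drift back toward zero.
            for s in sensations:
                self.value[s] += self.lr * (primary_reward - self.value[s])

        def effective_reward(self, sensations, primary_reward=0.0):
            # Primary reward plus the learned value of what is sensed now.
            return primary_reward + sum(self.value[s] for s in sensations)

    sr = SecondaryReinforcers()
    for _ in range(50):
        sr.observe({"bottle", "mother"}, primary_reward=1.0)   # food arrives
        sr.observe({"blanket"}, primary_reward=0.0)            # nothing happens

    # Merely seeing the bottle now acts as a (weaker) reward of its own.
    print(sr.effective_reward({"bottle"}))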

So, when a baby turns its head to see a bottle, it might be getting an
effective reward for finding the bottle, but at the same time, it might be
punished for turning its head away from something else it likes.  So the
environment ends up being a very complex set of values which are constantly
competing with each other to shape our behaviors.

This is also how reinforcement learning, which someone may think of as not
"constant" learning, becomes constant non-stop learning.  Everything we
sense, has associated values as assigned by the system for defining the
secondary reinforcement values.  As we interact with the environment, all
our behaviors are constantly being rewarded and punished.  A baby doesn't
have to wait until it next gets a real reward from getting food in its
mouth; all the secondary reinforcers that have been trained act to teach it
how to do things like uncover a bottle hidden under the blanket - or take
its thumb out of its mouth, so that it can stick the more valuable bottle
in its mouth.

This large and complex system of secondary reinforcers, acts as our
secondary goals in life - but the goals are not behaviors, they are all
results. Our goal is to discover which behaviors, will best maximise all
our rewards, including the rewards from our secondary reinforcers.

So, to your first point about how we learn to perform a task which is too
complex to expect it to happen randomly in a reasonable amount of time: we
learn these complex tasks by taking baby steps, and being rewarded for each
baby step that gets us closer to the complex task.  The rewards that are
most useful for this system are the ones that allow us to get partial
rewards for getting "closer" to a good solution.  If we are not rewarded
until we get 100% on the test, learning will be very slow.  But if we get
partial rewards for each answer we got right, learning will be much faster.

And to help with the "partial rewards" problem, if we use a system of
secondary reinforcers which act as reward estimators, then everything we do
will be partially rewarded based on how that behavior changed the current
reward estimation.  It will naturally guide us to more real rewards,
because we are constantly following the hints created by the reward
estimation system.
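
That "how the behavior changed the current reward estimation" idea can be
written in one line (my sketch, borrowing the style of temporal-difference
methods; the numbers are invented):

    def hinted_reward(primary_reward, value_before, value_after):
        # The learner is paid the primary reward plus the change in
        # estimated future reward caused by its own action, so moving
        # toward states the estimator likes is itself slightly rewarding.
        return primary_reward + value_after - value_before

    # e.g. uncovering the bottle: no food yet, but the estimate jumps.
    print(hinted_reward(0.0, value_before=0.1, value_after=0.8))   # 0.7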

For your system where you want it to learn language, you need to pick a
more indirect result to reward it for.  You don't want to have to use a
human to directly reward it (by pushing a button) every time it correctly
uses language.  You want to give it a more indirect goal that creates the
need for it to learn to use language.  What I've considered doing, but
never actually tried to make work, is to motivate the system to keep a
person talking to it.  So, you do something like give it rewards as long
as it can keep someone talking to it, and punish it every time the person
gives up and leaves.

I have thought for example of setting it up on the Internet and letting
anyone that wants to chat with it do so.  But you would have to do something
to connect to it, and it would disconnect if you left.  The goal of the
machine would be to keep people talking to it as long as possible.  The goal
would have nothing directly to do with learning valid language.  If the
machine just spits out random shit like "XDFGRWEKHJVJBVE kJGHEFgfuER IEY
feigherf", people will give up and leave very quickly.  But as that random
shit starts to look a little bit more like something interesting, the people
might stay around just a little bit longer.  This I think could, in time,
train the
machine to spit out stuff that people find more interesting, and more
appropriate relative to whatever they were typing to the machine.  The
trick would be finding enough people to spend a few minutes talking to a
machine too stupid to produce even a single valid word. :)  But hey, that
shouldn't be hard on the Internet, just set up a free-sex web site and let
them chat with "Deep Thought" for free. :)
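
The critic for that setup could be dirt simple - something like this, with
placeholder numbers I just made up:

    REWARD_PER_SECOND = 0.01     # small drip while the visitor stays connected
    DISCONNECT_PENALTY = -1.0    # one-time hit when they give up and leave

    def session_rewards(seconds_connected):
        # The reward stream for one chat session: a trickle for every
        # second of held attention, then the penalty at disconnect.
        return [REWARD_PER_SECOND] * seconds_connected + [DISCONNECT_PENALTY]

    # A bot that held someone's attention for 5 minutes nets a positive
    # total; one that drives them away in 10 seconds nets a negative one.
    print(sum(session_rewards(300)))   # 2.0
    print(sum(session_rewards(10)))    # -0.9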

You could measure the "intelligence" of different AI algorithms in this
type of system by seeing which algorithms managed to get the most rewards,
or by measuring their rate of improvement.

That's how I see AI working....

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 7/26/2006 4:40:43 AM

"Lars" <LarsFiedler@gmx.de> wrote in message 
news:1153825980.703031.254040@m79g2000cwm.googlegroups.com...
> Hello Jim,
> now I will try again to post:
>
> I have also a good reasons that gradual learning is neccessary:
> Imagine a computational System (lets call it "SIP"). SIP shall have an
> input where you can type a single char or a word and an output where
> SIP can produce a single char or a word. What words SIP does or does
> not say shall depend on SIPs experiences.
> Experience shall mean the following: At first SIP produces output
> randomly, only single chars. There shall be a global parameter that
> influences if SIP will do the output again in future or if SIP will not
> do it again. This parameter must be set by the environment, e.g. a
> human being, after SIP has said something. I think this is not too
> difficult to program with a neuronal network. Even if SIPs behaviour
> should depend on the input.
> It is simple behaviourism. The human can control the behaviour of SIP
> over the global parameter. In other words he controls what SIP is
> learning.
>
> But how difficult gets it if we want to teach SIP to say certain words?
> Imagine a word with 5 chars. There are 26*26*26*26*26 ways to build a
> word with 5 chars. SIP will never say the word we want him to say. So
> how can we award him via the global parameter? The precondition for
> conditioning is that the System does at least once the right thing.
> Even a word with 5 chars is so complex that it will never be said with
> random actions.

Yeah, but what about if the net randomly generates output with varying 
character lengths? And what if "reinforcing" a particular instance 
strengthens both the whole output and also subsections? And what if the 
human reinforces anything that is pronounceable, even if it is not a real 
word? Like "levkul" or "gretnef"? And what if portions of the output can 
reappear alone or in combination with other fragments like "lev" and "gre" 
etc.? You have set up a straw man.



>
> This is no computational problem. Every learning System without
> predefined actions has this problem. The bootstrapping of the actions
> has to be randomly. In other words: Children have to play randomly.
>
> So how can SIP learn to say words? The solution will be gradual
> learning. But gradual learning digs us deeper in the complexity of
> learning. SIP needs the following:
> - An ideal of how his action should look like. This ideal must depend
> on his perceptions
> - A sense to percept his own actions
> - A mechanism to compare the ideal and the perception. If an action is
> nearer to the ideal  than earlier actions, then the action is preferred
> in the future.
>
> I think most AI-Guys will not agree with this, because it makes things
> difficult. But learning is more difficult than behaviourism. If you
> agree with me, lets talk about how gradual learning can be realized
> with a neuronal network!
>
> Lars
> 


0
Reply Glen 7/26/2006 3:14:14 PM

Hello Curt,

you are absolutely right - using a new term ("goal") has no benefit. It
is just another description of a problem. And you ask the right
questions, that I also ask myself:

> It seems to me you simply favor the idea of a goal directed learning
> machine which is constantly trying to reduce the errors between it's
> current actions and it's goal.  But how does such a machine pick the goals?
> How does it create sub-goals from the prime goals?  How does it know when a
> goal should be changed?  Of all the goals it might have, which would it be
> trying to reach at any moment in time?

So I will try to answer some of these questions. I am not sure about my
answers. And some answers will lead to further questions. But I think
it will lead us the right way. Of course this will be boring to you,
because you think goals can be realized as "reinforcement learning". So
I will explain afterwards what I am missing in your examples of
reinforcement learning.

I prefer the term "ideal" because it implies that there is a process of
getting nearer to the ideal. Maybe an ideal is a special kind of goal.

1.) How could an ideal be realized?
-----------------------------------------------------------
There are two physiological facts that inspired me:
- When a human being waits to do a simple action, e.g. pressing a
button when a certain signal appears, there are neurons in the
prefrontal cortex that fire steadily. It seems the human brain has
tension. Maybe this is a kind of goal or willing or whatever you call
it.
- There are steadily firing neurons in the brainstem that seem to
represent the state of the body. If they do not fire anymore (because
of damage) the human being loses consciousness or is not awake
anymore. It seems the brainstem is like a constant fire that keeps our
brain in action. And maybe it leads us to actions that keep our body
alive and in homeostasis. (Antonio R. Damasio described this)

This leads me to the following ideas:
- An ideal must be kept up for a while. So maybe an ideal can be
realized as constantly firing neurons.
- An ideal must depend on the needs of the body. So maybe the needs of
the body are set by the constantly firing neurons that represent the
body. If the state of the body changes, the ideal changes.
- A constantly firing body is a very different approach than the
input-process-output model. A system with a constantly firing body will
act - not only react.
- I think human brains are not only designed to keep the body alive,
which means to get food and so on. I think there must be a design that
gives us pleasure that has only an intellectual reason, not a bodily
reason. Such a design makes us happy if we predict something. With
"predict" I mean that the human being has an expectation (realized
similarly to an ideal) that comes true. E.g. a child presses a
light switch and is happy that the light goes on - which is what he
expected. But let us keep things as simple as possible and let us see
how far we come with "only" food.
- "Subgoals": I do not know yet. Maybe somehow ideals can be
agglomerated into one ideal and the one ideal can be divided into its
elements, as some chars can be agglomerated into a word and the word
can be divided into chars. (OK, this idea is confusing and not on the
design layer.)
- Economy: There is a general problem in neural networks. How can we
achieve that not all neurons fire at once, and that at least one neuron
fires? I call this the "problem of economy". Maybe the solution is
something like this:
There is a special area that usually does not fire immediately into
other areas. And there is a global parameter that rises over the course
of a second. This parameter supports the neurons until one
representation is strong enough to fire into another area and e.g.
cause an action. This mechanism could be something human, as we
sometimes say: "Just a second, it comes, I will remember it!". I have
not heard of such a parameter in human brains. But this problem cannot
be solved functionally. Functionally would mean that a set of neurons
must inhibit all other neurons. This design would need too many
connections.
- There is another problem (that you did not mention): I call it the
"problem of sharpness". I think we agree that in the human brain a set
of neurons represents a certain perception. The neurons in the brain
always have a current state that was set by previous perceptions.
And maybe at some points different kinds of perceptions (seeing,
hearing) collide. But we always have one thought - which is sharp but
not exactly sharp. - I have no general solution to this problem yet.

Why all this difficult stuff, when there is the easier solution of
reinforcement learning?

2.) Why is an ideal necessary?
Curt, if I misunderstood you, please tell me where to find a
description of your system at the design layer. I have not read all the
"DOHs" in this thread :-). As I understood it you think about a system
that has the input-processing-output-model - maybe with drawback loops
but no constantly firing neurons as I described above.

I ask you similar questions that you have already answered at the
psychological layer but not at the design layer:
- How does such a machine pick the estimation rules? - One rule could
have the aim to reproduce the input as an output like a parrot. But
there must be other estimation rules for an intelligent system.
- How does it know when the estimation rule should be changed?
- Of all the estimation rules it might have, which would it be trying
to reach at any moment in time?
- How does it create sub-estimation-rules from the prime estimation
rules?

Or to stay with the HORSE example:
Let us assume the system says "HOXEL" and is rewarded. There must be 2
rules that lead the system nearer to "HORSE":
1. Rule: The system must say words that are similar to "HOXEL". So a
rule must define what "similar" means. Maybe this could be achieved by
a neural network with somewhat unsharp actions.
2. Rule: The system must have the estimation rule "parrot" at the
moment it says "HOXEL".  This is an artificial rule that has no
counterpart in the human brain. It is built on top of the software of
the neural network.

I do not think artificial rules will bring us any further, because it
restricts the system to learn something special.

0
Reply Lars 7/27/2006 3:18:12 PM

Lars wrote:
[..]
> I do not think artificial rules will bring us any further, because it
> restricts the system to learn something special.


However, natural systems (animals, including us) in fact have "special 
rules". They are the inbuilt behaviours. You seem to believe that 
"learning" means acquiring behaviours the system has never exhibited 
before. IOW, you appear to believe in some sort of tabula rasa.

What you seem to be forgetting is that all learning starts with 
spontaneous behaviours. These are not "random". They are produced by the 
animal's physiology - its neurology, its skeletal-muscular system, its 
endocrine system, etc. You can't teach a pig to fly because a pig lacks 
the physiology needed for flying. You can teach a crow to fly to some 
indicated place because it can fly, period. Neither animal can be taught 
to sing, because their vocal apparatuses can't produce the pure tones we 
identify with that skill. Both can be taught to vocalise in response to 
cues and signals produced by their trainer or appearing in the 
environment - ie, both could be taught to be an excellent intruder 
warning system. All this is obvious, but many critics of reinforcement 
learning forget the obvious.

You use learning HORSE as an example, and posit HOXEL as an intermediate 
stage, with HORSE as an "ideal" to be achieved.

Firstly, the system must be capable of "producing language", else it 
cannot learn the semantic difference between HORSE and HOTEL, both of 
which are (relatively) easy consequences of HOXEL. But "producing 
language" is not at all well understood, despite many decades of 
intensive research (recent confident claims of explication by the likes 
of Pinker notwithstanding). Thus, building a system that can learn HORSE 
(and HOTEL, etc) requires knowledge and concepts we do not IMO as yet have.

Note that the semantic difference between HORSE and HOTEL actually 
appears as differences in context. That is, you have "understood" both 
HORSE and HOTEL when you include these words in your language responses 
to different combinations of cues in your environment. These 
combinations are so complex, and vary in so many subtle ways, that it is 
not at all easy to describe them. IMO, a complete description is 
impossible. This difficulty sheds light on why people respond 
differently to the same texts, for example. Each person discriminates 
different cues in the complex of cues that make up a text, each person 
brings additional cues, such as their past experience, to the task, and 
so on.

OTOH, building systems that can learn to differentiate between HORSE and 
HOTEL as symbol strings in text is much simpler - various statistical 
methods point the way, as Olea's threads illustrate. For that matter, a 
good spellchecker illustrates how statistics can be used for some 
language related tasks. But that's a long way from "learning", gradual 
or otherwise.

Secondly, observation of children learning language shows that it is 
their built-in language behaviours that are shaped to produce the 
specific language of their community. Babies are born with many 
abilities required for this. Two are: A) "babbling", ie, the ability to 
produce the sounds of language.  B) preferential attention to language 
sounds when presented with sequences that include random noises. Deaf 
children are capable of learning a visual language. Blind people can 
learn to read tactile symbols. So there are clearly many other abilities 
(==behaviours) involved in learning a language. One of these must be the 
ability to string behaviours together in sequences, and delay responses 
to observed (heard, read, seen, felt) sequences of cues ("symbols") 
until the sequence is complete -- which implies the ability to 
discriminate between complete and incomplete sequences. (It's at this 
stage, BTW, that the "marking time" neuronal firings in the cortex etc, 
as noted by you, are certainly involved.) Any language learning system 
of the kind you appear to have in mind will have to be capable of at 
least these behaviours.

Your notion of "gradual learning", insofar as it makes sense, seems to 
me a poorly understood version of reinforcement learning. It neither 
refutes nor replaces the notion of reinforcement.

HTH
0
Reply Wolf 7/28/2006 1:31:27 PM

Wolf Kirchmeir wrote:
> Lars wrote:
> [..]
> > I do not think artificial rules will bring us any further, because it
> > restricts the system to learn something special.
>
>
> However, natural systems (animals, including us) in fact have "special
> rules". They are the inbuilt behaviours. You seem to believe that
> "learning" means acquiring behaviours the system has never exhibited
> before. IOW, you appear to believe in some sort of tabula rasa.

I think the tabula rasa approach is exactly what Curt imagines could
take place in an RL machine with the right kind of "power" - that he
could evolve these inbuilt primary behaviors just as evolution did.

--
JC

0
Reply JGCASEY 7/28/2006 6:22:24 PM

I wonder a bit why some people see "gradual learning" and
"reinforcement learning" as a contradiction. Of course gradual learning
is based on a kind of reinforcement learning. Learning must always be
based on a biological aim, e.g. keeping homeostasis. So learning
means to encourage actions that bring benefit to the body. So learning
means to build and change the connections between the following
(depending on the benefit the action causes):
1. situation of the creature (its internal states, perceptions)
2. action

But there is no inner logic connecting the situation and the best
action. The connection of situation, action, and benefit can only be
built when they are temporally contingent.
[Even every cause is built on temporal contingency. "If we let go of a
stone it always falls down" was a true sentence in the sense that
everyone all over the world could see that letting-go and falling are
always temporally contingent. So people called this a cause. When they
realized there is no "up" or "down" in space, they developed the law of
gravitation. This was the new cause of why the stone falls down. So
both causes were right in the sense that both called a temporally
contingent phenomenon a "cause". So we try to find out the inner logic
by watching the temporally contingent phenomena. We never see the inner
logic. We can always ask "why, why, why?" like a little child. And this
is not naive. Physicists are only satisfied when they have only a few
physical laws that are symmetric.]

The basis of learning is always that situation, action, and benefit
happen in temporal contingency. What I tried to point out is that this
is not enough. There must be a design that keeps up an ideal for some
time. Of course the ideal is also based on temporally contingent
learning. It adds the ability to practice one certain action for a
while. This is what I call gradual learning.

0
Reply Lars 7/30/2006 9:50:47 AM

"Lars" <LarsFiedler@gmx.de> wrote:
> Hello Curt,

I'm about 50 days late replying to this (I've got a lot of messages I've
fallen behind in replying to....)

> you are absolutely right - using a new term ("goal") has no benefit. It
> is just another description of a problem. And you ask the right
> questions, that I also ask myself:
>
> > It seems to me you simply favor the idea of a goal directed learning
> > machine which is constantly trying to reduce the errors between it's
> > current actions and it's goal.  But how does such a machine pick the
> > goals? How does it create sub-goals from the prime goals?  How does it
> > know when a goal should be changed?  Of all the goals it might have,
> > which would it be trying to reach at any moment in time?
>
> So I will try to answer some of these questions. I am not sure about my
> answers. And some answers will lead to further questions. But I think
> it will lead us the right way. Of course this will be boring to you,
> because you think goals can be realized as "reinforcement learning". So
> I will explain afterwards what I am missing in your examples of
> reinforcement learning.
>
> I prefer the term "ideal" because it implies that there is a process of
> getting nearer to the ideal. Maybe an ideal is a special kind of goal.
>
> 1.) How could an ideal be realized?
> -----------------------------------------------------------
> There are two physiological facts that inspired me:
> - When a human being waits to do a simple action, e.g. pressing a
> button when a certain signal appears, there are neurons in the
> prefrontal cortex that fire steadily. It seems the human brain has
> tension. Maybe this is a kind of goal or willing or whatever you call
> it.
> - There are steadily firing neurons in the brainstem, that seem to
> represent the state of the body. If they do not fire anymore (because
> of a damage) the human being looses consciousness or is not awake
> anymore. It seems the brainstem is like a constant fire that keeps our
> brain in action. And maybe it leads us to actions that keep our body
> alive and in homeostasis. (Antonio R. Damasio described this)

Or maybe, it's simply the "power source" that drives our actions - acting
in a way that's not much more interesting than the power supplies for our
computers.

> This leads me to the following ideas:
> - An ideal must be kept up for a while. So maybe an ideal can be
> realized as constantly firing neurons.

Yes, I think that's reasonable and also highly likely to explain some of
our goals.

> - An ideal must depend on the needs of the body. So maybe the needs of
> the body is set by the constantly firing neurons that represent the
> body. If the state of the body changes the ideal changes.
> - A constantly firing body is a very different approach than the
> input-process-output-Model.

No, not really.  Many sensory neurons are constantly firing.  This means
there's a constant flow of data into the system in the input-process-output
model.  Most inputs used in these models act as though they are constantly
firing.

> A system with a constantly firing body will
> act - not only react.

My designs have always included an output system that can act on its own.
It's clear we are able to do this.  We react to our environment by changing
our actions.  There are many ways to implement this.  However, the way I
think has the most value is to simply create a system where its actions
are fed back as additional sensory inputs.  In other words, the system is
able to sense, and react to, its own actions.

For example, for a human to produce a walking motion, it must be able to
generate this repetitive pattern of leg and arm motions.  It needs some
sort of central pattern generator.

If the system was only reacting to its external environment, think about how
hard it would be to learn to produce a pattern like this.  We would have to
learn to take the first step, based on our current environment.  But by
doing so, we have moved, and the environment has changed.  So, now we have
to learn to take the second step, in this new environment.  Once we learn
that, we can't "reuse" what we learned about the first step, because now we
are two steps away from where we started - yet another very different
environment.  We would in effect, have to learn each step, in each new
location.  We couldn't walk down a street we had never seen before, because
we hadn't learned that taking a step was a good thing to do in that
environment.

To produce a generic pattern of outputs, like that which is required for
walking, we must learn to react to our own actions.  If we can sense our
own actions, then we can learn to react to our own actions.  If we sense
that we are stepping forward with our right leg, then when we sense it has
reached a limit, we can react to that by stepping forward with our left
leg.  When we sense that has reached its limit, we can react to that by
moving our right leg forward.  A small set of reactions to our own actions
can then create cyclic behavior patterns (like walking) that are
independent of our external environment.  It's an output pattern that can
be triggered in any environment - including walking down a street we have
never seen before.
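
A toy version of that "react to your own actions" loop, with everything
hard-wired just to show its shape (in a real system these reaction rules
would be learned, not written by hand):

    def sense_own_action(last_action):
        # Proprioception stand-in: the system senses what it just did.
        return last_action

    def react(sensed):
        # Reaction rules over the system's OWN actions: sensing that one
        # leg just stepped triggers the other leg.
        return "step_left" if sensed == "step_right" else "step_right"

    action = "step_right"        # any spontaneous first action will do
    for _ in range(6):
        action = react(sense_own_action(action))
        print(action)            # left, right, left, right, ...

The external world never appears in that loop at all, which is the point -
the cyclic pattern is triggered purely by sensing our own actions, so it
works in any environment.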

Our reactions to external stimuli are then used in parallel with all our
internal reactions to override the behaviors when needed.  Reactions to our
sense of balance are used to fine-tune the walking patterns.  Reactions to
the objects around us make us stop walking, or take the actions needed to
turn to the right or left, or speed up, or slow down.

This is how you make a reaction machine act on its own.  It does so by
reacting to its own actions.

I also suspect this is how the brain is structured.  If the neocortex is
the same basic structure for the whole brain, what makes the motor cortex
different from the sensory cortex?  Why is it that the motor cortex seems
to be performing "motor functions"?  I think it's because both halves are
in fact sensory reaction systems.  One half is reacting to external sensory
signals, and the other half (the motor cortex) is wired to react to the
brain's own actions.

> - I think human brains are not only designed to keep the body alive,
> which means to get food and so on. I think there must be a design that
> gives us pleasure that has only an intellectual reason not a bodily
> reason.

Why?  What is the evolutionary pressure that exists to justify the creation
of "happiness" in the human brain?  And more to the point, what is
happiness?

I can answer these questions in the framework of a reinforcement learning
machine easily.  I don't know how to answer them in the framework of a goal
directed machine.

> Such a design makes us happy if we predict something.

The value of prediction is obvious for survival.  But what does "happy"
mean? You seem to have defined it as a correct prediction with the above
comment.  But where does that get us?  I predict that if I reach over and
move my computer mouse, the cursor on the screen will move.  I predict that
if I press these keys on the keyboard, that letters will show up on the
screen.  We make millions of such little predictions for everything we do.
Yet, these things don't seem to make me happy.  So it doesn't seem valid to
me to try and link the concept of happiness with a correct prediction.

> With
> "predict" I mean that the human being has an expectation (similar
> realized as an ideal) that comes true. E.g. a child presses a
> light switch and is happy that the light goes on - what he expected. But
> let us keep things as simple as possible und let us see how far we come
> with "only" food.
> - "Subgoals": I do not know yet. Maybe somehow ideals can be
> agglomerated to one ideal and the one ideal can be divided into his
> elements as some chars can be agglomerated to a word and the word can
> be divided into chars. (Ok, this idea is confusing and not on the
> design layer.)

Or, connected with that, what are the prime goals and how are they
implemented?

As an educated adult, I can talk about how I'm hungry and my goal is to
find food to eat for dinner tonight.  And I can analyze my thoughts that
might end up driving my actions that lead to me getting food (this is
making me hungry :)).

But, what about a newborn baby?  They seem to have food goals, but yet
clearly, they have not yet learned to talk, and to think about what they
are doing like an adult can.  They don't have the skills to put together a
plan for getting food.  Their skills are limited to what was hard wired
into them at birth - which for example includes the ability to swallow to
get food from the mouth to the stomach - or to cry as a way of getting
mother to provide some food.  But a baby quickly learns new skills it was
not born with, like the ability to grab a tit and put it in its mouth.

How is this skill learned in the context of everything being goal directed?
How is the goal of getting-tit-in-mouth created?  There is no indication we
were born with that as a hard wired goal.  Babies don't seem to know what a
tit is at birth.  They know how to suck and swallow, but they don't seem to
know what a tit is.  Only after exposure to a tit (or a bottle) does the
baby seem to learn that these things are "good", and only after exposure
does the baby form these tit-in-mouth goals.

These are easy to explain in terms of reinforcement, but I don't know how
to explain them in a strict goal directed view.  In terms of reinforcement,
the value of the tit is learned by reinforcement.  Good stuff happens when
sucking on a tit, so tit sucking (as opposed to sucking in general - such
as toe sucking) is reinforced as a good behavior.  And the tit itself
becomes a predictor (a secondary reinforcer) of good things to come.
Acting as a secondary reinforcer, that sensory signal helps to shape our
grab-tit-and-suck behaviors.

> - Economy: There is a general problem in neural networks. How can we
> achieve that not all neurons fire at once and that at least one neuron
> fires. I call this the "problem of economy". Maybe the solution is
> something like this:
> There is an special area that usually does not fire emediatly to other
> areas. And there is a global parameter that rises during a second. This
> parameter supports the neurons until one representation is strong
> enough to fire into another area and e.g. cause an action. This
> mechanism could be something human as we say sometimes: "Just a second,
> it comes, I will remember it!". I have not heard of such a parameter in
> human brains. But this problem cannot be solved functional. Functional
> would mean that a set of neurons must inhibit all other neurons. This
> design would need to many connections.

There is a solution to this which I've used in many of my earlier network
designs.  It works by adding global activity regulation to the system.  The
system is a learning system already, meaning the weights are constantly
being adjusted to regulate which nodes fire, and when.  But, if the neurons
don't have any understanding of when other neurons are firing, how can they
learn to take turns and not all speak at once?  The answer is that the
learning rules must have some knowledge of global activity and must work to
prevent everyone from talking at the same time.  One simple way to
implement this is to track global network activity (how many neurons fired
recently), and bias the learning system to push this activity level
towards some central norm.  When the network becomes too active, all the
learning is biased in the direction to reduce the odds of nodes firing, and
when the network becomes too quiet, all learning is biased in the direction
to make the network more active.
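
A bare-bones sketch of that global activity bias (the target rate, gain,
and threshold update rule are made-up numbers, just to show the shape of
the idea):

    TARGET_RATE = 0.10    # want roughly 10% of nodes firing at any time
    BIAS_GAIN   = 0.01    # how hard the global bias pushes the thresholds

    class Node:
        def __init__(self, threshold=0.5):
            self.threshold = threshold

        def fires(self, activation):
            return activation > self.threshold

    def update_activity_bias(nodes, recent_fraction_fired):
        # Too busy: raise every threshold a little (firing gets less
        # likely).  Too quiet: lower them.  This bias sits on top of
        # whatever other learning the nodes are doing.
        error = recent_fraction_fired - TARGET_RATE
        for node in nodes:
            node.threshold += BIAS_GAIN * error

    nodes = [Node() for _ in range(100)]
    update_activity_bias(nodes, recent_fraction_fired=0.30)   # too active
    print(nodes[0].threshold)   # nudged up from 0.5 toward quieter behavior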

This type of global activity bias, when added on top of any other learning,
solves your economy problem.  It would be easy to believe the brain used a
similar system that made learning a partial function of total neuron
activity.  Since it takes energy to make a neuron fire, and that energy
flows to the neurons through a shared blood supply, one way to bias the
learning rules would be for them to simply sense the chemical levels used
to power that activity.

My latest networks solved it in a much simpler way.  I switched to a pulse
sorting paradigm instead of a node firing paradigm.  By doing this, network
activity is held constant by design.

> - There is another problem (that you did not mention): I call it the
> "problem of sharpness". I think we agree that in the human brain a set
> of neurons represents a certain perception. The neurons in the brain
> always have an actual state, that were set from previous perceptions.

Yes, such as the leaky integrate and fire model always has some internal
activation level which is constantly changing and which changes as a result
of other neuron activations.

> And maybe at some points different kinds of perceptions (seeing,
> hearing) collide.

Collide?  They fuse to form new perceptions but I don't understand what you
mean by collide.

> But we always have one thought - which is sharp but
> not exactly sharp. - I have no general solution to this problem yet.

I don't understand what the problem is.  I don't have one "thought" in my
head.  There are many things happening, such as I am producing thoughts of
the words as I type them.  Sometimes that changes to spelling out a word.
At the same time, I hear the keys clicking, I feel the touch of my fingers
on the keyboard.  I hear the fan in my computer.  I see things changing on
the screen.  I hear noise from the family in the rest of the house.  All
these things are "thoughts" which happen in parallel in my brain.  What's
so "sharp" about all this buzzing going on in my brain?

> Why all this difficult stuff, when there is the easier solution of
> reinforcement learning?

What difficult stuff?  What's easier?

> 2.) Why is an ideal necessary?
> Curt, if I misunderstood you, please tell me where to find a
> description of your system at the design layer.

Only in my posts here.  And since I post a lot of stuff that is not about
my ideas about how an intelligent system could be structured, it would be
hard for you to find the related posts.  It's too hard for me to even find them.
:)

> I have not read all the
> "DOHs" in this thread :-). As I understood it you think about a system
> that has the input-processing-output model - maybe with feedback loops
> but no constantly firing neurons as I described above.

Yeah, actually, I do tend to include constantly firing neurons.

I've been looking at pulse sorting networks that work in software much like
a decision tree.  Except it's a network instead of a tree.  This system has
many interesting properties, but it's still lacking some powers so there's
work to be done.  But yet, it can give you some insight into how I think
things need to work.

With this design, I'm basically using async pulse signals instead of some
more traditional synchronous network where all nodes calculate a new output
value for each clock cycle.  And in my typical implementations, I force the
network to process only one pulse at a time.  So there is never more than
one pulse in the network at a time.  The nodes in the network I've played
with have two outputs, and for each pulse they receive, they must make a
sorting decision, and decide which output to send the pulse down.  Each
pulse enters the net on some input path, gets sorted through some path,
and reaches some output.

So, all the intelligence is in the decisions each node makes about how to
sort each pulse it receives.  This is a simple reaction system where the
behavior of the nodes is trained by reinforcement learning.

Now, as I talked about above, for a reaction machine to "act" and not just
"react" as you said, something more is needed.  A feedback loop, so that
all outputs of the network are fed back as inputs, allows this to happen.
The network can then learn to react to its own actions and, in doing so,
produce any type of complex output pattern.

However, in this pulse sorting net, if I fed a pulse back into the net for
every one that came out, the pulse would be stuck in the net in an
infinite loop.  To solve that, the output of the network is used to control
pulse generators, and the output of the pulse generators is what gets fed
back into the network instead of the control pulses.

There are different types of pulse generators I've played with, but one
example is a node that constantly fires at a fixed rate.  It's got two
input control paths where it receives pulses from the network.  One path
makes the pulse generator reduce its firing rate, and the other makes it
increase its firing rate.  This node then has an internal state, which is
the firing rate (pulses per second) it maintains, and control pulses
received from the network make it increase or decrease this internal rate.

These pulse generators are the real outputs of the system, but their
behavior is regulated by the pulse sorting network.  And every pulse that
is created by the pulse generator gets duplicated, with one pulse being fed
back into the pulse sorting network to allow it to learn to react to what
the system has been doing.
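
A minimal sketch of such a rate-controlled pulse generator (the step size,
bounds, and method names are just my placeholders):

    class PulseGenerator:
        """Fires at a steady rate; control pulses from the net nudge that rate."""

        def __init__(self, rate=1.0, step=0.1):
            self.rate = rate         # pulses per second (internal state)
            self.step = step

        def control_up(self):        # pulse arrived on the "increase" path
            self.rate += self.step

        def control_down(self):      # pulse arrived on the "decrease" path
            self.rate = max(0.0, self.rate - self.step)

        def pulses_this_second(self):
            # Each emitted pulse would be duplicated: one copy is the real
            # output, one copy is fed back into the sorting network.
            return int(self.rate)

    gen = PulseGenerator()
    gen.control_up()
    gen.control_up()
    print(round(gen.rate, 2), gen.pulses_this_second())   # 1.2 1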

> I ask you similar questions that you have already answered at the
> psychological layer but not at the design layer:
> - How does such a machine pick the estimation rules? - One rule could
> have the aim to reproduce the input as an output like a parrot.

No, it does nothing even close to that.  That is not a rule or goal at all.

> But
> there must be other estimation rules for an intelligent system.
> - How does it know when the estimation rule should be changed?
> - Of all the estimation rules it might have, which would it be trying
> to reach at any moment in time?
> - How does it create sub-estimation-rules from the prime estimation
> rules?

The intent is for it to work like this...

The machine is a reaction system.  By its design, it is forced to make a
reaction to every input pulse.  Every sensory input pulse must be sent
somewhere by the network.  The only issue is whether the current set of
reaction rules creates useful behavior for the system - you can generally
assume the answer is probably no at the start.  The value of all the
current behaviors is evaluated with the help of reinforcement learning.

The only behavior this system has is pulse sorting.  Each node is an
independent behavior machine, which is trained by reinforcement.  The only
information the nodes have to work with is the pulses which are sent to
them, and the times when they show up.  This type of machine is very
much a temporal processing machine because all the decisions about how to
sort each pulse are based on the temporal memory of each node.

The node design I've been looking at for a few years had only one input
path (which was typically a merger of outputs from two other nodes).  And
the only thing it used to base its sorting decision on was the amount of
time that had elapsed since the last pulse showed up.  This does a lot, but
I've since decided this is not smart enough, because it can't make sorting
decisions that are based, for example, on which previous node the pulse
came from - and I've decided that is something which it needs to be able to
do.

However, ignoring that, I can explain the old design to give you a flavor
of what I'm thinking.  With that design, each node maintained an internal
time value which is what it used to make all its sorting decisions.  A
pulse that showed up quicker than that value would get sorted out the high
frequency output, and all pulses that showed up later than the time limit
would get sorted out the low frequency side.  These nodes can be looked at
as frequency sorting nodes because with a constant low frequency input, the
pulses all go out one way, and with a constant high frequency input, they
all go out the other way.  But, with complex noisy signals, some pulses go
one way, and others go the other way.  The density of pulses going out each
side is just a function of the signal fed to it, and of the internal
setting of the pulse sorting reference value.

By default, these nodes have a learning rule which causes the internal
sorting value to seek out a value that will cause, on the long term, an
equal number of pulses to be sorted out each side.  So by default, the node
will split the signal in half.

If, for example, you feed one of these nodes from a light detector which is
configured to fire faster for brighter lights, you can then look at what
the outputs of this node would mean.  One output would mean "bright light",
and the other output would mean "dim light".  So the node is acting as a
pulse classifier.  It's sorting the pulses which mean "bright light" out one
side, and the pulses which mean "dim light" out the other side.  The default
behavior of the node is to set the split between bright and dim right in the
middle, so that half the pulses from the sensor are classified as bright
light pulses, and half are classified as dim light pulses.
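
Here's roughly how I picture one of those nodes in code (the adaptation
rate and the exact update rule are made up; the point is just the
sort-by-inter-pulse-time behavior plus the drift toward a 50/50 split):

    class SortingNode:
        """Sorts pulses by inter-pulse interval; drifts toward a 50/50 split."""

        def __init__(self, reference=1.0, adapt=0.01):
            self.reference = reference   # time threshold between "fast" and "slow"
            self.adapt = adapt
            self.last_time = None

        def receive(self, t):
            gap = None if self.last_time is None else t - self.last_time
            self.last_time = t
            if gap is None:
                return "low"                  # nothing to compare against yet
            if gap < self.reference:
                side = "high"                 # pulse arrived quickly
                self.reference -= self.adapt  # so fewer future pulses count as fast
            else:
                side = "low"                  # pulse arrived slowly
                self.reference += self.adapt  # so more future pulses count as fast
            return side

    node = SortingNode()
    for t in [0.0, 0.3, 0.6, 2.0, 2.2, 4.0]:     # a noisy pulse train
        print(t, node.receive(t), round(node.reference, 3))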

If the network had two light sensor inputs, each of those signals would get
split in half, and then two of those signals would be joined back together.
So the resulting signal, after the joining, would have a logical meaning
something like, "bright light from sensor 1 OR dim light from sensor 2".

In the end, every output function from the network, is some very complex
combination of all the sensory signals after they have been split apart,
and combined back together again, in many different combinations.

The reinforcement learning problem, is to change the definition of those
mapping functions, to make the output reactions, more useful than they are
when the network first starts.  But at all times, the outputs are some
function, of the inputs.

Like all reinforcement learning machines, it must have a critic which is
fixed hardware for generating reward signals by monitoring various aspects
of the environment.  It generates rewards when "good things" happen, and
either generates a punishment signal, or generates fewer rewards, when "bad
things" happen.

The network, like all reinforcement learning systems, only has one real
goal.  Its goal is to change its behavior in ways that will increase the
number of rewards per unit time the machine is receiving from the critic.
The details of how to implement this for this type of network are what I've
been looking at for a few years now, without making much progress, but the
basics are easy to understand.

The only behavior the network has is pulse sorting, so that's what is being
rewarded.  Each node tracks how much reward it's received relative to what
it's been doing.  If it gets more rewards for sorting pulses out one side
than the other, it will adjust its behavior so that in the future it will
tend to sort more pulses out the "good" side.  This is easy for this type
of node to do, since it can simply adjust its internal sorting value a
small amount to make that happen.  Assuming the input signal is complex
(very noisy), this will cause slightly more pulses to go out one side than
the other.  So, over time, each node tracks rewards relative to its
actions, and adjusts its behavior to try and increase the total rewards.
If need be, it will end up sorting almost all pulses out one side vs the
other, so it will act more like a switch than a signal classifier.
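
Sticking with the SortingNode sketch from above, the reward-driven
adjustment might be nothing more than this (the gain and the credit
bookkeeping are my own guesses at an implementation):

    REWARD_GAIN = 0.05

    def give_reward(node, reward, last_side):
        # Nudge the threshold so pulses like the recent ones are more
        # likely to be sorted out the side that was active when the
        # reward arrived.
        if last_side == "high":
            node.reference += REWARD_GAIN * reward   # widen the "fast" region
        else:
            node.reference -= REWARD_GAIN * reward   # widen the "slow" region

Repeated rewards can push the reference value far enough to one side that
the node acts like a switch rather than a classifier, which is exactly the
behavior described above.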

Each node has access to only a very limited amount of data, and each node
only has a very limited amount of power to control what the network as a
whole does.  But the idea is that many nodes working together can produce
very complex behaviors.

> Or to stay at the HORSE-example:
> Let us assume the system says "HOXEL" and is rewarded. There must be 2
> rules that lead the system nearer to "HORSE":
> 1. Rule: The system must say words that are similar to "HOXEL". So a
> rule must define what means "similar". Maybe this could be achieved by
> a neural network with a bit unsharp actions.
> 2. Rule: The system must have the estimation rule "parrot" at the
> moment it says "HOXEL".  This is an artificial rule, that has no
> counterpart in the human brain. It is built on top of the software of
> the neural network.
>
> I do not think artificial rules will bring us any further, because it
> restricts the system to learn something special.

Well, the point of this type of network design is to start off in a
maximally complex configuration, so that its output behaviors look nearly
perfectly random.  They are not random, however; they are very
deterministic.  But the function is so complex that it will look
random to a human, who will see no "purpose" in the complex behavior.  The
reward system is expected to reward it at times, just because it gets lucky
(mom stuck a tit in my mouth so I get a reward even though I did nothing).
But, it learns from this experience because the recent behaviors of all the
nodes are biased to reflect what has recently happened in the sensory data.
So the nodes all slightly change their behavior to reflect that these
sensory conditions are ones in which it got a reward.  Which brings up the
issue of secondary reinforcement.  This is something I've not figured out
how to correctly implement in this type of network.  But the goal is for
the network to also act as reward predictor.  So it needs to learn that a
given sensory condition, is more likely to produce rewards, because it's
seen more rewards in those sensory conditions.  It needs to learn that a
tit is a "good thing".  Then, in the future, when it does something that
happens to create the "tit" sensory condition (such as turning its head to
the right as a reaction to the sound of mom's voice on the right), that head
turn reaction gets rewarded.  This is how the system creates behaviors that
look like goal seeking behaviors - because the system has learned that the
sensory condition of "tit" is a good thing, which means, doing things that
reproduce that sensory condition, is a goal for it.

So the concept of "closeness" you referred to happens a few ways.  First, a
good critic design will be one that can reward based on "closeness".  The
more the critic can do that, the easier it will be for the system to home
in on a better answer (aka hill climb towards maximal rewards).  So in your
HORSE example, it would be far better for the critic to reward based on how
many letters the system got right than to wait until it got them all right
and then give it a reward.  So, closeness in this case is defined by the
critic to help guide the learning machine in the right direction.

The other way closeness works is by the actions of the secondary
reinforcement.  Since all the nodes in this large network are acting to
measure the "value" of any sensory condition, it will also naturally create
a measure of closeness based on how many of the nodes are in the right
state.  The state of each node is a function of recent past sensory data,
so the state of the network as a whole is intended to represent the
system's best understanding of the current state of the entire environment.
If 10 nodes are reporting the environment is in a state of high expected
reward, that's not as good as when 50 nodes are reporting it.  This allows
the network to produce an estimate of how good of a state, the environment
is in.  The goal of the system is to produce whatever outputs work, to
manipulate the environment into the best possible state for it.  So this
natural system of using many parallel networks to dissect, and understand,
the environment, works naturally to make "closeness" predictions to help
guide learning.

So if the critic only rewards for producing "HORSE" exactly, it will take
some time for the network to produce that by random chance (though the
default behavior of the network is so complex that it looks nearly random,
so given enough time it will always happen).  It will have to happen many
times before the system both learns the behavior and recognizes that
sensory condition (sensing that we just produced the output HORSE) as a
"good thing".  But as that ability develops, outputs that look close to
"HORSE" will produce partial rewards through the secondary reinforcement,
and bias the behavior of the system toward outputs close to "HORSE".  This
quickly reduces the time between productions of "HORSE", which leads to
more rewards, which leads to better reward predictions, which leads to
better behavior, and the next thing you know, the machine is producing
nothing but the word HORSE.
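
To see why partial rewards shorten the search so much, here is a toy
hill-climber that mutates its output and keeps whatever the graded critic
scores at least as high.  It is not the pulse network, just a sketch of
how "closeness" rewards pull random behavior toward HORSE:

import random
import string

TARGET = "HORSE"
LETTERS = string.ascii_uppercase

def graded_critic(output):
    return sum(a == b for a, b in zip(output, TARGET)) / len(TARGET)

# Start with random behavior; keep mutations that earn at least as much.
behavior = [random.choice(LETTERS) for _ in range(len(TARGET))]
best = graded_critic(behavior)
steps = 0
while best < 1.0:
    steps += 1
    trial = behavior[:]
    trial[random.randrange(len(trial))] = random.choice(LETTERS)
    score = graded_critic(trial)
    if score >= best:            # partial reward biases future behavior
        behavior, best = trial, score
print("".join(behavior), "found after", steps, "mutations")

With a critic that only pays off for the exact string, the same random
search would have to stumble on all five letters at once.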

The goal here was never a rule to parrot.  The goal was always to make the
critic produce as much reward as possible.  If the critic only rewards the
behavior of producing "HORSE", then that's what the system will learn.

But a more interesting critic will not reward a fixed output.  Instead, it
will reward some important result - like a robot getting its batteries
charged because it has managed to position its solar cells in a bright
light.  This type of critic will cause the robot to learn light-seeking
behaviors, or dark-avoidance behaviors.  If it's smart enough, it might
even learn behaviors like turning on a light switch to get more light in
the room.  If it's really smart, it might learn to speak English and ask
us to let it outside so it can get direct sunlight. :)

Now on the issue of goals.  Humans very much have goals.  And I suspect
many short term goals are represented by neurons constantly firing, as you
made reference to at the beginning.  A reaction machine that has no short
term goals can't produce a long string of behaviors to reach a goal - like
walking to the kitchen for the purpose of getting a drink of water.

The environment can trigger a lot of goal-seeking behavior, but you can't
explain all human behavior in those terms.  The internal environment of
the human body can also act as environment to the reinforcement-learning
brain.  An empty stomach, for example, acts as the goal for food-gathering
behaviors, and it can keep us focused on that goal instead of letting us
be distracted by other things.  In a robot driven by my type of network,
you can, for example, give it extra inputs that let it sense the state of
its own body - such as a battery-level input.  That input can act to
motivate the robot to return to its charging station.  This creates the
"I want to go back to my charger" goal effect.
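
In code terms, the "internal environment" is just extra elements in the
sensory vector.  A sketch of the kind of observation such a controller
might be fed (the field names and numbers are made up for illustration):

# Internal body state is treated as just more sensory input; a low
# battery then shapes behavior the same way an external stimulus would.
def build_observation(camera_pixels, battery_level, stomach_full):
    # Everything, external and internal, goes into one flat input vector.
    return list(camera_pixels) + [battery_level, stomach_full]

obs = build_observation(camera_pixels=[0.2, 0.7, 0.1],
                        battery_level=0.15,   # nearly empty
                        stomach_full=1.0)
# A critic that pays reward when battery_level rises will make "return
# to charger" reactions more likely whenever this input reads low.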

Humans, however, have language behaviors.  We have the ability to speak
silently to ourselves.  When we do this, we seem to activate internal
states that direct our short term goal-seeking behaviors (like going to
the kitchen to get a glass of water).  Or, seeing that the trash can in
the office doesn't have a trash bag in it, we go to the kitchen to get a
trash bag.  In a very simplistic reaction machine this is hard to do,
since the minute you leave the office the trash can is no longer part of
the environment, so it's no longer there to keep triggering the "get trash
bag" behaviors.  Something we see on the way to the kitchen might trigger
us to head back to the office, and we end up aborting the trash-can
behaviors.  There needs to be internal state maintained in the brain to
complete the task.  We need to "remember" what we are trying to do.

I'm not sure how this would be implemented in my type of network, though I
have various thoughts.  I think it would develop by the machine first
learning to perform a simple behavior that is triggered directly by the
environment - like picking up a glass of water and drinking, all triggered
by an internal thirst signal and the sight of the glass.  But what happens
when we get thirsty and there is no glass of water?  I believe the thirst
still tries to trigger the "pick up glass and drink" behavior, but without
a real glass around to direct the actions of the arm, the brain is instead
triggered to perform "find glass" behaviors.  This, it seems, needs to
create a persistent "find glass" bias in the motor cortex, the same way a
bias is created when an output pattern sequence is selected that allows us
to keep walking.  I'm thinking it must happen with the help of feedback
loops that allow the system to lock into a state which represents the "get
glass" goal.  That active loop in the motor cortex is what drives the
string of behaviors - all direct reactions to the environment plus that
internal state - that make us seek out a glass, fill it with water, and
then drink.
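
One way to picture that persistent bias is a simple latch: a unit whose
output feeds back to its own input, so a goal stays active until the
sensory condition that satisfies it shows up.  This is only a toy sketch
of the idea, not a claim about how the motor cortex actually does it:

class GoalLatch:
    """Holds a goal active via self-feedback until it is satisfied."""
    def __init__(self):
        self.active = False

    def step(self, trigger, satisfied):
        if satisfied:            # e.g. we are now drinking from the glass
            self.active = False
        elif trigger:            # e.g. thirst signal with no glass in view
            self.active = True
        return self.active       # stays True across distractions

latch = GoalLatch()
latch.step(trigger=True, satisfied=False)     # thirsty, no glass: goal on
latch.step(trigger=False, satisfied=False)    # phone call: goal still on
print(latch.step(trigger=False, satisfied=True))  # finally drank: False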

The above is vague and doesn't explain exactly how this might work, but I
think the general idea is there.  The machine must learn simple behaviors
first, and then leverage those to create more complex behaviors.  When it
tries to perform a behavior that worked well in the past but a needed
component is missing (like the glass), the attempt itself acts as part of
the environmental state the system reacts to, which is what causes us to
go get the things we need.

This "getting what is missing" starts off very simple first.  If we want to
drink from a bottle, and we don't see the bottle, then we don't know which
way to move our hands.  So we learn to turn our head and scan our eyes in
an attempt to find the bottle.  So the same things which trigger us to try
and drink, when combined with a "no bottle" condition, triggers the "search
behavior".  In time, our "get bottle" behavior, grows increasingly complex,
and longer lasting - we can think about it and end up walking to the
kitchen and getting the glass out of the cabinet - even being interrupted
by a phone call in the middle of the act.

So, it's all a matter of building complex behaviors through reinforcement.
The machine starts out producing very random-looking behaviors, but they
are never actually random; they are deterministic reactions created by the
complex actions of a large number of very simple agents (neurons) working
together.  This means that when good things happen, these micro-behaviors
can be independently reinforced, always moving the machine toward
behaviors that produce better results.  Though the behaviors produced by
such systems look like what we call goal-directed behaviors, the only real
goal is to maximize rewards.  The internal reward-prediction system then
creates a fairly continuous landscape that lets the system slowly
hill-climb towards higher ground as it evolves its set of reactions.  Each
node in this multi-agent network is in effect solving its own hill-climbing
problem.  Though some nodes will get stuck on a local maximum, other nodes
will continue to make progress.  And as they change, the environment
changes, because the machine is behaving differently - which can kick
nodes that were once stuck on a local maximum off of it, allowing them to
make progress towards better behaviors.

The key is that all network-level macro behaviors are created by many
different nodes (micro behaviors) working together.  But not all micro
behaviors are used at once.  The sensory environment defines the current
context, which in turn maps to a different subset of nodes.  So only a
subset of the nodes is used for making each decision.  This is seen in my
pulse sorting net in the simple fact that only the nodes a pulse passes
through were part of the behavior of the network at that time.  A network
could have a million nodes and only use, on average, 20 nodes to sort each
pulse.  So when it's rewarded, only the recently used nodes are the ones
being trained - the rest of the network is not affected.  The parts of the
network used, and trained, are always a function of the current sensory
context - the network's idea of the current state of the environment.
Unlike a more traditional neural network, where the entire network is
always used to calculate the outputs, in this type of network only the
selected part is used to make each decision.
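
That selective credit assignment can be sketched very simply: remember
which nodes the last pulse went through, and adjust only those when a
reward arrives.  Again, this is illustrative pseudo-machinery, not the
actual pulse sorting code:

import random

class Node:
    def __init__(self):
        self.bias = random.uniform(-1, 1)   # crude stand-in for a node's tuning

    def adjust(self, reward, rate=0.05):
        self.bias += rate * reward          # called only on recently used nodes

nodes = [Node() for _ in range(1000)]       # scaled down; a real net might
                                            # have a million nodes

def route_pulse(nodes, path_len=20):
    # Stand-in for routing: only ~20 nodes participate in any one pulse.
    return random.sample(nodes, path_len)

recently_used = route_pulse(nodes)
for node in recently_used:
    node.adjust(reward=1.0)                 # the rest of the net is untouched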

Also, the number of different "reactions" such a network can produce is
much higher than the number of nodes, since the nodes act together in
different combinations to produce each reaction.  A network with hundreds
of nodes can produce billions of different reactions to different
environments.

It's all just a temporal reaction machine able to produce billions and
billions of different reactions to different environments, tuned by
reinforcement learning to produce increasingly better behaviors over time.
Though these machines produce what we call goal-directed behaviors, the
only goal each really has, the prime goal, is to maximize total rewards
over time; all the behavior that looks like sub-goals is just reaction
sequences the machine has learned that lead it to higher rewards.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 9/12/2006 5:29:39 AM

"JGCASEY" <jgkjcasey@yahoo.com.au> wrote:
> Wolf Kirchmeir wrote:
> > Lars wrote:
> > [..]
> > > I do not think artificial rules will bring us any further, because it
> > > restricts the system to learn something special.
> >
> >
> > However, natural systems (animals, including us) in fact have "special
> > rules". They are the inbuilt behaviours. You seem to believe that
> > "learning" means acquiring behaviours the system has never exhibited
> > before. IOW, you appear to believe in some sort of tabula rasa.
>
> I think the tabula rasa approach is exactly what Curt imagines could
> take place in a RL machine with the right kind of "power".  That he
> could evolve these inbuilt primary behaviors just as evolution did.

Yes, but it's wrong to think of it as a blank slate.

My approach using a pulse sorting network actually starts out with the
slate in its most complex configuration - where every output is a complex
function of every sensory input.  This starts the system in a configuration
of maximal complexity, which makes it produce behavior that seems to be
random.  But it's not random in any sense; it's extremely deterministic,
producing exactly the reactions the machine was programmed to produce.

Learning is then a process of the system simplifying its reactions by
slowly removing terms from the reaction equations.  It starts off using
all the sensory data (filtering nothing) and then learns, over time, what
to ignore.
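
The "learning what to ignore" idea can be illustrated as term removal:
start with every input contributing to the reaction, and drop the terms
whose contribution stays negligible.  A toy sketch under that assumption
(the weights and threshold are made up):

# Start maximally complex: the reaction depends on every sensory input.
# Learning = slowly deleting terms that turn out not to matter.
def reaction(inputs, active_terms, weights):
    return sum(weights[i] * inputs[i] for i in active_terms)

weights = {0: 0.9, 1: -0.4, 2: 0.01, 3: 0.02}   # illustrative numbers
active_terms = set(weights)                     # initially uses everything

def prune(active_terms, weights, threshold=0.05):
    # Drop inputs whose influence has stayed negligible.
    return {i for i in active_terms if abs(weights[i]) >= threshold}

active_terms = prune(active_terms, weights)
print(sorted(active_terms))        # [0, 1] - learned to ignore inputs 2, 3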

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                                        http://NewsReader.Com/
0
Reply curt 9/12/2006 5:50:40 AM