Reinforcement Learning and Artificial Intelligence (RLAI)
The reward hypothesis
The reward hypothesis
The ambition of this web page is to state, refine, clarify, and, most of all, promote discussion of the following scientific hypothesis:
That all of what we
mean by goals and purposes can be well thought of as maximization
of the expected value of the cumulative sum of a received scalar signal (reward).
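For concreteness, here is one common way to write this down, a sketch only, since the hypothesis itself fixes neither the notation, the horizon, nor whether rewards are discounted. Writing R_{t+1} for the scalar reward received at time t+1, gamma in [0, 1] for a discount factor, and pi for the agent's policy:

\[
G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, R_{t+k+1},
\qquad
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, G_t \,\right].
\]

The undiscounted case corresponds to gamma = 1, with the sum running to the end of the episode rather than to infinity.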
Is this true? False? A definition? Unfalsifiable? You are encouraged to comment on
the hypothesis, even in minimal ways. For example, you might submit an
extension "Yes" to indicate that you believe the hypothesis, or
similarly "No" or "Not sure". These minimal responses will be collected and
tallied at some point, and you may want to change yours later, so
please include your name in some way.
This is my favorite "null hypothesis", so much so that I sometimes call it simply the
null hypothesis. It feels essential to take a position on this
very basic issue before one can talk clearly and sensibly about so much
else.
Michael Littman calls this the reinforcement learning hypothesis.
That name seems appropriate because it is a distinctive feature of
reinforcement learning that it takes this hypothesis
seriously. Markov decision processes involve rewards, but only
with the advent of reinforcement learning has reward maximization
been put forward seriously as a reasonable model of a complete
intelligent agent analogous to a human being.
-Rich
Yes! -Rich
