NIPS Tutorial, December 2, 1996

REINFORCEMENT LEARNING

by Richard S. Sutton
Senior Research Scientist
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
rich@cs.umass.edu
http://www.cs.umass.edu/~rich

ABSTRACT

Reinforcement learning is learning about, from, and while interacting with an environment in order to achieve a goal. In other words, it is a relatively direct model of the learning that people and animals do in their normal lives. In the last two decades, this age-old problem has come to be much better understood by integrating ideas from psychology, optimal control, artificial neural networks, and artificial intelligence. New methods and combinations of methods have enabled much better solutions to large-scale applications than had been possible by any other means. This tutorial will provide a top-down introduction to the field, covering Markov decision processes and approximate value functions as the formulation of the problem, and dynamic programming, temporal-difference learning, and Monte Carlo methods as the principal solution methods. The role of neural networks, evolutionary methods, and planning will also be covered. The emphasis will be on understanding the capabilities and appropriate role of each class of methods within an integrated system for learning and decision making.

Suggested further readings

General Reinforcement Learning

Sutton, R.S., Barto, A.G. An Introduction to Reinforcement Learning. A nearly-completed textbook treatment.

Barto, A.G., Sutton, R.S., Watkins, C.J.C.H. (1990) "Learning and Sequential Decision Making". In Learning and Computational Neuroscience: Foundations of Adaptive Networks, M. Gabriel and J.W. Moore, Eds., pp. 539-602, MIT Press.

Kaelbling, L.P. (1996) Special triple issue on Reinforcement Learning of the journal Machine Learning, Vol. 22, Nos. 1/2/3.

Sutton, R.S. (1992) Special double issue on Reinforcement Learning of the journal Machine Learning, Vol. 8, Nos. 3/4.

Animal Learning Theory and Reinforcement Learning

Sutton, R.S., Barto, A.G. (1990) "Time-derivative models of Pavlovian reinforcement". In Learning and Computational Neuroscience: Foundations of Adaptive Networks, M. Gabriel and J. Moore, Eds., pp. 497-537, MIT Press.

Neuroscience and Reinforcement Learning

Houk, J.C., Adams, J.L., Barto, A.G. (1995) "A model of how the basal ganglia generate and use neural signals that predict reinforcement". In Models of Information Processing in the Basal Ganglia, J.C. Houk, J.L. Davis, and D.G. Beiser, Eds., pp. 249-270, MIT Press.

Montague, P.R., Dayan, P., Person, C., Sejnowski, T.J. (1995) "Bee foraging in uncertain environments using predictive Hebbian learning". Nature 377, 725-728.

Montague, P.R., Dayan, P., Sejnowski, T.J. (1996) "A framework for mesencephalic dopamine systems based on predictive Hebbian learning". Journal of Neuroscience 16, 1936-1947.

See also my web page, starting from http://www.cs.umass.edu/~rich.
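
To give a concrete feel for the temporal-difference methods mentioned in the abstract, below is a minimal tabular TD(0) sketch in Python on a small random-walk task. The chain length, step-size, and reward scheme are illustrative assumptions, not anything prescribed by the tutorial itself.

    import random

    # Minimal tabular TD(0) sketch for estimating state values on a small
    # random-walk chain (illustrative assumptions, not code from the tutorial).
    # States 0..6; states 0 and 6 are terminal; reward +1 only on reaching state 6.

    NUM_STATES = 7
    ALPHA = 0.1   # step-size parameter (assumed value)
    GAMMA = 1.0   # undiscounted episodic task

    values = [0.0] * NUM_STATES  # terminal states keep value 0

    for episode in range(1000):
        state = 3  # start in the middle of the chain
        while state not in (0, NUM_STATES - 1):
            next_state = state + random.choice((-1, 1))
            reward = 1.0 if next_state == NUM_STATES - 1 else 0.0
            # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
            values[state] += ALPHA * (reward + GAMMA * values[next_state] - values[state])
            state = next_state

    # For the nonterminal states the estimates approach 1/6, 2/6, ..., 5/6.
    print([round(v, 2) for v in values[1:-1]])

Each update nudges the current state's value toward the one-step bootstrapped target, which is the basic mechanism shared by the temporal-difference methods covered in the tutorial.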