by Richard S. Sutton and Andrew G. Barto

Below are links to a variety of software related to examples and exercises in the book.

- Re-implementations in Python by Shangtong Zhang
- Re-implementations in Julia by Jun Tian
- Original code for the first edition
- Re-implementation of first edition code in MATLAB by John Weatherwax

- Chapter 1: Introduction
- Chapter 2: Multi-armed Bandits
- Chapter 3: Finite Markov Decision Processes
- Chapter 4: Dynamic Programming
- Chapter 5: Monte Carlo Methods
- Chapter 6: Temporal-Difference Learning
- Chapter 7: n-step Bootstrapping
- Chapter 8: Planning and Learning with Tabular Methods
- Chapter 9: On-policy Prediction with Approximation
- Chapter 10: On-policy Control with Approximation
- Chapter 11: Off-policy Methods with Approximation
- Chapter 12: Eligibility Traces
- Chapter 13: Policy Gradient Methods