by Richard S. Sutton and Andrew G. Barto

Below are links to a variety of software related to examples and exercises in the book.

- Re-implementations in Python by Shangtong Zhang
- Re-implementations in Julia by Jun Tian
- Original code for the first edition
- Re-implementation of first edition code in Matlab by John Weatherwax

- Chapter 1: Introduction
- Chapter 2: Multi-armed Bandits
- Chapter 3: Finite Markov Decision Processes
- Chapter 4: Dynamic Programming
- Chapter 5: Monte Carlo Methods
- Chapter 6: Temporal-Difference Learning
- Chapter 7: n-step Bootstrapping
- Chapter 8: Planning and Learning with Tabular Methods
- Chapter 9: On-policy Prediction with Approximation
- Chapter 10: On-policy Control with Approximation
  - Linear Semi-gradient Sarsa(lambda) on the Mountain-Car, Figure 10.1
  - n-step Sarsa on Mountain Car, Figures 10.2-4 (Lisp) with tile coding
  - R-learning on Access-Control Queuing Task, Example 10.2, Figure 10.5 (Lisp), (C version)
- Chapter 11: Off-policy Methods with Approximation
  - Baird Counterexample Results, Figures 11.2, 11.5, and 11.6 (Lisp)
- Chapter 12: Eligibility Traces
  - Offline lambda-return results, Figure 12.3 (Lisp)
  - TD(lambda) and true online TD(lambda) results, Figures 12.6 and 12.8 (Lisp)
  - Sarsa(lambda) on Mountain Car (Lisp) (Python: MC and Sarsa) with tile coding
- Chapter 13: Policy Gradient Methods (this Python code is available on GitHub)