 
 
 
 
 
 
 
  
Eligibility traces are one of the basic mechanisms of reinforcement learning.  For
example, in the popular TD($\lambda$) algorithm, the $\lambda$ refers to the use of an
eligibility trace.  Almost any temporal-difference (TD) method, such as Q-learning
or Sarsa, can be combined with eligibility traces to obtain a more general
method that may learn more efficiently.
There are two ways to view eligibility traces. The more theoretical view, which we emphasize here, is that they are a bridge from TD to Monte Carlo methods. When TD methods are augmented with eligibility traces, they produce a family of methods spanning a spectrum that has Monte Carlo methods at one end and one-step TD methods at the other. In between are intermediate methods that are often better than either extreme method. In this sense eligibility traces unify TD and Monte Carlo methods in a valuable and revealing way.
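One way to make this spectrum concrete is the $\lambda$-return. This construction is assumed here rather than stated in the passage above. The $n$-step return sums $n$ discounted rewards and then bootstraps from the current estimate $V(s_{t+n})$. In an episodic task, the $\lambda$-return averages the $n$-step returns with weights $(1-\lambda)\lambda^{n-1}$ and gives all leftover weight to the complete return. Setting $\lambda = 0$ keeps only the one-step TD target, and $\lambda = 1$ keeps only the Monte Carlo return.

The minimal Python sketch below makes three assumptions:
- The task is episodic.
- `rewards[k]` holds $r_{k+1}$.
- `values[k]` holds the current estimate $V(s_k)$, with the terminal value taken as 0.

The function names are hypothetical.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Hypothetical helper: n-step return from time t.

    rewards[k] holds r_{k+1}; values[k] holds V(s_k) for non-terminal s_k.
    The episode terminates at T = len(rewards), where V(terminal) = 0.
    """
    T = len(rewards)
    g = 0.0
    for k in range(t, min(t + n, T)):
        g += gamma ** (k - t) * rewards[k]   # discounted rewards actually observed
    if t + n < T:
        g += gamma ** n * values[t + n]      # bootstrap from the current estimate
    return g


def lambda_return(rewards, values, t, lam, gamma):
    """Hypothetical helper: episodic lambda-return, a (1-lam)*lam**(n-1)
    weighted average of n-step returns, leftover weight on the full return."""
    T = len(rewards)
    g = 0.0
    for n in range(1, T - t):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    # Remaining weight lam**(T-t-1) goes to the complete (Monte Carlo) return.
    g += lam ** (T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)
    return g


rewards = [1.0, 0.0, 2.0]   # r_1, r_2, r_3 (episode ends after r_3)
values = [0.5, 0.3, 0.8]    # V(s_0), V(s_1), V(s_2)
for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_return(rewards, values, 0, lam, 0.9))
```

For this three-step episode with $\gamma = 0.9$, the results are:
- $\lambda = 0$ gives the one-step target $1 + 0.9 \cdot 0.3 = 1.27$.
- $\lambda = 1$ gives the Monte Carlo return $1 + 0.81 \cdot 2 = 2.62$.
- $\lambda = 0.5$ falls in between at $1.702$.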
The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment.
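The mechanistic view can be sketched as tabular TD($\lambda$) with accumulating traces. Each visit to a state adds 1 to that state's trace. Every TD error then updates all states in proportion to their traces, and the traces decay by $\gamma\lambda$ at each step.

This is a minimal Python sketch under the following assumptions:
- Values live in a dict.
- An episode is a list of `(state, reward, next_state)` transitions, with `next_state` set to `None` at termination.

The name `td_lambda_episode` is hypothetical.

```python
from collections import defaultdict


def td_lambda_episode(V, episode, alpha, gamma, lam):
    """Hypothetical helper: one episode of online tabular TD(lambda), backward view,
    with accumulating traces. `episode` is a list of (state, reward, next_state)
    transitions, with next_state None at termination. V is updated in place."""
    e = defaultdict(float)                  # eligibility trace per state, initially 0
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V[s_next]
        delta = r + gamma * v_next - V[s]   # one-step TD error
        e[s] += 1.0                         # record the visit: s becomes eligible
        for x in e:
            V[x] += alpha * delta * e[x]    # credit or blame in proportion to eligibility
            e[x] *= gamma * lam             # the record fades with time
    return V


episode = [("A", 0.0, "B"), ("B", 1.0, None)]   # A -> B -> terminal, reward 1 at the end
for lam in (0.0, 1.0):
    V = td_lambda_episode(defaultdict(float), episode, alpha=0.5, gamma=1.0, lam=lam)
    print(lam, dict(V))
```

With $\lambda = 0$, only B's estimate moves, exactly as in one-step TD. With $\lambda = 1$, the final reward also reaches A, because A is still eligible when the TD error occurs.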
For reasons that will become apparent shortly, the more theoretical view of
eligibility traces is called the forward view, and the more mechanistic
view is called the backward view.  The forward view is most useful for
understanding what is computed by methods using eligibility traces,
whereas the backward view is more appropriate for developing intuition
about the algorithms themselves.  In this chapter we present both
views and then establish the senses in which they are equivalent, that is, in
which they describe the same algorithms from two points of view. As
usual, we first consider the prediction problem and then the control problem. 
That is, we first consider how eligibility traces are used to help in predicting
returns as a function of state for a fixed policy (i.e., in estimating $V^\pi$).  Only
after exploring the two views of eligibility traces within this prediction setting do
we extend the ideas to action values and control methods.
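One sense in which the two views agree can be checked numerically in the offline case. Suppose value estimates are held fixed during an episode and all updates are applied at its end. Then the total update each state receives under the forward view equals the total it receives under the backward view:
- In the forward view, each visit moves the state toward its $\lambda$-return.
- In the backward view, each visit's accumulating trace weights the TD errors that follow it.

This is a minimal Python sketch, assuming an episodic task where `states[t]` is $s_t$ and `rewards[t]` is $r_{t+1}$, with the terminal value taken as 0. All function names are hypothetical.

```python
from collections import defaultdict


def lambda_return(states, rewards, V, t, gamma, lam):
    """Hypothetical helper: episodic lambda-return from time t, built from n-step returns."""
    T = len(rewards)

    def n_step(n):
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, min(t + n, T)))
        if t + n < T:
            g += gamma ** n * V[states[t + n]]   # bootstrap from the fixed estimate
        return g

    g = sum((1 - lam) * lam ** (n - 1) * n_step(n) for n in range(1, T - t))
    return g + lam ** (T - t - 1) * n_step(T - t)


def forward_view_offline(states, rewards, V, alpha, gamma, lam):
    """Total offline increment per state: alpha * (lambda-return - V(s_t)) per visit."""
    inc = defaultdict(float)
    for t, s in enumerate(states):
        inc[s] += alpha * (lambda_return(states, rewards, V, t, gamma, lam) - V[s])
    return dict(inc)


def backward_view_offline(states, rewards, V, alpha, gamma, lam):
    """Total offline increment per state: alpha * TD error * accumulating trace."""
    inc = defaultdict(float)
    e = defaultdict(float)
    T = len(rewards)
    for t, s in enumerate(states):
        v_next = V[states[t + 1]] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - V[s]   # TD error with V held fixed
        e[s] += 1.0
        for x in e:
            inc[x] += alpha * delta * e[x]
            e[x] *= gamma * lam
    return dict(inc)


states = ["A", "B", "A", "C"]          # s_0..s_3; the episode terminates after s_3
rewards = [1.0, 0.0, -1.0, 2.0]        # r_1..r_4
V = {"A": 0.2, "B": -0.4, "C": 0.7}    # estimates held fixed for the whole episode
print(forward_view_offline(states, rewards, V, 0.1, 0.9, 0.8))
print(backward_view_offline(states, rewards, V, 0.1, 0.9, 0.8))
```

The two printed dictionaries should agree up to floating-point rounding. When V is instead updated during the episode (online), this exact agreement is not guaranteed.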
Mark Lee
2005-01-04