 |
| Symbol | Meaning |
|---|---|
| $t$ | discrete time step |
| $T$ | final time step of an episode |
| $s_t$ | state at $t$ |
| $a_t$ | action at $t$ |
| $r_t$ | reward at $t$, dependent, like $s_t$, on $a_{t-1}$ and $s_{t-1}$ |
| $R_t$ | return (cumulative discounted reward) following $t$ |
| $R_t^{(n)}$ | $n$-step return (Section 7.1) |
| $R_t^{\lambda}$ | $\lambda$-return (Section 7.2; see the sketch after this table) |
| $\pi$ | policy, decision-making rule |
| $\pi(s)$ | action taken in state $s$ under deterministic policy $\pi$ |
| $\pi(s,a)$ | probability of taking action $a$ in state $s$ under stochastic policy $\pi$ |
| $\mathcal{S}$ | set of all nonterminal states |
| $\mathcal{S}^+$ | set of all states, including the terminal state |
| $\mathcal{A}(s)$ | set of actions possible in state $s$ |
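To make the return symbols above concrete, here is a minimal Python sketch. It assumes the rewards received after time $t$ are stored in a plain list (`rewards[k]` is the reward received $k+1$ steps after $t$, and `values[k]` is the current value estimate of the state visited $k$ steps after $t$); the function names are illustrative, not part of the notation.

```python
def full_return(rewards, gamma):
    """R_t-style return: sum of discounted rewards following a time step."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

def n_step_return(rewards, values, n, gamma):
    """n-step return: n discounted rewards plus the discounted value estimate
    of the state reached after n steps (truncated at the end of the episode)."""
    n = min(n, len(rewards))
    g = sum((gamma ** k) * rewards[k] for k in range(n))
    if n < len(values):                       # bootstrap only if episode not over
        g += (gamma ** n) * values[n]
    return g

def lambda_return(rewards, values, gamma, lam):
    """lambda-return: a (1 - lambda)-weighted average of the n-step returns,
    with the remaining weight given to the full return."""
    T = len(rewards)
    g = (1 - lam) * sum(
        (lam ** (n - 1)) * n_step_return(rewards, values, n, gamma)
        for n in range(1, T)
    )
    g += (lam ** (T - 1)) * full_return(rewards, gamma)
    return g
```

For example, `lambda_return([1.0, 0.0, 2.0], [0.5, 0.4, 0.3], gamma=0.9, lam=0.8)` blends the 1-, 2-, and 3-step returns of a three-step episode.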
 |
| Symbol | Meaning |
|---|---|
| $\mathcal{P}_{ss'}^{a}$ | probability of transition from state $s$ to state $s'$ under action $a$ |
| $\mathcal{R}_{ss'}^{a}$ | expected immediate reward on transition from $s$ to $s'$ under action $a$ |
| $V^{\pi}(s)$ | value of state $s$ under policy $\pi$ (expected return) |
| $V^{*}(s)$ | value of state $s$ under the optimal policy |
| $V$, $V_t$ | estimates of $V^{\pi}$ or $V^{*}$ |
| $Q^{\pi}(s,a)$ | value of taking action $a$ in state $s$ under policy $\pi$ |
| $Q^{*}(s,a)$ | value of taking action $a$ in state $s$ under the optimal policy |
| $Q$, $Q_t$ | estimates of $Q^{\pi}$ or $Q^{*}$ |
| $\theta_t$ | vector of parameters underlying $V_t$ or $Q_t$ (see the sketch after this table) |
| $\phi_s$ | vector of features representing state $s$ |
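In the function-approximation setting, the parameter and feature vectors combine linearly: the estimated value of a state is (roughly) the inner product $\theta_t^{\top} \phi_s$. Below is a toy sketch of that combination; the one-hot feature function and the names `phi`, `v_hat`, and `num_features` are assumptions for illustration only.

```python
import numpy as np

def phi(state, num_features=8):
    """Toy feature vector for an integer-indexed state (one-hot here)."""
    features = np.zeros(num_features)
    features[state % num_features] = 1.0
    return features

def v_hat(state, theta):
    """Linear estimate of the state's value: theta^T phi_s."""
    return float(np.dot(theta, phi(state)))

theta = np.zeros(8)        # parameters underlying the value estimate
print(v_hat(3, theta))     # 0.0 before any learning has adjusted theta
```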
 |
| Symbol | Meaning |
|---|---|
| $\delta_t$ | temporal-difference error at $t$ |
| $e_t(s)$ | eligibility trace for state $s$ at $t$ (see the sketch after this table) |
| $e_t(s,a)$ | eligibility trace for a state-action pair |
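A rough sketch of how $\delta_t$ and $e_t(s)$ interact in a single tabular TD($\lambda$) update with accumulating traces; the dictionary-based bookkeeping and the function name `td_lambda_step` are illustrative choices, not the book's pseudocode.

```python
from collections import defaultdict

def td_lambda_step(V, e, s, r, s_next, alpha, gamma, lam, terminal=False):
    """Apply one temporal-difference update with accumulating eligibility traces."""
    target = r if terminal else r + gamma * V[s_next]
    delta = target - V[s]          # temporal-difference error delta_t
    e[s] += 1.0                    # accumulate the trace for the visited state
    for state in list(e):
        V[state] += alpha * delta * e[state]   # credit each state by its trace
        e[state] *= gamma * lam                # decay all traces for the next step
    return delta

V, e = defaultdict(float), defaultdict(float)
td_lambda_step(V, e, s="A", r=1.0, s_next="B", alpha=0.1, gamma=0.9, lam=0.8)
```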
 |
| Symbol | Meaning |
|---|---|
| $\gamma$ | discount-rate parameter |
| $\varepsilon$ | probability of random action in $\varepsilon$-greedy policy (see the sketch after this table) |
| $\alpha$, $\beta$ | step-size parameters |
| $\lambda$ | decay-rate parameter for eligibility traces |
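As an illustration of how $\varepsilon$ is used, here is a minimal $\varepsilon$-greedy action-selection sketch, assuming action-value estimates are stored in a dictionary keyed by (state, action) pairs; the names are illustrative.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
print(epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.1))  # usually "right"
```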