Reinforcement Learning and Computer Go (RLGO)

The Question Hypothesis

The ambition of this web page is to propose and discuss the following hypothesis:
All useful features of a state in Go can be interpreted as answers to questions about the current and future value of observations.
Motivation
If the question hypothesis is correct, then we only need one algorithm
to construct all of our features: an algorithm to answer a question. If
the question hypothesis is incorrect, we will need to use a variety of
different algorithms to construct our features.
The question hypothesis can be considered a specialised version of the Empirical Knowledge Hypothesis.
Assume that we make some observations Z as the game proceeds.
Observations can be made about both the state (for example an eye) and
state transitions (for example connecting or capturing a group). Any
concrete fact can be used, as long as it can be deterministically
computed from the state of the board.
The current value of an observation may be a useful feature. But more
generally, we may be interested in asking questions about an
observation at some future time (or times). These correspond naturally
to predictive features that
we may wish to use.
Definition
A question q is defined as:
q(z, c, πb, πw, w)
where
 - z is an observation about a state or state transition
 - c is the player to play first
 - πb is the policy followed by black
 - πw is the policy followed by white
 - w(k) is a non-negative weighting function defined over each time-step from now (k=0) into the future (up to k=∞)
We define the outcome ω of an observation z to be the weighted average value of z from now (time t) until the end of the game:
$$\omega_t(z, w) = \frac{\sum_{k=0}^{\infty} w(k)\, z_{t+k}}{\sum_{k=0}^{\infty} w(k)}$$
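The weighted average can be sketched in a few lines of Python. This is only an illustration: the function name `outcome` and the list encoding of the observation values are assumptions, and both sums are truncated at the end of the game rather than running to infinity.

```python
from typing import Callable, Sequence


def outcome(z_values: Sequence[float], w: Callable[[int], float]) -> float:
    """Weighted average of z_t, z_{t+1}, ... using weights w(0), w(1), ...

    z_values[k] holds z_{t+k}; the list ends when the game ends, so both
    sums are truncated there (an assumption of this sketch).
    """
    weights = [w(k) for k in range(len(z_values))]
    if any(wk < 0 for wk in weights):
        raise ValueError("w must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("w puts no weight on any remaining time-step")
    return sum(wk * zk for wk, zk in zip(weights, z_values)) / total


# The observation holds at two of the four remaining time-steps.
z_future = [0.0, 1.0, 1.0, 0.0]
uniform = outcome(z_future, lambda k: 1.0)          # plain average
discounted = outcome(z_future, lambda k: 0.5 ** k)  # earlier steps count more
```

Normalising by the sum of the weights keeps the outcome in the same range as z itself, whatever weighting is chosen.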
The answer to a question q at time t is defined as the expected value of the outcome for observation z when c is to play first, black follows policy πb and white follows policy πw:
$$a_t(q) = \mathbb{E}_{\pi_b, \pi_w, c}\left[\omega_t(z, w)\right]$$
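One way to approximate such an expectation is to average the outcome over sampled play-outs. The sketch below is a minimal Monte Carlo estimate under strong assumptions: the "game" is a toy in which the state is a single integer rather than a Go position, play-outs stop after a fixed `horizon`, and the names `Question`, `rollout` and `estimate_answer` are hypothetical. The page does not say how answers should be computed.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

State = int  # toy stand-in for a board position (not a Go board)
Policy = Callable[[State, random.Random], State]  # maps a state to the next state


@dataclass
class Question:
    z: Callable[[State], float]  # observation of a state
    c: str                       # player to play first: "black" or "white"
    pi_b: Policy                 # policy followed by black
    pi_w: Policy                 # policy followed by white
    w: Callable[[int], float]    # non-negative weighting over time-steps


def rollout(q: Question, state: State, horizon: int,
            rng: random.Random) -> List[float]:
    """Play `horizon` moves from `state`, recording z at every time-step."""
    to_play = q.c
    zs = [q.z(state)]
    for _ in range(horizon):
        policy = q.pi_b if to_play == "black" else q.pi_w
        state = policy(state, rng)
        zs.append(q.z(state))
        to_play = "white" if to_play == "black" else "black"
    return zs


def estimate_answer(q: Question, state: State, horizon: int,
                    n_rollouts: int = 1000, seed: int = 0) -> float:
    """Average the weighted outcome over sampled play-outs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        zs = rollout(q, state, horizon, rng)
        weights = [q.w(k) for k in range(len(zs))]
        total += sum(wk * zk for wk, zk in zip(weights, zs)) / sum(weights)
    return total / n_rollouts


# Toy example: black always adds 1, white subtracts 1 half of the time,
# and the observation is "the state is positive".
q = Question(
    z=lambda s: float(s > 0),
    c="black",
    pi_b=lambda s, rng: s + 1,
    pi_w=lambda s, rng: s - 1 if rng.random() < 0.5 else s,
    w=lambda k: 1.0,
)
estimate = estimate_answer(q, state=0, horizon=6)
```

Changing only `c`, `pi_b`, `pi_w` or `w` while keeping `z` fixed yields a different question about the same observation.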
Example questions
Each of the following features can be expressed as a question of the
above form, by making appropriate choices for z, c, πb,
πw, and w.
 - Does my group have an eye already?
  
 - Can I make an eye if that is my only goal? 
  
 - Will I make an eye at x1 and an eye at x2 if I try to capture the
opponent at x3?
 
 - Will this point become territory for black at the end of the game?
 
 - Is my group in the corner likely to come under attack soon?
 
 - Can this stone be captured next move if I do nothing?
 
 - Can I capture an opponent group without my own group getting
captured?
 
 - Can I capture an opponent group whilst responding to any of the
moves that I believe threaten to capture my own group? 
  
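It is worth noting that the complexity of a question is often determined by the complexity of the policies rather than the complexity of the observation. Note that the last question includes dependencies in the policy: the moves I respond to are those that I believe threaten capture, so the policy itself depends on a prior belief about which moves are threats.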
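As an illustration of how the choice of w alone changes the time frame of a question, the sketch below gives one possible reading of "already", "next move" and "soon" as weighting functions. The mapping, the helper names `at_step` and `discounted`, and the discount value 0.8 are all assumptions made for illustration; the page does not specify them.

```python
from typing import Callable

Weighting = Callable[[int], float]


def at_step(n: int) -> Weighting:
    """Put all weight on the time-step n moves from now."""
    return lambda k: 1.0 if k == n else 0.0


def discounted(gamma: float) -> Weighting:
    """Geometrically decaying weight: near-future steps count most."""
    return lambda k: gamma ** k


already = at_step(0)     # "Does my group have an eye already?"
next_move = at_step(1)   # "Can this stone be captured next move ...?"
soon = discounted(0.8)   # "... likely to come under attack soon?"
```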
 
How does segmentation fit into this framework? For example, how do we ask the question "what is the group of stones connected to x?"
One answer to this is to use implicit segmentation. The group of stones connected to x is described implicitly by a scope operator representing the required correlation to x. When we wish to ask a question about the group of stones connected to x, we use a policy that restricts moves to the group around x.
Note that a correlation feature can be constructed by answering the question "will these two intersections be part of the same region at the end of the game?". As usual, we can specify different policies for the correlation (e.g. whether we are trying to maximise correlation or whether we have some other aim).
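A correlation feature of this kind could be estimated from a set of sampled final positions. The sketch below assumes each final position is given as rows of 'B'/'W' owner labels and that "same region" means same owner and 4-connected through points of that owner. The encoding and the names `same_region` and `correlation` are hypothetical, and the final positions would come from play-outs under whichever policies the question specifies.

```python
from collections import deque
from typing import Sequence, Tuple

Board = Sequence[str]    # rows of 'B' / 'W' owner labels (assumed encoding)
Point = Tuple[int, int]  # (row, column)


def same_region(board: Board, a: Point, b: Point) -> bool:
    """True if a and b share an owner and are 4-connected through that owner."""
    colour = board[a[0]][a[1]]
    if board[b[0]][b[1]] != colour:
        return False
    seen = {a}
    queue = deque([a])
    while queue:  # breadth-first flood fill from a
        r, c = queue.popleft()
        if (r, c) == b:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(board) and 0 <= nc < len(board[nr])
                    and (nr, nc) not in seen and board[nr][nc] == colour):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def correlation(final_boards: Sequence[Board], a: Point, b: Point) -> float:
    """Fraction of sampled final positions in which a and b share a region."""
    return sum(same_region(bd, a, b) for bd in final_boards) / len(final_boards)


# Two hand-written 3x3 final positions used as sample data.
samples = [
    ["BBW", "WWW", "BBW"],
    ["BBW", "BWW", "BBW"],
]
left_edge = correlation(samples, (0, 0), (2, 0))
```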