Reinforcement Learning and Information Access

or

What is the Real Learning Problem in Information Access?

by Rich Sutton
University of Massachusetts
rich@cs.umass.edu

Presented at the AAAI Stanford Spring Symposium on
Machine Learning and Information Access
March 26, 1996

with many thanks to Rik Belew and Jude Shavlik

Introductory Patter

In this talk we will try to take a new look at the learning problem in information access. How is it structured? What training information is really available, or likely to be available? Will there be delays between decision making and the receipt of relevant feedback? I am a newcomer to information access, but I have experience in reinforcement learning, and one of the main lessons of reinforcement learning is that it is really important to understand the true nature of the learning problem you want to solve.
Here is an example that illustrates the whole idea. In 1989 Gerry Tesauro at IBM built the world's best computer player of backgammon. It was a neural network trained from 15,000 examples of human-expert moves. Then he tried a reinforcement learning approach. He trained the same network not from expert examples, but simply by playing it against itself and observing the outcomes. After a month of self-play, the program became the new world champ of computers. Now it is an extremely strong player, on a par with the world's best grandmasters, who are now learning from it! The self-play approach worked so well primarily because it could generate new training data itself. The expert-trained network was always limited by its 15,000 examples, laboriously constructed by human experts. Self-play training data may be individually less informative, but so much more of it can be generated so cheaply that it is a big win in the long run.
The same may be true for information access. Right now we use training sets of documents labeled by experts as relevant or not relevant. Such training data will always be expensive, scarce, and small. How much better it would be if we could generate some kind of training data online, from the normal use of the system. The data may be imperfect and unclear, but certainly it will be plentiful! It may also be truer in an important sense. Expert-labeled training sets are artificial, and do not accurately mirror real usage. In backgammon, the expert-trained system could only learn to mimic the experts, not to win the game. Only the online-trained system was able to learn to play better than the experts. Its training data was more real.
This then is the challenge: to think about information access and uncover the real structure of the learning problem. How can learning be done online? Learning thrives on data, data, data! How can we get the data we need online, from the normal operation of the system, without relying on expensive, expert-labeled training sets?
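To make "training data from the normal operation of the system" concrete, here is a minimal sketch, not a method from the talk. It assumes a hypothetical simulated user whose chance of clicking each document is given by `click_prob`, and it treats a click as the only feedback signal. The learner is a simple epsilon-greedy estimator, chosen for brevity; the names `run_online_learning`, `click_prob`, and `epsilon` are all illustrative assumptions.

```python
import random


def run_online_learning(click_prob, steps=5000, epsilon=0.1, seed=0):
    """Estimate per-document click rates purely from simulated usage.

    click_prob is a stand-in for real users (an assumption for this sketch):
    click_prob[d] is the chance that document d is clicked when shown.
    """
    rng = random.Random(seed)
    n_docs = len(click_prob)
    shown = [0] * n_docs    # how often each document has been presented
    value = [0.0] * n_docs  # running average of observed clicks per document

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally show a random document.
            doc = rng.randrange(n_docs)
        else:
            # Exploit: show the document with the best estimate so far.
            doc = max(range(n_docs), key=lambda d: value[d])

        # Feedback comes from use of the system, not from an expert label.
        reward = 1.0 if rng.random() < click_prob[doc] else 0.0

        # Incremental update of the running average.
        shown[doc] += 1
        value[doc] += (reward - value[doc]) / shown[doc]

    return value, shown


values, shown = run_online_learning([0.1, 0.6, 0.3])
best = max(range(len(values)), key=lambda d: values[d])
print("estimated click rates:", [round(v, 2) for v in values])
print("most useful document so far:", best)
```

Each individual click is a noisy, imperfect signal, but the learner never needs a labeled example: every presentation of a document yields one more data point.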

This talk proceeds in three parts. The first is an introduction to reinforcement learning. The second examines how parts of the learning problem in information access are like those solved by reinforcement learning methods. But the information access problem doesn't map exactly onto the reinforcement learning problem. It has a special structure all its own. In the third part of the talk we examine some of this special structure and what kind of new learning methods might be applied to it.

What follows are approximations of the slides presented in the talk.

Conclusions (in advance)

Learning in IA (Information Access) is like learning everywhere
- you are never told the right answers
- it's a sequential problem - actions affect opportunities
Reinforcement Learning addresses these issues
Learning can be powerful when done online (from normal operation)
What is online data/feedback like in IA?