Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 499/609: Monte Carlo Programming Assignment
Created by Brian Booth Feb 10 2006


Objective & Description

    The goal of this assignment is to become familiar with the Monte Carlo and TD learning algorithms and to get comfortable working with the RL-Glue framework.

    In this assignment, you will implement a Monte Carlo Exploring Starts (ES) agent and a Sarsa ES agent to play variants of the game Blackjack. You are asked to find these agents' policies for playing Blackjack under each of the following two variants:
  1. The number of cards in a suit that take a value of ten (i.e. 10s, Jacks, Queens, & Kings) is doubled from 4 to 8.  Thus each card dealt has probability 8/17 of being a ten and probability 1/17 of being each of Ace, two, three, ..., nine.
  2. The number of cards in a suit that take a value of ten (i.e. 10s, Jacks, Queens, & Kings) is halved from 4 to 2.  Thus each card dealt has probability 2/11 of being a ten and probability 1/11 of being each of Ace, two, three, ..., nine.
    Remember that cards are dealt with replacement, as if from an infinite deck, so there is no need to keep track of which cards have already been dealt.  The state space remains the same as in the original problem.
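    As a sanity check on the probabilities above, the following minimal sketch draws cards with replacement for either variant. It is plain Python, not part of RL-Glue; the names card_probabilities, draw_card, and tens_per_suit are hypothetical and chosen only for illustration.

```python
import random
from fractions import Fraction


def card_probabilities(tens_per_suit):
    """Return {card_value: probability} for an infinite deck.

    Value 1 is the Ace, 2..9 are the number cards, and 10 stands for
    every ten-valued card. tens_per_suit is 4 in standard Blackjack,
    8 in variant 1, and 2 in variant 2.
    """
    total = 9 + tens_per_suit  # nine non-ten ranks plus the ten-valued ranks
    probs = {value: Fraction(1, total) for value in range(1, 10)}
    probs[10] = Fraction(tens_per_suit, total)
    return probs


def draw_card(tens_per_suit, rng=random):
    """Draw one card value with replacement (no deck bookkeeping needed)."""
    probs = card_probabilities(tens_per_suit)
    values = list(probs)
    weights = [float(probs[v]) for v in values]
    return rng.choices(values, weights=weights, k=1)[0]
```

    With tens_per_suit set to 8 the probability of a ten is 8/17, and with 2 it is 2/11, matching the two variants described above.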

    The Blackjack environment can be found as part of the RL-Glue framework in the directory 'Env'. Much of the Grid World Benchmark code that comes with RL-Glue will also come in handy.

    The Monte Carlo ES algorithm is described in Figure 5.4, p. 120 of the textbook. The Sarsa algorithm is described in Figure 6.9, p. 146 of the textbook, but you will have to modify it to use exploring starts. Have the agent learn over the course of 100,000 episodes. A description of the game of Blackjack appears as part of Example 5.1 in the textbook.
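    The two learning rules mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the textbook's or RL-Glue's code: states are assumed to be tuples (player sum, dealer's showing card, usable ace), actions are encoded as 0 = stick and 1 = hit, the task is undiscounted with the only nonzero reward at the end of the episode (as in Example 5.1), and all function names are hypothetical. The exploring start picks the first state-action pair uniformly at random; after that, both agents are assumed to act greedily with respect to Q.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed encoding: 0 = stick, 1 = hit


def greedy_action(Q, state):
    """Pick the action with the highest current estimate (ties go to stick)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def exploring_start(rng=random):
    """Choose a random initial state-action pair so every pair can be visited."""
    state = (rng.randint(12, 21), rng.randint(1, 10), rng.choice((False, True)))
    return state, rng.choice(ACTIONS)


def mc_es_update(Q, visit_counts, episode, final_return):
    """First-visit Monte Carlo update from one finished episode.

    episode is a list of (state, action) pairs. Because the task is
    undiscounted and only the last reward is nonzero, the return that
    follows every pair equals final_return.
    """
    seen = set()
    for state, action in episode:
        if (state, action) in seen:
            continue  # first-visit: count each pair once per episode
        seen.add((state, action))
        visit_counts[(state, action)] += 1
        n = visit_counts[(state, action)]
        # Incremental form of averaging all returns observed for this pair.
        Q[(state, action)] += (final_return - Q[(state, action)]) / n


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=1.0):
    """One Sarsa step; pass next_state=None when the episode has ended."""
    if next_state is None:
        target = reward
    else:
        target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```

    In a full agent, Q and visit_counts would be defaultdict(float) and defaultdict(int), and one of these updates would be applied in each of the 100,000 episodes; the step size alpha = 0.1 is an arbitrary placeholder, not a value given in the assignment.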

    What to Hand In (on paper):
Reminder: Assignments must be your own work.  It is OK to talk to other students, but not to exchange code of any kind.  A good rule of thumb is that if you talk to someone, don't bring a pencil.  Email is generally not a good idea; if you do use it, follow the spirit of this rule of thumb.



This open web page is hosted at the University of Alberta.