RLAI 2008 Tea Time
Reinforcement Learning and Artificial Intelligence (RLAI)

Tea-time Talks 2008


The RLAI group is hosting a series of tea-time talks this summer, and everyone is invited.  Please gather with us at the end of the day for some refreshment and refreshing ideas.  Come learn about what is going on in the world of reinforcement learning and artificial intelligence.

Tea-time is at 4 pm, Monday through Thursday.  The talk begins promptly at 4:15, so come early for some tea and cookies, and a little social time, before that.  The talk itself is strictly limited to 20 minutes, plus 10 minutes for questions.

The aim of the tea-time talks is to efficiently share information on a variety of current reinforcement learning topics and other related subjects.

This page is used to organize the tea-time talks. It is no longer used to provide a mailing list for participants.

last year's tea-time page


Guidelines

Organisation



Schedule (starting May 13)

Date | Presenter | Topic | Link | Room

[Organizer: Varun]

May 13 | Rich Sutton | Linear Dyna: planning with an approximate learned model of the world's dynamics | | CSC 333
May 14 | Eric Wiewiora | Trends in Structured Prediction | | CSC 333
May 15 | Csaba Szepesvari | Regret to the average vs. regret to the best (Even-Dar et al., COLT 2007) | ppt, pdf, paper | CSC 333

[Organizer: Adam White]

May 20 | Amir Massoud Farahmand | Regularized Fitted Q-Iteration
May 21 | Masoud Shamari | Environment with Independent Delayed-Sense Dynamics
May 22 | Mike Bowling | Cancelled

[Organizer: Adam]

May 26 | James | Autonomous Geocaching (thesis/AAMAS talk)
May 27 | Hamid | Trends in off-policy learning with linear function approximation
May 28 | David
May 29 | Mike

[Organizer: Arash]

June 2 | David
June 3 | Elliot
June 4 | Gabor | Wingate and Singh, Exponential Family Predictive Representations of State (NIPS 2007)
June 5 | Brad | Cancelled

[Organizer: Leah]

June 9 | Brad
June 10 | Marc
June 11 | Yasin | Cancelled
June 12 | Varun | Online linear regression and its application to model-based RL (NIPS 2007) | pdf

[Organizer: Yasin]

June 16 | Arash
June 17 | Vlad
June 18 | Yasin | Time is Money!
June 19 | Martha | Strategy Evaluation in Extensive Games with Importance Sampling (2008)

[Organizer: Hamid]

June 23 | Barnabas | Bregman Divergences
June 24 | Siamak | Three Kinds of Probabilistic Induction: Universal Distributions and Convergence Theorems | pdf
June 25 | Leah
June 26 | Mohammad

[Organizer:]

June 30 | | CANCELLED
July 1 | | CANCELLED
July 2 | | CANCELLED
July 3 | | CANCELLED

July 7 | | CANCELLED
July 8 | | CANCELLED
July 9 | | CANCELLED
July 10 | | CANCELLED


[Organizer: Amir Massoud]

July 14 | Yavar | CANCELLED
July 15 | Eric Wiewiora | Doya et al., Multiple Model-Based Reinforcement Learning (Neural Computation, 2002) | html
July 16 | Anna Koop
July 17 | Adam White | The Many Faces of Optimism: A Unifying Approach

[Organizer: Martha]

July 21 | Hamid | Trends in off-policy TD learning with linear function approximation II
July 22 | Brian Tanner | RL-Competition 2008 Summary Report (by request) and the RL RecordBook
July 23 | Elliot | CANCELLED
July 24 | Marc

[Organizer: Siamak]

July 28 | Yavar
July 29 | Elliot
July 30 | Gabor | On-line sequential bin packing, by András György, Gábor Lugosi, and György Ottucsák (COLT 2008)
July 31 | Brad | CANCELLED

[Organizer: Marc]

Aug 4 | Civic Holiday | CANCELLED
Aug 5 | Barnabas
Aug 6 | Rich Sutton | The Critterbot Project
Aug 7 | | CANCELLED

[Organizer: Yavar]

Aug 11 | Brad
Aug 12 | Mike Sokolsky | The system architecture of the Critterbot
Aug 13 | Varun | CANCELLED
Aug 14 | Amir Massoud | Compressive Sampling

[Organizer: Eric Wiewiora]

Aug 18 | Martha | "Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta" (and further work on meta-learning)
Aug 19 | Siamak
Aug 20 | Varun | Overview of active learning | slides
Aug 21 | | CANCELLED

[Organizer: Marc]

Aug 25 | | CANCELLED
Aug 26 | Csaba Szepesvari | An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning | PDF
Aug 27 | Tom
Aug 28 | Yasin







Paper suggestions
Some suggestions for your immediate consideration (feel free to add your own favorites!):

IJCAI-07 papers (http://www.ijcai.org/papers07/contents.php):

AWA*: A Window Constrained Anytime Heuristic Search Algorithm
Solving POMDPs Using Quadratically Constrained Linear Programs
Effective Control Knowledge Transfer Through Learning Skill and Representation Hierarchies
Using Linear Programming for Bayesian Exploration in Markov Decision Processes
Learning to Walk through Imitation(??)
Online Learning and Exploiting Relational Models in Reinforcement Learning
Utile Distinctions for Relational Reinforcement Learning
Topological Value Iteration Algorithm for Markov Decision Processes
The Value of Observation for Monitoring Dynamic Systems
State Similarity Based Approach for Improving Performance in RL
Improving LRTA*(k)
Factored Planning using Decomposition Trees
A Fast Analytical Algorithm for Solving Markov Decision Processes with Real-Valued Resources
Efficiently Exploiting Symmetries in Real Time Dynamic Programming
An Analysis of Laplacian Methods for Value Function Approximation in MDPs
Bayesian Inverse Reinforcement Learning
Deictic Option Schemas
Real-Time Heuristic Search with a Priority Queue
AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs
Efficient Bayesian Task-Level Transfer Learning
Memory-Bounded Dynamic Programming for DEC-POMDPs
Forward Search Value Iteration for POMDPs
Transfer Learning in Real-Time Strategy Games Using Hybrid CBR/RL
An Experts Algorithm for Transfer Learning
Direct Code Access in Self-Organizing Neural Networks for Reinforcement Learning
Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs
Dynamics of Temporal Difference Learning
Using Learned Policies in Heuristic-Search Planning


NIPS-07 papers (http://books.nips.cc/nips20.html):
David Hsu, Wee Sun Lee, Nan Rong: What makes some POMDP problems easy to approximate?
Shalabh Bhatnagar, Richard Sutton, Mohammad Ghavamzadeh, Mark Lee: Incremental Natural-Gradient Actor-Critic Algorithms
Chris Atkeson, Benjamin Stephens: Random Sampling of States in Dynamic Programming
Stephane Ross, Brahim Chaib-draa, Joelle Pineau: Theoretical Analysis of Heuristic Search Methods for Online POMDPs
Marcus Hutter, Shane Legg: Temporal Difference with Eligibility Traces Derived from First Principles
Tao Wang, Daniel Lizotte, Michael Bowling, Dale Schuurmans: Stable Dual Dynamic Programming
David Wingate, Satinder Singh Baveja: Exponential Family Predictive Representations of State
Alexander Strehl, Michael Littman: Online Linear Regression and Its Application to Model-Based Reinforcement Learning
Ambuj Tewari, Peter Bartlett: Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
John Langford, Tong Zhang: The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information
Stephane Ross, Joelle Pineau: Bayes-Adaptive POMDPs
Gerald Tesauro, Rajarshi Das, Hoi Chan, Jeffrey Kephart, David Levine, Freeman Rawson, Charles Lefurgy: Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Alessandro Lazaric, Marcello Restelli, Andrea Bonarini: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Yuval Tassa, Tom Erez, William Smart: Receding Horizon Differential Dynamic Programming


JMLR papers (2006-):
Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
    Sridhar Mahadevan, Mauro Maggioni; 8(Oct):2169--2231, 2007.
Hierarchical Average Reward Reinforcement Learning
    Mohammad Ghavamzadeh, Sridhar Mahadevan; 8(Nov):2629--2669, 2007.
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
    Rémi Munos; 7(Feb):413--427, 2006.
Policy Gradient in Continuous Time
    Rémi Munos; 7(May):771--791, 2006.
Evolutionary Function Approximation for Reinforcement Learning
    Shimon Whiteson, Peter Stone; 7(May):877--917, 2006.
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
    Eyal Even-Dar, Shie Mannor, Yishay Mansour; 7(Jun):1079--1105, 2006.
A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events
    Shalabh Bhatnagar, Vivek S. Borkar, Madhukar Akarapu; 7(Oct):1937--1962, 2006.


MLJ Papers (2006-):
Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
    Abraham P. George and Warren B. Powell
    Volume 65, Number 1 / October, 2006
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Vladislav B. Tadić
    Volume 63, Number 2 / May, 2006
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    András Antos, Csaba Szepesvári and Rémi Munos
    Volume 71, Number 1 / April, 2008
Rollout sampling approximate policy iteration
    Christos Dimitrakakis, Michail G. Lagoudakis
    Volume 72, Number 3 / September, 2008, pp. 157-171.

IEEE Transactions on Automatic Control and IEEE in general
(papers without a venue are from IEEE TAC)

2007
Constrained Optimization for Average Cost Continuous-Time Markov Decision Processes
    Xianping Guo;
    Volume 52,  Issue 6,  June 2007 Page(s):1139 - 1143
A model reference adaptive search method for stochastic optimization with applications to Markov decision processes
    Jiaqiao Hu; Fu, M.C.; Marcus, S.I.;
    46th IEEE Conference on Decision and Control, 2007
    12-14 Dec. 2007 Page(s):975 - 980
PAC bounds for simulation-based optimization of Markov decision processes
    Watson, T.;
    46th IEEE Conference on Decision and Control, 2007
    12-14 Dec. 2007 Page(s):3466 - 3471
Recursive Learning Automata Approach to Markov Decision Processes
    Hyeong Soo Chang; Fu, M.C.; Jiaqiao Hu; Marcus, S.I.;
    Volume 52,  Issue 7,  July 2007 Page(s):1349 - 1355
Partially Observable Markov Decision Processes With Reward Information: Basic Ideas and Models
    Xi-Ren Cao; Xianping Guo;
    Volume 52,  Issue 4,  April 2007 Page(s):677 - 681

2006
Robustness of policies in constrained Markov decision processes
    Zadorojniy, A.; Shwartz, A.;
    Volume 51,  Issue 4,  April 2006 Page(s):635 - 638

2005
Markov decision processes with fractional costs
    Zhiyuan Ren; Krogh, B.H.;
    Volume 50,  Issue 5,  May 2005 Page(s):646 - 650
Evolutionary policy iteration for solving Markov decision processes
    Hyeong Soo Chang; Hong-Gi Lee; Fu, M.C.; Marcus, S.I.;
    Volume 50,  Issue 11,  Nov. 2005 Page(s):1804 - 1808
An analysis of gradient-based policy iteration
    Dankert, J.; Lei Yang; Si, J.;
    2005 IEEE International Joint Conference on Neural Networks, 2005.
    Volume 5,  31 July-4 Aug. 2005 Page(s):2977 - 2982 vol. 5

2004
A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes
    Bhatnagar, S.; Kumar, S.;
    Volume 49,  Issue 4,  April 2004 Page(s):592 - 598
Potential-based online policy iteration algorithms for Markov decision processes
    Hai-Tao Fang; Xi-Ren Cao;
    Volume 49,  Issue 4,  April 2004 Page(s):493 - 505

2003
Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes
    Abad, F.J.V.; Krishnamurthy, V.;
    42nd IEEE Conference on Decision and Control, 2003.
    Volume 3,  9-12 Dec. 2003 Page(s):2823 - 2828 Vol.3
Multitime scale Markov decision processes
    Hyeong Soo Chang; Fard, P.J.; Marcus, S.I.; Shayman, M.;
    Volume 48,  Issue 6,  June 2003 Page(s):976 - 987
Markov decision processes with delays and asynchronous cost collection
    Katsikopoulos, K.V.; Engelbrecht, S.E.;
    Volume 48,  Issue 4,  April 2003 Page(s):568 - 574

2002 and earlier
A note on optimality conditions for continuous-time Markov decision processes with average cost criterion
    Xianping Guo; Ke Liu;
    Volume 46,  Issue 12,  Dec. 2001 Page(s):1984 - 1989
Mixed risk-neutral/minimax control of discrete-time, finite-state Markov decision processes
    Coraluppi, S.P.; Marcus, S.I.;
    Volume 45,  Issue 3,  March 2000 Page(s):528 - 532
The policy iteration algorithm for average reward Markov decision processes with general state space
    Meyn, S.P., Volume 42,  Issue 12,  Dec. 1997 Page(s):1663 - 1680


Tea time today at 4:15 in room CSC 333 with refreshments starting at 4:00. Eric Wiewiora will be talking about Trends in Structured Prediction  

Tea time today at 4:15 in room CSC 333 with refreshments starting at 4:00. Csaba will be talking about some bounds  

Tea time today at 4:15 in room CSC 333 with refreshments starting at 4:00. Amir Massoud will be talking about Regularized Fitted Q-Iteration  

Tea talk today in CSC 349. Hamid will be talking about trends in off-policy learning  

David is giving tea talk today at 4:15 in room CSC 333. Cookies and tea at 4:00.  

Mike Bowling is giving the tea talk today at 4:00. CSC 333 is busy with the chair selection forum; if it frees up in time, the talk will be there, otherwise it will be in CSC 349.

Marc is giving the tea talk today at 4:00 in room CSC 333. There will be no tea talk tomorrow. Also, from now on these reminders will go only to the RLAI mailing list; if you are not on that list, please subscribe at https://mail.cs.ualberta.ca/mailman/listinfo/rlaigroup

Tea talk today in room CSC 333 at 4:00.  

Tea talk today at 4:00 by Arash. Title is "Boosting Fitted Q Iteration"  

Eric Wiewiora will be presenting "Multiple Model Based Reinforcement Learning" at the tea talk today at 4:00 in room CSC 333.  
