RLAI 2008 Tea Time
Reinforcement Learning and Artificial Intelligence (RLAI)

Tea-time Talks 2008


The RLAI group is hosting a series of tea-time talks this summer, and everyone is invited.  Please gather with us at the end of the day for some refreshment and refreshing ideas.  Come learn about what is going on in the world of reinforcement learning and artificial intelligence.

Tea-time is at 4 pm, Monday through Thursday.  The talk begins promptly at 4:15, so come early for some tea and cookies, and a little social time, before that.  The talk itself is strictly limited to 20 minutes, plus 10 minutes for questions.

The aim of the tea-time talks is to efficiently share information on a variety of current reinforcement learning topics and other related subjects.

This page is used to organize the tea-time talks. It is no longer used to provide a mailing list for participants.

last year's tea-time page


Guidelines

Organisation



Schedule (starting May 13)

Date | Presenter | Topic | Link | Room

[Organizer: Varun]

May 13 | Rich Sutton | Linear Dyna: planning with an approximate learned model of the world's dynamics | | CSC 333
May 14 | Eric Wiewiora | Trends in Structured Prediction | | CSC 333
May 15 | Csaba Szepesvari | Regret to the average vs. regret to the best (Even-Dar et al., COLT 2007) | ppt, pdf, paper | CSC 333

[Organizer: Adam White]

May 20 | Amir Massoud Farahmand | Regularized Fitted Q-Iteration
May 21 | Masoud Shamari | Environment with Independent Delayed-Sense Dynamics
May 22 | Mike Bowling | Cancelled

[Organizer: Adam]

May 26 | James | Autonomous Geocaching (thesis/AAMAS talk)
May 27 | Hamid | Trends in off-policy learning with linear function approximation
May 28 | David
May 29 | Mike

[Organizer: Arash]

June 2 | David
June 3 | Elliot
June 4 | Gabor | Wingate and Singh, Exponential Family Predictive Representations of State (NIPS 2007)
June 5 | Brad | Cancelled

[Organizer: Leah]

June 9 | Brad
June 10 | Marc
June 11 | Yasin | Cancelled
June 12 | Varun | Online linear regression and its application to model-based RL (NIPS 2007) | pdf

[Organizer: Yasin]

June 16 | Arash
June 17 | Vlad
June 18 | Yasin | Time is Money!
June 19 | Martha | Strategy Evaluation in Extensive Games with Importance Sampling (2008)

[Organizer: Hamid]

June 23 | Barnabas | Bregman Divergences
June 24 | Siamak | Three Kinds of Probabilistic Induction: Universal Distributions and Convergence Theorems | pdf
June 25 | Leah
June 26 | Mohammad

[Organizer:]

June 30 | | CANCELLED
July 1 | | CANCELLED
July 2 | | CANCELLED
July 3 | | CANCELLED

July 7 | | CANCELLED
July 8 | | CANCELLED
July 9 | | CANCELLED
July 10 | | CANCELLED


[Organizer: Amir Massoud]

July 14 | Yavar | CANCELLED
July 15 | Eric Wiewiora | Doya et al., Multiple Model-Based Reinforcement Learning (Neural Computation, 2002) | html
July 16 | Anna Koop
July 17 | Adam White | The Many Faces of Optimism: A Unifying Approach

[Organizer: Martha]

July 21 | Hamid | Trends in off-policy TD learning with linear function approximation II
July 22 | Brian Tanner | RL-Competition 2008 Summary Report (by request) and the RL RecordBook
July 23 | Elliot | CANCELLED
July 24 | Marc

[Organizer: Siamak]

July 28 | Yavar
July 29 | Elliot
July 30 | Gabor | On-line sequential bin packing, by András György, Gábor Lugosi, and György Ottucsák (COLT 2008)
July 31 | Brad | CANCELLED

[Organizer: Marc]

Aug 4 | Civic Holiday | CANCELLED
Aug 5 | Barnabas
Aug 6 | Rich Sutton | The Critterbot Project
Aug 7 | | CANCELLED

[Organizer: Yavar]

Aug 11 | Brad
Aug 12 | Mike Sokolsky | The system architecture of the Critterbot
Aug 13 | Varun | CANCELLED
Aug 14 | Amir Massoud | Compressive Sampling

[Organizer: Eric Wiewiora]

Aug 18 | Martha | "Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta" (and further work on meta-learning)
Aug 19 | Siamak
Aug 20 | Varun | Overview of active learning | slides
Aug 21 | | CANCELLED

[Organizer: Marc]

Aug 25 | | CANCELLED
Aug 26 | Csaba Szepesvari | An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning | PDF
Aug 27 | Tom
Aug 28 | Yasin







Paper suggestions
Some suggestions for your immediate consideration (feel free to add your own favorites!):

IJCAI-07 papers (http://www.ijcai.org/papers07/contents.php):

AWA*: A Window Constrained Anytime Heuristic Search Algorithm
Solving POMDPs Using Quadratically Constrained Linear Programs
Effective Control Knowledge Transfer Through Learning Skill and Representation Hierarchies
Using Linear Programming for Bayesian Exploration in Markov Decision Processes
Learning to Walk through Imitation(??)
Online Learning and Exploiting Relational Models in Reinforcement Learning
Utile Distinctions for Relational Reinforcement Learning
Topological Value Iteration Algorithm for Markov Decision Processes
The Value of Observation for Monitoring Dynamic Systems
State Similarity Based Approach for Improving Performance in RL
Improving LRTA*(k)
Factored Planning using Decomposition Trees
A Fast Analytical Algorithm for Solving Markov Decision Processes with Real-Valued Resources
Efficiently Exploiting Symmetries in Real Time Dynamic Programming
An Analysis of Laplacian Methods for Value Function Approximation in MDPs
Bayesian Inverse Reinforcement Learning
Deictic Option Schemas
Real-Time Heuristic Search with a Priority Queue
AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs
Efficient Bayesian Task-Level Transfer Learning
Memory-Bounded Dynamic Programming for DEC-POMDPs
Forward Search Value Iteration for POMDPs
Transfer Learning in Real-Time Strategy Games Using Hybrid CBR/RL
An Experts Algorithm for Transfer Learning
Direct Code Access in Self-Organizing Neural Networks for Reinforcement Learning
Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs
Dynamics of Temporal Difference Learning
Using Learned Policies in Heuristic-Search Planning


NIPS-07 papers (http://books.nips.cc/nips20.html):
David Hsu, Wee Sun Lee, Nan Rong: What makes some POMDP problems easy to approximate?
Shalabh Bhatnagar, Richard Sutton, Mohammad Ghavamzadeh, Mark Lee: Incremental Natural-Gradient Actor-Critic Algorithms
Chris Atkeson, Benjamin Stephens: Random Sampling of States in Dynamic Programming
Stephane Ross, Brahim Chaib-draa, Joelle Pineau: Theoretical Analysis of Heuristic Search Methods for Online POMDPs
Marcus Hutter, Shane Legg: Temporal Difference with Eligibility Traces Derived from First Principles
Tao Wang, Daniel Lizotte, Michael Bowling, Dale Schuurmans: Stable Dual Dynamic Programming
David Wingate, Satinder Singh Baveja: Exponential Family Predictive Representations of State
Alexander Strehl, Michael Littman: Online Linear Regression and Its Application to Model-Based Reinforcement Learning
Ambuj Tewari, Peter Bartlett: Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
John Langford, Tong Zhang: The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information
Stephane Ross, Joelle Pineau: Bayes-Adaptive POMDPs
Gerald Tesauro, Rajarshi Das, Hoi Chan, Jeffrey Kephart, David Levine, Freeman Rawson, Charles Lefurgy: Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Alessandro Lazaric, Marcello Restelli, Andrea Bonarini: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Yuval Tassa, Tom Erez, William Smart: Receding Horizon Differential Dynamic Programming


JMLR papers (2006-):
Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
    Sridhar Mahadevan, Mauro Maggioni; 8(Oct):2169--2231, 2007.
Hierarchical Average Reward Reinforcement Learning
    Mohammad Ghavamzadeh, Sridhar Mahadevan; 8(Nov):2629--2669, 2007.
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
    Rémi Munos; 7(Feb):413--427, 2006.
Policy Gradient in Continuous Time
    Rémi Munos; 7(May):771--791, 2006.
Evolutionary Function Approximation for Reinforcement Learning
    Shimon Whiteson, Peter Stone; 7(May):877--917, 2006.
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
    Eyal Even-Dar, Shie Mannor, Yishay Mansour; 7(Jun):1079--1105, 2006.
A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events
    Shalabh Bhatnagar, Vivek S. Borkar, Madhukar Akarapu; 7(Oct):1937--1962, 2006.


MLJ Papers (2006-):
Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
    Abraham P. George and Warren B. Powell
    Volume 65, Number 1 / October, 2006
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Vladislav B. Tadić
    Volume 63, Number 2 / May, 2006
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    András Antos, Csaba Szepesvári and Rémi Munos
    Volume 71, Number 1 / April, 2008
Rollout sampling approximate policy iteration
    Christos Dimitrakakis, Michail G. Lagoudakis
    Volume 72, Number 3 / September, 2008, pp. 157-171.

IEEE Transactions on Automatic Control and IEEE in general
(papers without a venue are from IEEE TAC)

2007
Constrained Optimization for Average Cost Continuous-Time Markov Decision Processes
    Xianping Guo;
    Volume 52,  Issue 6,  June 2007 Page(s):1139 - 1143
A model reference adaptive search method for stochastic optimization with applications to Markov decision processes
    Jiaqiao Hu; Fu, M.C.; Marcus, S.I.;
    46th IEEE Conference on Decision and Control, 2007
    12-14 Dec. 2007 Page(s):975 - 980
PAC bounds for simulation-based optimization of Markov decision processes
    Watson, T.;
    46th IEEE Conference on Decision and Control, 2007
    12-14 Dec. 2007 Page(s):3466 - 3471
Recursive Learning Automata Approach to Markov Decision Processes
    Hyeong Soo Chang; Fu, M.C.; Jiaqiao Hu; Marcus, S.I.;
    Volume 52,  Issue 7,  July 2007 Page(s):1349 - 1355
Partially Observable Markov Decision Processes With Reward Information: Basic Ideas and Models
    Xi-Ren Cao; Xianping Guo;
    Volume 52,  Issue 4,  April 2007 Page(s):677 - 681

2006
Robustness of policies in constrained Markov decision processes
    Zadorojniy, A.; Shwartz, A.;
    Volume 51,  Issue 4,  April 2006 Page(s):635 - 638

2005
Markov decision processes with fractional costs
    Zhiyuan Ren; Krogh, B.H.;
    Volume 50,  Issue 5,  May 2005 Page(s):646 - 650
Evolutionary policy iteration for solving Markov decision processes
    Hyeong Soo Chang; Hong-Gi Lee; Fu, M.C.; Marcus, S.I.;
    Volume 50,  Issue 11,  Nov. 2005 Page(s):1804 - 1808
An analysis of gradient-based policy iteration
    Dankert, J.; Lei Yang; Si, J.;
    2005 IEEE International Joint Conference on Neural Networks, 2005.
    Volume 5,  31 July-4 Aug. 2005 Page(s):2977 - 2982 vol. 5

2004
A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes
    Bhatnagar, S.; Kumar, S.;
    Volume 49,  Issue 4,  April 2004 Page(s):592 - 598
Potential-based online policy iteration algorithms for Markov decision processes
    Hai-Tao Fang; Xi-Ren Cao;
    Volume 49,  Issue 4,  April 2004 Page(s):493 - 505

2003
Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes
    Abad, F.J.V.; Krishnamurthy, V.;
    42nd IEEE Conference on Decision and Control, 2003.
    Volume 3,  9-12 Dec. 2003 Page(s):2823 - 2828 Vol.3
Multitime scale Markov decision processes
    Hyeong Soo Chang; Fard, P.J.; Marcus, S.I.; Shayman, M.;
    Volume 48,  Issue 6,  June 2003 Page(s):976 - 987
Markov decision processes with delays and asynchronous cost collection
    Katsikopoulos, K.V.; Engelbrecht, S.E.;
    Volume 48,  Issue 4,  April 2003 Page(s):568 - 574

2002 and earlier
A note on optimality conditions for continuous-time Markov decision processes with average cost criterion
    Xianping Guo; Ke Liu;
    Volume 46,  Issue 12,  Dec. 2001 Page(s):1984 - 1989
Mixed risk-neutral/minimax control of discrete-time, finite-state Markov decision processes
    Coraluppi, S.P.; Marcus, S.I.;
    Volume 45,  Issue 3,  March 2000 Page(s):528 - 532
The policy iteration algorithm for average reward Markov decision processes with general state space
    Meyn, S.P., Volume 42,  Issue 12,  Dec. 1997 Page(s):1663 - 1680


Tea time today at 4:15 in room CSC 333 with refreshments starting at 4:00. Eric Wiewiora will be talking about Trends in Structured Prediction  

Tea time today at 4:15 in room CSC 333 with refreshments starting at 4:00. Csaba will be talking about some bounds  

Tea time today at 4:15 in room CSC 333 with refreshments starting at 4:00. Amir Massoud will be talking about Regularized Fitted Q-Iteration  

Tea talk today in CSC 349. Hamid will be talking about trends in off-policy learning  

David is giving tea talk today at 4:15 in room CSC 333. Cookies and tea at 4:00.  

Mike Bowling is giving the tea talk today at 4:00. CSC 333 is busy with the chair selection forum; if it frees up in time, the talk will be there, otherwise it will be in CSC 349.

Marc is giving the tea talk today at 4:00 in room CSC 333. There will be no tea talk tomorrow. Also, from now on these reminders will go only to the RLAI mailing list; if you are not on that list, please subscribe at https://mail.cs.ualberta.ca/mailman/listinfo/rlaigroup

Tea talk today in room CSC 333 at 4:00.  

Tea talk today at 4:00 by Arash. Title is "Boosting Fitted Q Iteration"  

Eric Wiewiora will be presenting "Multiple Model Based Reinforcement Learning" at the tea talk today at 4:00 in room CSC 333.  
