Reinforcement learning seminar, list of subjects for second presentation

Articles are grouped according to subject.

There is one article selection per participant, but teamwork is possible by having the members of a team select articles from the same subject area. The maximum team size is three persons. A team makes a joint presentation in which the results and conclusions of the selected articles are compared with each other.

Articles listed in "Typical benchmark applications" are mainly suitable for individual work.

Direct use of continuous-valued state variables and actions

Doya, Kenji (2000). Reinforcement Learning in Continuous Time and Space. Neural Computation, vol. 12. pp. 219-245. A long article, but it contains good results, explanations and illustrations of the learned models, which help general understanding.
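
For orientation: in the continuous-time setting, the discrete TD error is replaced by a continuous version of roughly the form

    \delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)

where \tau is a time constant of reward discounting (stated from memory; see the article for the exact derivation).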

Kimura, Hajime, Miyazaki, Kazuteru, Kobayashi, Shigenobu (1997). Reinforcement Learning in POMDPs with Function Approximation. Proc. of 14th Int. Conf. on Machine Learning. pp. 152-160. On a technique called "policy gradient". Uses a "crawling robot" that learns to crawl forward with one arm. Could also be used in the "Simulated robot environments" subject.
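
Background note: the policy-gradient idea, in a minimal REINFORCE-style form (a sketch only, not necessarily the exact algorithm of the paper), adjusts a parameterized stochastic policy \pi_\theta(a|s) directly along the gradient of expected return:

    \theta \leftarrow \theta + \alpha \, R \, \nabla_\theta \log \pi_\theta(a|s)

where R is the obtained return and \alpha the learning rate.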

Kimura, Hajime, Kobayashi, Shigenobu (1999). Efficient Non-Linear Control by Combining Q-learning with Local Linear Controllers. Proc. of 16th Int. Conf. on Machine Learning. pp. 210-219. Very good results on the cart-pole task, where the agent learns to swing the pole up; the best results I have seen on this task.
Selected by Jukka Villstedt.
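
For reference, the standard one-step Q-learning update that this work builds on is

    Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

and the paper's idea, as the title says, is to combine this kind of discrete-action learning with local linear controllers for smooth control.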

Simulated robot environments

Rummery, G. A., Niranjan, M. (1994). On-Line Q-Learning Using Connectionist Systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department. 20 p. Presents the SARSA learning method, used on an interesting simulated robot navigation task.
Selected by Samuli Kekki.
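
For readers new to SARSA, below is a minimal tabular sketch of the update rule in Python. The env object (reset/step/actions) is a hypothetical interface for illustration only; the report itself uses connectionist (neural-network) function approximation rather than a table.

    # Minimal tabular SARSA sketch; 'env' is a hypothetical interface.
    import random
    from collections import defaultdict

    def sarsa(env, episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = defaultdict(float)  # Q[(state, action)] -> value estimate

        def policy(s):
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                return random.choice(env.actions)
            return max(env.actions, key=lambda a: Q[(s, a)])

        for _ in range(episodes):
            s = env.reset()
            a = policy(s)
            done = False
            while not done:
                s2, r, done = env.step(a)
                a2 = policy(s2)
                # SARSA bootstraps on the action actually taken next,
                # unlike Q-learning's max over next actions.
                target = r + (0.0 if done else gamma * Q[(s2, a2)])
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s, a = s2, a2
        return Q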

Stone, Peter, Sutton, Richard S. (2001). Scaling Reinforcement Learning toward RoboCup Soccer. In: Proc. 18th International Conf. on Machine Learning, Morgan Kaufmann, San Francisco, CA. pp. 537-544. Robots learn to keep the ball away from opponents ("keepaway") in simulated RoboCup soccer.
Selected by Esa Seuranen.

Sun, Ron, Peterson, Todd (1998). Autonomous Learning of Sequential Tasks: Experiments and Analyses. IEEE Trans. on Neural Networks, Vol. 9, No. 6. pp. 1217-1234. Quite a long article, but the "submarine navigation" task is interesting, and there are good figures of the results.
Selected by Mikko Rahikainen.

Real robot environments

Maes, Pattie, Brooks, Rodney A. (1990). Learning to coordinate behaviors. In: Proceedings of Eighth National Conference on Artificial Intelligence, Morgan Kaufmann. pp. 796-802. Six-legged robot learns to walk.
Selected by Elina Parviainen.

Mahadevan, Sridhar, Connell, Jonathan (1992). Automatic Programming of Behavior-based Robots using Reinforcement Learning. Artificial Intelligence, Vol. 55, Nos. 2-3. pp. 311-365. A long article, but quite easy to read, interesting and educational.
Selected by Teddy Grenman.

Mataric, Maja J. (1994). Reward Functions for Accelerated Learning. In: Cohen, W. W., Hirsch, H. (eds.) Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, CA. Multiple robots collecting pucks into a "home" area. This article is also included in the "Reward Shaping" subject.
Selected by Jaakko Nyrölä & Marko Nikula.

Mataric, Maja J. (1997). Reinforcement Learning in the Multi-Robot Domain. Autonomous Robots, Vol. 4, No. 1. pp. 73-83. The "journal version" of the previous article, with longer explanations and new insights, but also more text and theory to read.
Selected by Jaakko Nyrölä & Marko Nikula.

Relationship between Reinforcement Learning and the Brain

Doya, Kenji (2002). Metalearning and neuromodulation. Neural Networks, Vol. 15, Nos. 4-6. pp. 495-506. Much of the article is about four main neuromodulators in the brain and how they could map to Reinforcement Learning parameters.
Selected by Heli Nyholm.

Kakade, Sham, Dayan, Peter (2002). Dopamine: generalization and bonuses. Neural Networks, Vol. 15. pp. 549-559. Theories and results on the relation between TD-learning and dopamine levels in the brain, including bonuses for novel situations and solutions (see the TD-error note at the end of this subsection).
Selected by Sebastian von Knorring.

Suri, Roland E. (2002). TD models of reward predictive responses in dopamine neurons. Neural Networks, Vol. 15. pp. 523-533. On the connection between TD-learning, reward expectation, animal behavior, etc.
Selected by Jussi Rautio.
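
A common thread in the three articles above is the temporal-difference prediction error

    \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

where V is the learned value estimate, r_t the reward and \gamma the discount factor. Roughly stated, the hypothesis examined in this literature is that phasic dopamine activity behaves like \delta_t.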

Reward Shaping

Mataric, Maja J. (1994). Reward Functions for Accelerated Learning. In: Cohen, W. W., Hirsch, H. (eds.) Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, CA. Multiple robots collecting pucks into a "home" area. This article is also included in the "Real robot environments" subject.
Selected by Jaakko Nyrölä & Marko Nikula.

Ng, Andrew Y., Harada, Daishi, Russell, Stuart (1999). Policy invariance under reward transformations: Theory and application to reward shaping. Proceedings of the Sixteenth International Conference on Machine Learning. On how to make exploration faster by modifying the reward function so that the agent is directly guided towards the goal.
Selected by Tapani Raiko.
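
The central result of the paper, stated here from memory (see the article for the exact conditions): adding a shaping reward F to the environment reward leaves the set of optimal policies unchanged if F is potential-based,

    F(s, a, s') = \gamma \Phi(s') - \Phi(s)

for some real-valued potential function \Phi over states. Choosing \Phi as an estimate of progress towards the goal then speeds up exploration without changing what is optimal.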

Typical benchmark applications

Boyan, J. A., Moore, A. W. (1995). Generalization in Reinforcement Learning: Safely Approximating the Value Function. Advances in Neural Information Processing Systems 7. pp. 369-376. On the use of function approximators in several benchmark tasks. A short article, but it might require finding some extra background information.
Selected by Antti Päällysaho.

Moore, A. W., Atkeson, C. G. (1995). The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces. Machine Learning, Vol. 21. pp. 1-36. On automatic partitioning of continuous-valued state variables, demonstrated on many classical benchmark applications. A long article, but easy to read, with many good illustrations that help understanding.
Selected by Antti Ukkonen.

Randløv, J., Alstrøm, P. (1998). Learning to Drive a Bicycle using Reinforcement Learning and Shaping. ICML-98. pp. 463-471.
AND
Randløv, Jette (2000). Shaping in Reinforcement Learning by Changing the Physics of the Problem. Proceedings of ICML-2000. pp. 767-774. An amusing task: learning how to ride a bicycle. These articles could also be used in the "Reward Shaping" subject.
Selected by Juho Törmä.

Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3. pp. 58-68. Available on-line in HTML format. On one of the biggest success stories of Reinforcement Learning.
Selected by Jarmo Korhonen.
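
For context: TD-Gammon trains a neural-network position evaluator by self-play using the TD(\lambda) rule, in its generic form

    e_t = \gamma \lambda e_{t-1} + \nabla_w V(s_t), \qquad w \leftarrow w + \alpha \, \delta_t \, e_t

where \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) is the prediction error and e_t an eligibility trace over the network weights w (see the article for Tesauro's exact setup).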

This page is maintained by Kary Främling, e-mail: Kary.Framling@hut.fi.
Last updated on March 18th, 2004.