TKO-laboratorio TKK

 

T-93.850 Seminar on Knowledge Engineering

Spring 2004: Reinforcement learning


General seminar information

Working methods: Participants do bibliographic research on specific subjects in reinforcement learning and present their findings to the other seminar participants at different sessions. A seminar report of 5-10 pages on the selected topic is also required.

Presentations and the report may be done individually or in groups of 2-3 people. Groups should meet up and discuss their work before the presentation, so that the group work helps all members in the group to understand the subject. Group presentations also have to be partitioned between the group members so that all members present different parts of the work. For the first presentation, all members of a group read the same paper. For the second presentation, all members of a group read papers on the same subject area, so that they can compare the methods, results and conclusions of the different papers. Group compositions may change between the first and the second presentation, but the seminar report is either written individually or with the same group as in the second presentation.

Assessment: Accepted/Rejected.

First meeting: Thursday, 22 January 2004, 12-14, Lecture Room T4 in the Computer Science building.
This meeting starts with an introductory lecture on reinforcement learning, including demonstrations.
Please send a message to the lecturer in advance if you intend to attend the course.
Also tell the lecturer if this seminar time is not possible for you.

Extent: 2 credit units, unless the amount of work turns out to be much greater than expected. Extra credit units can be obtained for instance by:

  • Reading articles by the lecturer, relating their methods and results to other work, and writing a small report.
  • Doing practical tests on some benchmark application using freeware RL simulation software.
  • Proposals made by the participants are also welcome.


About reinforcement learning

Reinforcement Learning (RL) is largely based on the idea of developing models that solve problems and learn in ways similar to humans and animals. The models should preferably also correspond to some rough-level knowledge about how the brain operates, i.e. activations and connections between different areas of the brain.

Animal problem solving seems to be based mainly on trial and error. The success or failure of a trial modifies behavior in the "right" direction after some number of trials, where "some number" ranges from one (e.g. learning how to turn on the radio with the "power" button) to infinity (e.g. learning how to grab things, which is a life-long adaptation process).

RL methods have been successfully applied to many problems where more "conventional" methods are difficult to use, for example because data about the environment is lacking, which forces the RL "agent" to explore its environment and learn interactively. Exploring means that the agent has to take actions without a priori knowledge of how good or bad each action is; this may become known only much later, when a goal is reached or the task has failed.
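
As a rough illustration of exploration and delayed reward, the following minimal sketch uses tabular Q-learning with epsilon-greedy exploration. Everything in it is an assumption made for illustration and does not come from the seminar material: the six-cell corridor task, the reward of 1 given only at the goal, the parameter values, and the function names `step`, `train` and `greedy_policy`.

```python
import random

N_STATES = 6          # corridor cells 0..5; cell 5 is the goal (hypothetical task)
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative values, not tuned


def step(state, action):
    """Move in the corridor; the only reward (1.0) comes on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, seed=0):
    """Learn action values q[state][action index] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability EPSILON (or when the estimates are tied),
            # otherwise take the action that currently looks best.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward the reward plus the
            # discounted value of the best action in the next state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best-looking action for every non-goal cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy task the agent initially wanders at random, because no action has any known value. The reward at the goal then propagates backwards through the value estimates over successive episodes, so the learned greedy policy ends up stepping right in every cell.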


Schedule

Seminar opening, introductory lecture

Thu 22.1.2004, 12-14, Lecture Room T4
The participants select articles to read for the first presentation meeting. More detailed instructions are given, and approximate dates are decided for the presentation meetings.
Introductory lecture, slides available as PDF and browsable here.

Meetings for presenting first article

The list of articles is here.

Allow about 30 minutes for the presentation, including time for questions and discussion. For this first presentation, the emphasis is on understanding the basic principles of Reinforcement Learning, so participants should try to understand and explain the basic formulas as well as possible. Every article also has one or more "major messages" that the author tries to convey; try to identify them as well as possible. Also explain the experimental tasks: the characteristics of each task, which methods are used, how good the results are, and what conclusions are drawn from them. Finally, give your own opinion on the work: how understandable the article is, whether it makes you want to read further, what the open issues are, etc.

For group presentations, allow 30-45 minutes for 2 persons and 30-60 minutes for 3 persons. The goal of team work is not just to split the article into pieces that are presented separately. The goal is rather to discuss and analyze the article in advance and to give a more refined analysis than would be possible in an individual presentation.

All seminar participants are expected to attend all meetings. However, it is possible to be absent from one of the presentation meetings for the first article without giving a reason. If you have to be absent more than once, please contact the lecturer. Also report any problems with the schedule below as soon as possible.

Tuesday 24.2.2004, 12-16, Room A232
  • Kaelbling, Littman, Moore (1996). Reinforcement Learning: A Survey.
    (Sebastian von Knorring & Juho Törmä & Samuli Kekki)

Monday 1.3.2004, 10-14, Room T4
  • Moore, Atkeson (1993). Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time.
    (Antti Ukkonen & Jukka Villstedt), Teddy Grenman
  • Mahadevan (1996). Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results.
    (Esa Seuranen & Mikko Rahikainen)

Monday 15.3.2004, 10-14, Room T4
  • Barto, Sutton, Watkins (1990). Learning and Sequential Decision Making.
    Jarmo Korhonen, (Jussi Rautio & Heli Nyholm)
  • Thrun (1992). The role of exploration in learning control.
    (Antti Päällysaho & Tapani Raiko)
  • Whitehead, Lin (1995). Reinforcement learning of non-Markov decision processes.
    (Elina Parviainen & Jaakko Nyrölä & Marko Nikula)

Meetings for presenting second article

The list of articles is here. Deadline for selecting the article to present at the second meeting is 1.3.2004.

For the second presentation, there is one article per participant, but team work is possible if the members of the team select articles from the same subject area. The maximum team size is three persons. A team makes a joint presentation in which the results and conclusions of the selected articles are compared with each other.

Every article has one or more "major messages" that the author tries to convey; try to identify them as well as possible. Also explain the experimental tasks: the characteristics of each task, which methods are used, how good the results are, and what conclusions are drawn from them. Finally, give your own opinion on the work: how understandable the article is, whether it makes you want to read further, what the open issues are, etc.

Allow about 30 minutes for the presentation. About 15 minutes are reserved for questions and discussion, but this time limit is intended to be flexible. At this presentation, all participants are also expected to show at least a table of contents for the seminar report.

It is, of course, desirable that all participants attend all meetings. However, with the current schedule of six meetings, only four meetings are compulsory. Send a message with a justified explanation if you can only attend three meetings. Also report any problems with the schedule below as soon as possible.

Monday 29.3.2004, 10-14, Room T4
  • Boyan & Moore (1995). Generalization in Reinforcement Learning...
    Antti Päällysaho
  • Moore & Atkeson (1995). The Parti-game Algorithm...
    Antti Ukkonen

Monday 5.4.2004, 10-14, Room T4
  • Rummery & Niranjan (1994). On-Line Q-Learning Using Connectionist Systems.
    Samuli Kekki
  • Stone & Sutton (2001). Scaling Reinforcement Learning toward RoboCup Soccer.
    Esa Seuranen
  • Maes & Brooks (1990). Learning to coordinate behaviors.
    Elina Parviainen

Monday 19.4.2004, 10-14, Room T4
  • Mahadevan & Connell (1992). Automatic Programming of Behavior-based Robots...
    Teddy Grenman
  • Mataric (1994, 1997). Reward Functions for Accelerated Learning; Reinforcement Learning in the Multi-Robot Domain.
    Jaakko Nyrölä & Marko Nikula

Tuesday 20.4.2004, 12-16, Room A232
  • Doya (2002). Metalearning and neuromodulation.
    Heli Nyholm
  • Kakade & Dayan (2002). Dopamine: generalization and bonuses.
    Sebastian von Knorring

Monday 26.4.2004, 10-14, Room T4
  • Randløv, Alstrøm (1998, 2000). Learning to ride a bicycle.
    Juho Törmä
  • Ng, Harada, Russell (1999). Policy invariance under reward transformations...
    Tapani Raiko
  • Kimura & Kobayashi (1999). Efficient Non-Linear Control...
    Jukka Villstedt

Tuesday 27.4.2004, 12-16, Room A232
  • Sun & Peterson (1998). Autonomous Learning of Sequential Tasks...
    Mikko Rahikainen
  • Tesauro (1995). Temporal difference learning and TD-Gammon.
    Jarmo Korhonen
  • Suri (2002). TD models of reward predictive responses in dopamine neurons.
    Jussi Rautio

Seminar report

Deadline: May 2004, unless otherwise agreed with the lecturer.

The main purpose of the seminar report is to show what the students have learned about the subjects of the seminar. Its role is therefore twofold:

  • Showing what the student learned from the articles and the presentations.
  • Giving feedback on the selection of articles and the organisation of the seminar in general.

One possible structure for the report could be:

  1. Introduction to Reinforcement Learning: a short explanation of what it is, in your own words.
  2. The articles read: what they were about, how good they were, what was learned from them.
  3. Seminar sessions: what was learned during them, what was the level of participation, was there enough discussion etc.
  4. General organisation of the seminar, i.e. choice of articles, schedule, ...
  5. Conclusions: was it worth taking this seminar? Why? Why not?

Recommended length is about 5 pages. Send the seminar report to Kary Främling.

 
This page is maintained by Kary Främling.
Last updated on April 23rd, 2004