CS-500: Multiagent Reinforcement Learning
Spring 2007
Time: Fridays 2pm - 3:30pm
Place: CoRE B (CoRE 305)
Description
We'll be meeting to read and discuss papers on multiagent reinforcement learning (MARL). Specifically, we'll take a single-agent perspective on what policy an agent should learn when the environment it interacts with is composed of many other learning agents. Assuming self-interested agents and focusing on self-play, we will cover three types of papers. The first type deals with agents that explicitly learn equilibria, the second deals with agents that learn a best response to the joint actions of their adversaries (teammates), and the third deals with agents that learn policies that satisfy multiple other criteria (not just equilibria or best response).
Schedule
1/26/07: M. Littman, 1994
2/02/07: Hu and Wellman, 1998; M. Littman, 2001
2/09/07: Shoham and Powers, 2003
2/16/07: Claus and Boutilier, 1998; Littman and Stone, 2001
2/23/07: Michael's talk
3/02/07: Weinberg and Rosenschein, 2004
3/09/07: Zinkevich et al., 2005 (presented by Pavel)
3/16/07: **SPRING BREAK**
3/23/07: Bowling and Veloso, 2001 (presented by John)
3/30/07: Banerjee and Peng, 2003 (presented by Rhonda)
4/06/07: Powers and Shoham, 2004 (presented by Chris)
4/13/07: Crandall and Goodrich, 2005 (presented by Mangesh)
4/20/07: Munoz de Cote et al., 2006 (presented by Robert) && Powers et al., 2006 (presented by Monica)
4/27/07: Greenwald et al. (presented by Ali) && Bowling, 2004 (presented by Bert)
5/04/07: Greenwald et al., 2002 (presented by Marwan)
Paper Bank
1. Equilibrium learners
(a) M. Littman, 1994
(b) Hu and Wellman, 1998
(c) M. Littman, 2001
(d) Greenwald et al., 2002
2. Best response learners
(a) [Uther and Veloso, 2003]; Claus and Boutilier, 1998
(b) Littman and Stone, 2001 (this is built on an asymmetric setting, not self-play)
(c) Weinberg and Rosenschein, 2004
(d) Zinkevich et al., 2005
3. Multiple criteria learners (security and convergence)
(a) Bowling and Veloso, 2001; [Veloso and Bowling, 2002]; Banerjee and Peng, 2003
(b) Bowling, 2004
(c) Shoham and Powers, 2003
(d) Crandall and Goodrich, 2005
(e) Munoz de Cote et al., 2006; Powers et al., 2006
Please contact Enrique Munoz de Cote (jemc AT ecs.soton.ac.uk) with any questions.