Awesome Meta Reinforcement Learning

Table of Contents

  • Papers
  • Lectures
  • Blogs
  • Datasets

Papers

Model-Free Meta Reinforcement Learning

  • Learning to Reinforcement Learn (2016) Jane X Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, C Blundell, D Kumaran, M Botvinick. [arXiv] (recurrent meta-RL)
  • RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning (2016) Yan Duan, John Schulman, Xi Chen, Peter Bartlett. [arXiv] Algorithm: RL^2.
  • A Simple Neural Attentive Meta-Learner (2017) Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel. [arXiv] Algorithm: SNAIL. (soft attention)
  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017) Chelsea Finn, Pieter Abbeel, Sergey Levine. [arXiv] [GitHub] Algorithm: MAML. (gradient-based meta-RL; see the sketch after this list)
  • ProMP: Proximal Meta-Policy Search (2018) Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel. [arXiv] [GitHub] Algorithm: ProMP. (gradient-based meta-RL)
  • Meta-Learning Structured Exploration Strategies (2018) Abhishek Gupta, Russell Mendonca, Yuxuan Liu, Pieter Abbeel, Sergey Levine. [arXiv] [GitHub] Algorithm: MAESN. (gradient-based meta-RL, exploration with latent variables)
  • Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (2019) Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine. [arXiv] [GitHub] Algorithm: PEARL. (off-policy meta-RL with posterior sampling for exploration)
  • VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning (2019) Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson. [arXiv] [GitHub] Algorithm: variBAD. (PEARL + update the latent state every timestep)
  • Generalizing Skills with Semi-Supervised Reinforcement Learning (2017) Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine. [arXiv] [GitHub]
  • Learning Latent Plans from Play (2019) Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet. [arXiv]
  • Deep Variational Reinforcement Learning for POMDPs (2018) Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson. [arXiv] [GitHub] Algorithm: DVRL. (variational inference for POMDPs)
  • Some Considerations on Learning to Explore with Meta-RL (2018) Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever. [arXiv] [GitHub] Algorithm: E-MAML & E-RL2. (treats the adaptation step as part of the unknown dynamics of the environment)
  • Learning to Explore via Meta-Policy Gradient (2018) Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng. [arXiv] (learns an exploration policy for single-task algorithms such as DDPG)
  • Guided Meta-Policy Search (2019) Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn. [arXiv] [GitHub]
  • End-to-End Robotic Reinforcement Learning without Reward Engineering (2019) Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine. [arXiv] [GitHub]
  • Task-Agnostic Dynamics Priors for Deep Reinforcement Learning (2019) Yilun Du, Karthik Narasimhan. [arXiv] [GitHub] Algorithm: SpatialNet.
  • Meta Reinforcement Learning with Task Embedding and Shared Policy (2019) Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang. [arXiv]
  • Adaptive Guidance and Integrated Navigation with Reinforcement Meta-Learning (2019) Brian Gaudet, Richard Linares, Roberto Furfaro. [arXiv]
  • Learning Latent State Representation for Speeding Up Exploration (2019) Giulia Vezzani, Abhishek Gupta, Lorenzo Natale, Pieter Abbeel. [arXiv]
  • Beyond Exponentially Discounted Sum: Automatic Learning of Return Function (2019) Yufei Wang, Qiwei Ye, Tie-Yan Liu. [arXiv]
  • Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy (2019) Ruihan Yang, Qiwei Ye, Tie-Yan Liu. [arXiv]
  • NoRML: No-Reward Meta Learning (2019) Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Jie Tan, Chelsea Finn. [arXiv] [GitHub] Algorithm: NoRML. (MAML + environment dynamics)
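
Several of the entries above (MAML, ProMP, MAESN, NoRML) are gradient-based meta-RL methods that share the same inner/outer-loop structure: adapt a policy to each sampled task with a few policy-gradient steps, then update the shared initialization so that adaptation works well across tasks. The sketch below is a minimal, first-order (FOMAML-style) illustration of that structure on a hypothetical one-parameter Gaussian-policy task family; the task distribution, learning rates, and REINFORCE estimator are illustrative assumptions, not the setup used in any of the papers listed.

```python
# Minimal first-order MAML-style sketch for a toy meta-RL setting (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
STD = 0.5        # fixed policy standard deviation (assumption)
INNER_LR = 0.1   # inner-loop (adaptation) step size
META_LR = 0.05   # outer-loop (meta) step size
N_SAMPLES = 64   # actions sampled per policy-gradient estimate

def sample_task():
    """A 'task' is just a goal position; reward is -(action - goal)^2."""
    return rng.uniform(-2.0, 2.0)

def policy_gradient(theta, goal):
    """REINFORCE estimate of d E[reward] / d theta for a Gaussian policy N(theta, STD^2)."""
    actions = theta + STD * rng.standard_normal(N_SAMPLES)
    rewards = -(actions - goal) ** 2
    scores = (actions - theta) / STD ** 2        # d log pi(a | theta) / d theta
    baseline = rewards.mean()                    # crude variance-reduction baseline
    return np.mean((rewards - baseline) * scores)

theta = 0.0  # meta-parameters: here a single scalar policy mean
for _ in range(500):
    tasks = [sample_task() for _ in range(8)]    # meta-batch of tasks
    meta_grad = 0.0
    for goal in tasks:
        # Inner loop: one policy-gradient ascent step adapts theta to this task.
        adapted = theta + INNER_LR * policy_gradient(theta, goal)
        # Outer loop (first-order): use the gradient at the adapted parameters
        # as the meta-gradient, ignoring second-order terms.
        meta_grad += policy_gradient(adapted, goal)
    theta += META_LR * meta_grad / len(tasks)

print("meta-learned policy-mean initialization:", round(theta, 3))
```

The first-order approximation sidesteps the second-order derivatives that exact MAML backpropagates through; the full method differentiates through the inner update as well.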

Model-Based Meta Reinforcement Learning

  • Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning (2018) Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn. [arXiv] [GitHub]
  • Few-Shot Goal Inference for Visuomotor Learning and Planning (2018) Annie Xie, Avi Singh, Sergey Levine, Chelsea Finn. [arXiv] [GitHub]

Meta Imitation Learning

  • One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL (2018) Tom Le Paine, Sergio Gomez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden, Nando de Freitas. [arXiv] Algorithm: MetaMimic.
  • Watch, Try, Learn: Meta-Learning from Demonstrations and Reward (2019) Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn. [arXiv] [GitHub] (demonstrations + trial-and-error)

Unsupervised Meta Reinforcement Learning

  • Unsupervised Meta-Learning for Reinforcement Learning (2018) Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine. [arXiv]
  • Skew-Fit: State-Covering Self-Supervised Reinforcement Learning (2019) Vitchyr H. Pong, Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, Sergey Levine. [arXiv] Algorithm: Skew-Fit. (maximize state entropy)

Meta Lifelong Reinforcement Learning

  • Gradient Episodic Memory for Continual Learning (2017) David Lopez-Paz, Marc'Aurelio Ranzato. [arXiv]
  • Deep Online Learning via Meta-Learning (2019) Anusha Nagabandi, Chelsea Finn, Sergey Levine. [arXiv]

Lectures

Blogs

Datasets

Contributions to this repo are welcome.