Introduction
What This Is
Welcome to Spinning Up in Deep RL for mobile robotics! This is an educational resource inspired by OpenAI's Spinning Up in Deep RL.
For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning.
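To make "trial and error" concrete, here is a minimal sketch of an epsilon-greedy agent on a three-armed bandit. It is not part of this module's code. The function name `run_bandit`, the reward means, and the noise level are illustrative assumptions chosen only to show an agent improving its behavior from experience.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, noise_std=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit.

    All names and numbers here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was tried
    estimates = [0.0] * n_arms   # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a noisy reward for the chosen arm.
        reward = rng.gauss(true_means[arm], noise_std)

        # Incremental update of the running average for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("pull counts:", counts)
```

The agent is never told which arm is best. It finds out by trying arms, observing rewards, and shifting its choices toward what has worked. Deep RL applies the same idea, with neural networks representing the agent's estimates or policy.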
This module contains a variety of helpful resources, including:
- a short introduction to RL terminology, kinds of algorithms, and basic theory,
- an essay about how to grow into an RL research role,
- a curated list of important papers organized by topic,
- a well-documented code repo of short, standalone implementations of key algorithms,
- and a few exercises to serve as warm-ups.
Why It Matters
Consider a student learning deep RL. Excellent course content is available, such as CS294 from Berkeley by Sergey Levine, and other resources such as OpenAI's Spinning Up in Deep RL offer great insights. In addition, open-source environments such as OpenAI Gym are available for evaluating state-of-the-art algorithms in simulation on a variety of standard environments. The intuitive interface makes it easy to experiment with state-of-the-art algorithms and compare them against standard benchmarks. Having exhausted the course content available online, the student wants to do research in deep RL for mobile robot navigation because it enables end-to-end trainable models that integrate perception, planning, and control. Conventional methods use separate modules for perception, planning, and control, and merging them successfully requires years of domain experience. However, the student cannot find any quality course content or insightful blog posts on the subject. The optimistic student expects the same state-of-the-art algorithms that worked in simulation for simple control tasks to carry over, but unfortunately no standard environments comparable to OpenAI Gym are available. The few resources available require domain expertise and are difficult to customize. Moreover, few published papers on the subject have open-sourced their code or discussed training setups in detail. Another equally important concern is that real-world evaluations are performed on state-of-the-art mobile robot platforms costing thousands of dollars. This paper addresses the above points, which we infer have severely limited widespread understanding of the limits and potential of reinforcement learning in mobile robotics.
How This Serves Our Mission
OpenAI’s mission is to ensure the safe development of AGI and the broad distribution of benefits from AI more generally. Teaching tools like Spinning Up help us make progress on both of these objectives.
To begin with, we move closer to broad distribution of benefits any time we help people understand what AI is and how it works. This empowers people to think critically about the many issues we anticipate will arise as AI becomes more sophisticated and important in our lives.
Also, critically, we need people to help us work on making sure that AGI is safe. This requires a skill set which is currently in short supply because of how new the field is. We know that many people are interested in helping us, but don’t know how—here is what you should study! If you can become an expert on this material, you can make a difference on AI safety.
Code Design Philosophy
The algorithm implementations in the Spinning Up repo are designed to be
- as simple as possible while still being reasonably good,
- and highly consistent with each other to expose fundamental similarities between algorithms.
They are almost completely self-contained, with virtually no common code shared between them (except for logging, saving, loading, and MPI utilities), so that an interested person can study each algorithm separately without having to dig through an endless chain of dependencies to see how something is done. The implementations are patterned so that they come as close to pseudocode as possible, to minimize the gap between theory and code.
Importantly, they’re all structured similarly, so if you clearly understand one, jumping into the next is painless.
We tried to minimize the number of tricks used in each algorithm’s implementation, and minimize the differences between otherwise-similar algorithms. To give some examples of removed tricks: we omit regularization terms present in the original Soft Actor-Critic code, as well as observation normalization from all algorithms. For an example of where we’ve removed differences between algorithms: our implementations of DDPG, TD3, and SAC all follow a convention laid out in the original TD3 code, where all gradient descent updates are performed at the ends of episodes (instead of happening all throughout the episode).
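The end-of-episode update convention can be illustrated with a small sketch. This is not the repo's actual training loop: `ToyEnv`, `gradient_update`, and `train` are hypothetical stand-ins, and performing one update per environment step collected in the episode is an assumption made here for illustration.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in environment: each episode lasts `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)
        done = self.t >= self.horizon
        return obs, reward, done


def gradient_update(batch):
    """Placeholder for a real gradient step on a sampled batch."""
    return len(batch)


def train(env, num_episodes=3, batch_size=4, seed=0):
    rng = random.Random(seed)
    replay = deque(maxlen=1000)   # simple replay buffer
    update_log = []               # (episode index, updates performed)

    for ep in range(num_episodes):
        obs, done, ep_len = env.reset(), False, 0

        # Phase 1: collect experience only, with no updates inside the episode.
        while not done:
            action = rng.uniform(-1.0, 1.0)
            next_obs, reward, done = env.step(action)
            replay.append((obs, action, reward, next_obs, done))
            obs = next_obs
            ep_len += 1

        # Phase 2: at the end of the episode, run all updates in one burst
        # (assumed here: one update per step collected in this episode).
        n_updates = 0
        for _ in range(ep_len):
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            gradient_update(batch)
            n_updates += 1
        update_log.append((ep, n_updates))

    return update_log


log = train(ToyEnv(horizon=5), num_episodes=3)
print(log)
```

Separating collection from updating keeps the interaction loop identical across off-policy algorithms, so only the body of the update step differs between them.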
All algorithms are “reasonably good” in the sense that they achieve roughly the intended performance, but don’t necessarily match the best reported results in the literature on every task. Consequently, be careful if using any of these implementations for scientific benchmarking comparisons. Details on each implementation’s specific performance level can be found on our benchmarks page.
Support Plan
We plan to support Spinning Up to ensure that it serves as a helpful resource for learning about deep reinforcement learning. The exact nature of long-term (multi-year) support for Spinning Up is yet to be determined, but in the short run, we commit to:
- High-bandwidth support for the first three weeks after release (Nov 8, 2018 to Nov 29, 2018).
  - We’ll move quickly on bug-fixes, question-answering, and modifications to the docs to clear up ambiguities.
  - We’ll work hard to streamline the user experience, in order to make it as easy as possible to self-study with Spinning Up.
- Approximately six months after release (in April 2019), we’ll do a serious review of the state of the package based on feedback we receive from the community, and announce any plans for future modification, including a long-term roadmap.
Additionally, as discussed in the blog post, we are using Spinning Up in the curriculum for our upcoming cohorts of Scholars and Fellows. Any changes and updates we make for their benefit will immediately become public as well.