An implementation of Proximal Policy Optimization (PPO), from the state-of-the-art family of reinforcement learning algorithms, using normalized Generalized Advantage Estimation (GAE) and optional batch-mode training. The loss function incorporates an entropy bonus. The code is heavily commented and can help in understanding both PPO and PyTorch.
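As a rough sketch of the two ideas named above, normalized GAE and a clipped PPO surrogate loss with an entropy term, here is a NumPy version for readability. The hyperparameter names (`gamma`, `lam`, `clip_eps`, `ent_coef`) and defaults are common conventions assumed here, not taken from this repository:

```python
import numpy as np

def normalized_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation, normalized to zero mean / unit std."""
    advantages = np.zeros(len(rewards))
    gae = 0.0
    next_value = last_value  # bootstrap value V(s_T) for the final step
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        gae = delta + gamma * lam * gae  # discounted sum of TD errors
        advantages[t] = gae
        next_value = values[t]
    # normalization step: zero mean, unit standard deviation
    return (advantages - advantages.mean()) / (advantages.std() + 1e-8)

def ppo_loss(ratio, advantages, entropy, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss with an entropy bonus.

    ratio is pi_new(a|s) / pi_old(a|s), element-wise per sample.
    """
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # maximize the clipped surrogate, so negate it for a loss;
    # subtracting the entropy term encourages exploration
    policy_loss = -np.minimum(unclipped, clipped).mean()
    return policy_loss - ent_coef * entropy.mean()
```

In the actual implementation the same computation runs on PyTorch tensors so gradients flow through `ratio` and `entropy`.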
- Clone the repository to get the files locally on your computer (see "Cloning an Existing Repository" at https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository).
- Navigate into the root folder of the project: `/ppo`
- Install the necessary dependencies, which are listed in `requirements.txt`. Use your favorite package manager/installer; we recommend pip. To install the requirements, run the following command in the root folder of the project (where `requirements.txt` is located): `pip install -r requirements.txt`
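The installation steps above can be sketched as a shell session; the repository URL is a placeholder here, since it is not given in this document:

```shell
# Clone the repository (replace <repository-url> with the actual URL)
git clone <repository-url>
# Navigate into the project root, where requirements.txt lives
cd ppo
# Install the dependencies with pip
pip install -r requirements.txt
```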
- All you need is an instance of the `Environment` class (see the source code for its specification); two are already provided. You also need a `Learner` object. See the example in `main.py`.
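To illustrate what a custom environment might look like, here is a hypothetical toy example. The method names and signatures below (`reset`, `step`) are common RL conventions assumed for illustration only; consult the repository's source code and `main.py` for the real `Environment` specification:

```python
import random

class CoinFlipEnvironment:
    """Toy one-step environment: guess the outcome of a coin flip.

    Hypothetical interface sketch; the actual Environment specification
    is defined in this repository's source code.
    """

    def reset(self):
        # Start a new episode by flipping a hidden coin
        self._coin = random.randint(0, 1)
        return [0.0]  # initial observation (no information yet)

    def step(self, action):
        # Reward 1 for a correct guess, 0 otherwise; episodes last one step
        reward = 1.0 if action == self._coin else 0.0
        observation = [float(self._coin)]
        done = True
        return observation, reward, done
```

An environment like this would then be handed to a `Learner`, following the pattern shown in `main.py`.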