Solving OpenAI Pendulum-v0 using Proximal Policy Optimization Algorithms
Run this command to play the game with the pretrained model:
>python pendulum.py play
Or run it with any argument other than play to train the model:
>python pendulum.py anything-(not-play)
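The two commands above differ only in the first argument. A minimal sketch of how such a switch could be written is shown below; `select_mode` is a hypothetical helper and is not taken from pendulum.py, and treating a missing argument as training is an assumption.

```python
import sys


def select_mode(argv):
    """Return 'play' if the first CLI argument is exactly 'play', else 'train'.

    Hypothetical helper for illustration; pendulum.py may do this differently.
    """
    # argv[0] is the script name; argv[1], if present, is the mode argument.
    if len(argv) > 1 and argv[1] == "play":
        return "play"
    # Assumption: any other argument (or none at all) means training.
    return "train"


# Example: pass sys.argv from the command line.
mode = select_mode(sys.argv)
```

With this sketch, `select_mode(["pendulum.py", "play"])` selects the pretrained-model path and any other argument selects training.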
The model in pendulum.py was able to solve Pendulum-v0 after about 110 episodes.
Total rewards over 140 steps of training:
You're free to edit the model hyperparameters and some constants to make it better.
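For orientation when tuning, here is a minimal sketch of the clipped surrogate objective used by one common PPO variant. It is not taken from pendulum.py, which may use a different variant (for example a KL penalty); the names `clipped_surrogate` and `clip_epsilon` and the default value 0.2 are assumptions for illustration.

```python
def clipped_surrogate(ratios, advantages, clip_epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:      new_policy_prob / old_policy_prob for each sampled action
    advantages:  advantage estimate for each sampled action
    clip_epsilon: hypothetical name for the clipping range hyperparameter
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_epsilon, min(r, 1.0 + clip_epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

A smaller `clip_epsilon` keeps each policy update closer to the previous policy, which tends to trade learning speed for stability.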
Special thanks to Morvan Zhou for his explanation of PPO.