This repository contains a fast, multithreaded C++ implementation of the asynchronous advantage actor-critic (A3C) algorithm based on the GA3C architecture. I tested it on the CartPole-v0
OpenAI Gym environment using TensorFlow and the gym-uds-api, taking the model configuration and parameters from this other implementation.
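In the GA3C architecture, agent threads do not own a copy of the model; they hand their data to predictor and trainer threads through shared queues. The following is a minimal sketch of that producer/consumer pattern, not the code used in this repository: `BlockingQueue`, `Experience` and `run_agents_and_trainer` are hypothetical names introduced here for illustration, and the "trainer" only counts the items it receives.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe FIFO queue (illustrative only, not taken from this repository).
template <typename T>
class BlockingQueue {
public:
    // Enqueue an item and wake up one waiting consumer.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        not_empty_.notify_one();
    }

    // Block until an item is available, then dequeue and return it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Hypothetical experience record pushed by an agent after each step.
struct Experience {
    int agent_id;
    int step;
};

// Starts n_agents producer threads that each push n_steps experiences,
// while the calling thread plays the role of a single trainer.
// Returns the number of experiences the trainer consumed.
std::size_t run_agents_and_trainer(int n_agents, int n_steps) {
    assert(n_agents >= 0 && n_steps >= 0);
    BlockingQueue<Experience> training_queue;

    std::vector<std::thread> agents;
    for (int id = 0; id < n_agents; ++id) {
        agents.emplace_back([&training_queue, id, n_steps] {
            for (int step = 0; step < n_steps; ++step)
                training_queue.push(Experience{id, step});
        });
    }

    const std::size_t expected =
        static_cast<std::size_t>(n_agents) * static_cast<std::size_t>(n_steps);
    std::size_t consumed = 0;
    while (consumed < expected) {
        training_queue.pop();  // a real trainer would batch these for a model update
        ++consumed;
    }

    for (auto& agent : agents) agent.join();
    return consumed;
}
```

Calling `run_agents_and_trainer(4, 100)` starts four agent threads and returns once the trainer has drained all 400 experiences. A real trainer would accumulate the popped items into a batch before updating the model.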
This project requires any recent version of Google's TensorFlow. The steps for building it from source are well explained in the official documentation; as a quick recap, you need to:
- Clone TensorFlow from the official repository
- Switch to a stable release using e.g.
git checkout r1.3
- Run
- Build the shared library for C++ using
bazel build --config=opt //
- Copy
to any directory searched by the run-time loader
Do not delete the TensorFlow repository before running the installation steps below. Lastly, use pip to install the tensorflow and gym packages for Python 3.
- Recursively clone this repository, i.e.
git clone --recursive
- Run
cartpole-v0/ <absolute_path_to_tensorflow_repository>
to copy the necessary C++ headers and sources into cartpole-v0/third-party/
To try the code in an empty environment and on an empty model, just compile GA3C and run it (no dependencies):
/opt/GA3C-cpp $ make GA3C
mkdir -p bin
clang++ -std=c++11 -O2 -march=native -Wall -pthread -o bin/GA3C -I include \
src/ src/
/opt/GA3C-cpp $ bin/GA3C
Training finished in 1.9976 seconds
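As a rough illustration of what a run on an empty environment measures, the sketch below times several agent threads stepping through a no-op environment. It assumes that "empty" means an environment whose steps do no work; `EmptyEnvironment` and `time_empty_training` are hypothetical names, not identifiers from this repository.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for an "empty" environment: each step does no work,
// yields zero reward, and the episode ends after a fixed number of steps.
struct EmptyEnvironment {
    int steps_left;

    explicit EmptyEnvironment(int episode_length) : steps_left(episode_length) {}

    bool done() const { return steps_left <= 0; }

    double step() {
        --steps_left;
        return 0.0;  // reward
    }
};

// Runs n_agents threads, each playing n_episodes episodes in its own
// EmptyEnvironment, and returns the elapsed wall-clock time in seconds.
double time_empty_training(int n_agents, int n_episodes, int episode_length) {
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> agents;
    for (int i = 0; i < n_agents; ++i) {
        agents.emplace_back([n_episodes, episode_length] {
            for (int episode = 0; episode < n_episodes; ++episode) {
                EmptyEnvironment env(episode_length);
                while (!env.done()) env.step();
            }
        });
    }
    for (auto& agent : agents) agent.join();

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Printing the value returned by `time_empty_training` gives a figure comparable in kind to the "Training finished in ... seconds" line above. Here it mostly reflects thread start-up and join overhead, because the environment does no work.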
To test the code on CartPole-v0, change directory to cartpole-v0 and generate the TensorFlow model:
/opt/GA3C-cpp $ cd cartpole-v0/
/opt/GA3C-cpp/cartpole-v0 $ python3 generate
Compile the code:
/opt/GA3C-cpp/cartpole-v0 $ make GA3C-cartpole-v0
mkdir -p bin
clang++ -std=c++11 -O2 -march=native -Wall -pthread -ltensorflow_cc -o bin/GA3C-cartpole-v0 -I ../include \
-I third-party -I third-party/gym-uds-api/binding-cpp/include -I third-party/third_party -I include \
third-party/gym-uds-api/binding-cpp/src/ third-party/gym-uds-api/binding-cpp/src/ \
../src/ src/ src/
Start some gym-uds servers and train the agent:
/opt/GA3C-cpp/cartpole-v0 $ ./
/opt/GA3C-cpp/cartpole-v0 $ bin/GA3C-cartpole-v0
[2017-09-01 12:18:38,516] Making new env: CartPole-v0
[2017-09-01 12:18:38,516] Making new env: CartPole-v0
[00] Ep. 00250 reward: 200
[02] Ep. 00260 reward: 200
Training finished in 10.768 seconds
The updated weights of the model are saved back to disk. Test the trained agent using:
/opt/GA3C-cpp/cartpole-v0 $ python3 test
Remember to regenerate the empty model using the Python script before training again.