Distributed GPU Monte Carlo Tree Search

TL;DR

Solves checkers/reversi (why didn't I do Go?)

Includes self-play

Runs on GPU (CUDA) or CPU (C++)

Over 1 million game simulations per second on a single 8-year-old GeForce GTX 280 GPU.

The GPU version is quite efficient (here is the score advantage plotted on the y-axis vs. time, playing against a single-core CPU)

Scales up nicely using MPI (Message Passing Interface) to a large distributed system (tested on a 2048-node supercomputer, up to 3.5M GPU threads)

Has a very minimal ssh-friendly interface
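
To make the scaling point above more concrete, here is a minimal sketch of one common way to combine results from many independent searchers (often called root parallelism): every worker runs its own playouts, the per-move counters are summed, and the most-visited move is played. This is an illustration under assumptions, not necessarily how this repository does it; `MoveStats`, `merge_stats` and `best_move` are hypothetical names, and the workers are plain vectors here rather than MPI ranks or GPU blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-move statistics gathered by one worker (an MPI rank or a GPU block).
struct MoveStats {
    std::uint64_t visits;  // playouts started from this root move
    std::uint64_t wins;    // playouts won by the side to move
};

// Sum the counters move by move across all workers.
// Assumption: in an MPI program the same element-wise sum could be done
// with a sum reduction (e.g. MPI_Allreduce with MPI_SUM) over the counters.
std::vector<MoveStats> merge_stats(const std::vector<std::vector<MoveStats>>& per_worker) {
    std::vector<MoveStats> total;
    if (per_worker.empty()) return total;
    total.resize(per_worker.front().size());  // value-initialized, so all counters start at 0
    for (const auto& worker : per_worker) {
        assert(worker.size() == total.size());  // every worker reports the same move list
        for (std::size_t m = 0; m < total.size(); ++m) {
            total[m].visits += worker[m].visits;
            total[m].wins += worker[m].wins;
        }
    }
    return total;
}

// Return the index of the most-visited move (lowest index on ties),
// or -1 if no move has been visited at all.
int best_move(const std::vector<MoveStats>& total) {
    int best = -1;
    std::uint64_t best_visits = 0;
    for (std::size_t m = 0; m < total.size(); ++m) {
        if (total[m].visits > best_visits) {
            best_visits = total[m].visits;
            best = static_cast<int>(m);
        }
    }
    return best;
}
```

Choosing by visit count rather than by raw win rate is a common convention in MCTS because it is less sensitive to moves that were sampled only a few times; other selection rules are equally possible.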

I used this code while working on my PhD thesis. The MPI version has been tested on the Japanese TSUBAME supercomputer.

THESIS + BIBTEX

Thesis Slides


krocki/mcts_mpi
