bito, or "Bayesian Inference of Trees via Optimization", is a Python-interface C++ library for phylogenetic variational inference: you express the interesting parts of your phylogenetic model in Python/TensorFlow/PyTorch/etc., and bito handles the tree structure and likelihood computations for you.
"Bito" is also the name of a tree native to Africa that produces medicinal oil.
We pronounce "bito" with a long /e/ sound ("bito" rhymes with "burrito").
This library is in an experimental state; it was formerly known as "libsbn".
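To make the division of labor concrete, here is a minimal PyTorch sketch of the pattern bito is designed for. It does not use bito's actual API: `engine_log_likelihood_and_grad` is a made-up stand-in for the C++ engine, with a toy surrogate likelihood so the sketch runs on its own.

```python
import torch


# Stand-in for the C++ engine: in real use this is where bito would compute the
# phylogenetic log-likelihood and its gradient with respect to branch lengths.
# A toy quadratic surrogate keeps the sketch self-contained and runnable.
def engine_log_likelihood_and_grad(branch_lengths):
    log_like = -0.5 * float(((branch_lengths - 0.1) ** 2).sum())
    grad = -(branch_lengths - 0.1)
    return log_like, grad


class EngineLogLikelihood(torch.autograd.Function):
    """Bridge an externally computed likelihood and gradient into autograd."""

    @staticmethod
    def forward(ctx, branch_lengths):
        log_like, grad = engine_log_likelihood_and_grad(branch_lengths.detach())
        ctx.save_for_backward(grad)
        return branch_lengths.new_tensor(log_like)

    @staticmethod
    def backward(ctx, grad_output):
        (grad,) = ctx.saved_tensors
        return grad_output * grad


# The Python side owns the parameters and the optimizer; the engine only supplies
# log-likelihood values and gradients. Here we simply maximize the surrogate.
branch_lengths = torch.full((5,), 0.5, requires_grad=True)
optimizer = torch.optim.Adam([branch_lengths], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    loss = -EngineLogLikelihood.apply(branch_lengths)
    loss.backward()
    optimizer.step()
print(branch_lengths)  # converges toward 0.1 under the surrogate
```

In a real model the Python side would typically hold variational parameters (e.g. over branch lengths and tree topologies) rather than raw branch lengths, but the bridging pattern is the same.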
- If you are on Linux, install gcc >= 7.5, which is standard in Debian Buster and Ubuntu 18.04.
- If you are on OS X, use a recent version of Xcode and install the command line tools.
We suggest using anaconda and the associated conda environment file, which will nicely install the relevant dependencies:

    conda env create -f environment.yml
    conda activate bito
(Very optional) The notebooks require R, IRKernel, rpy2 >=3.1.0, and some R packages such as ggplot and cowplot.
For your first build, do

    git submodule update --init --recursive
    make

This will install the bito Python module.
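To confirm that the build produced an importable module, a quick check (assuming the conda environment above is still active) is:

```python
import bito

print(bito.__file__)  # should point into the active environment
```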
You can build and run tests using `make test` and `make fasttest` (the latter excludes some slow tests). Note that `make` accepts `-j` flags for multi-core builds: e.g. `-j20` will build with 20 jobs.
- (Optional) If you modify the lexer and parser, call `make bison`. This assumes that you have installed Bison >= 3.4 (`conda install -c conda-forge bison`).
- (Optional) If you modify the test preparation scripts, call `make prep`. This assumes that you have installed ete3 (`conda install -c etetoolkit ete3`).
The following two papers will explain what this repository is about:
- Zhang & Matsen IV, NeurIPS 2018. Generalizing Tree Probability Estimation via Bayesian Networks; 👉🏽 blog post.
- Zhang & Matsen IV, ICLR 2019. Variational Bayesian Phylogenetic Inference; 👉🏽 blog post.
Our documentation consists of:
- Online documentation
- Derivations in `doc/tex`, which explain what's going on in the code.
We welcome your contributions! Please see our detailed contribution guidelines.
- Erick Matsen (@matsen): implementation, design, janitorial duties
- Dave H. Rich (@DaveRich): core developer
- Ognian Milanov (@ognian-): core developer
- Mathieu Fourment (@4ment): implementation of substitution models and likelihoods/gradients, design
- Seong-Hwan Jun (@junseonghwan): generalized pruning design and implementation, implementation of SBN gradients, design
- Hassan Nasif (@hrnasif): hot start for generalized pruning; gradient descent for generalized pruning
- Anna Kooperberg (@annakooperberg): refactoring the subsplit DAG
- Sho Kiami (@shokiami): refactoring the subsplit DAG
- Tanvi Ganapathy (@tanviganapathy): refactoring the subsplit DAG
- Lucy Yang (@lucyyang01): subsplit DAG visualization
- Cheng Zhang (@zcrabbit): concept, design, algorithms
- Christiaan Swanepoel (@christiaanjs): design
- Xiang Ji (@xji3): gradient expertise and node height code
- Marc Suchard (@msuchard): gradient expertise and node height code
- Michael Karcher (@mdkarcher): SBN expertise
- Eric J. Isaac (@EricJIsaac): C++ wisdom
If you are citing this library, please cite the NeurIPS and ICLR papers listed above. We require BEAGLE, so please also cite the BEAGLE papers.

We also thank:
- Jaime Huerta-Cepas: several tree traversal functions are copied from ete3
- Thomas Junier: parts of the parser are copied from newick_utils
- The parser driver is derived from the Bison C++ example
In addition to the packages mentioned above, we also employ:
- cxx-prettyprint for STL container pretty printing
- Eigen
- fast-cpp-csv-parser
- Progress-CPP for progress bars