Larch can be installed via conda from the matsengrp channel:
conda install -c matsengrp larch-phylo
The conda package is currently available only on Linux.
- GCC 7.5
- cmake 3.16
For Ubuntu 18.04 LTS, the following command installs the requirements:
sudo apt install --no-install-recommends git git-lfs cmake make g++ mpi-default-dev libprotobuf-dev libboost-dev libboost-program-options-dev libboost-filesystem-dev libboost-iostreams-dev libboost-date-time-dev protobuf-compiler automake autoconf libtool nasm
To get a recent cmake, download it from https://cmake.org/download/, for example:
wget https://github.com/Kitware/CMake/releases/download/v3.23.1/cmake-3.23.1-linux-x86_64.tar.gz
- singularity 3.5.3
- conda 22.9.0
Larch can be built utilizing a Singularity container or a Conda environment.
To build the Singularity image, use the definition file provided:
singularity build larch-singularity.sif larch-singularity.def
singularity shell larch-singularity.sif --net
To set up a conda environment capable of building Larch, create the larch environment using the standard environment file provided:
conda env create -f environment.yml
To set up a conda environment capable of building Larch, including development tools, create the larch-dev environment using the development environment file provided:
conda env create -f environment-dev.yml
There are 4 executables that are built automatically as part of the larch package and provide various methods for exploring tree space and manipulating DAGs/trees:
- larch-test is the suite of tests used to validate the various routines.
- larch-usher is a tool that takes an input tree/DAG and explores tree space through SPR moves.
- larch-dagutil is a utility that manipulates (e.g. merge, prune) or inspects DAGs/trees.
- larch-dag2dot is a utility that writes a DAG to a DOT file format for easier viewing.
Note: If you run into memory limitations during the cmake step, you can limit the number of parallel threads with export CMAKE_NUM_THREADS="8" (reduce the number as necessary).
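As a rough sketch of the note above, the thread count can be derived from the machine's core count and capped before configuring. The helper name pick_threads and the cap of 8 are illustrative assumptions and not part of Larch; only the CMAKE_NUM_THREADS variable comes from the note.

```shell
# Hypothetical helper (not part of Larch): print min(available cores, cap).
pick_threads() {
  cap="${1:-8}"
  # nproc is provided by GNU coreutils; fall back to 1 if it is unavailable.
  cores="$(nproc 2>/dev/null || echo 1)"
  if [ "$cores" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$cores"
  fi
}

# Cap the cmake step at 8 threads; lower the cap if memory is still exhausted.
CMAKE_NUM_THREADS="$(pick_threads 8)"
export CMAKE_NUM_THREADS
```

Lowering the argument passed to pick_threads trades build speed for a smaller peak memory footprint.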
To build everything from the larch/ directory, run:
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j16
# optionally, to install outside of build directory
make install
Cmake build options:
- add -DCMAKE_BUILD_TYPE=Debug to build in debug mode. -DCMAKE_BUILD_TYPE=Release is enabled by default.
- add -DCMAKE_CXX_CLANG_TIDY="clang-tidy" to enable clang-tidy.
- add -DUSE_ASAN=yes to enable asan and ubsan.
- add -DCMAKE_INSTALL_PREFIX=path/to/install to select the install location. By default, this will perform a system-wide installation. To install in the current conda environment, use -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX.
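The build options can be combined in a single configure call. The sketch below only assembles and prints the command so it can be reviewed before running it from the build directory. The function name larch_configure_cmd is an assumption made for illustration; the two flags it emits are taken from the list above.

```shell
# Hypothetical helper (not part of Larch): print a cmake configure command
# that installs into PREFIX and optionally enables asan/ubsan.
larch_configure_cmd() {
  prefix="$1"
  use_asan="${2:-no}"
  cmd="cmake .. -DCMAKE_INSTALL_PREFIX=${prefix}"
  if [ "$use_asan" = "yes" ]; then
    cmd="$cmd -DUSE_ASAN=yes"
  fi
  printf '%s\n' "$cmd"
}

# Target the active conda environment when there is one; the fallback
# path is only a placeholder.
larch_configure_cmd "${CONDA_PREFIX:-path/to/install}" yes
```

Paths containing spaces would need extra quoting before the printed command is pasted into a shell.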
For all tools in this suite, several file formats are supported for loading and storing MATs and MADAGs. When passing filepaths as arguments, the file format can be specified explicitly with the --input-format/--output-format options. Alternatively, the program can infer the file format when the filepath contains a recognized file extension.
File format options:
- MADAG dagbin: Supported as input and output. *.dagbin is the recognized extension.
- MADAG protobuf: Supported as input and output. *.pb_dag is the recognized extension, or using *.pb WITHOUT a --MAT-refseq-file option.
- MAT protobuf: Supported as input only. *.pb_tree is the recognized extension, or using *.pb WITH a --MAT-refseq-file option.
- MADAG json: Supported as input only. *.json_dag or *.json is the recognized extension.
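The extension rules above can be restated as a small lookup. This is a sketch of the documented rules, not Larch's actual inference code: the function name infer_larch_format is hypothetical, the returned names assume that the dag-pb, tree-pb and dag-json format values correspond to MADAG protobuf, MAT protobuf and MADAG json, and compressed names such as *.pb.gz are deliberately left unhandled because the rules above do not cover them.

```shell
# Hypothetical helper (not part of Larch): map a filepath to a format name.
# $1 = filepath, $2 = optional reference sequence file (as for --MAT-refseq-file)
infer_larch_format() {
  path="$1"
  refseq="${2:-}"
  case "$path" in
    *.dagbin)          echo "dagbin" ;;
    *.pb_dag)          echo "dag-pb" ;;
    *.pb_tree)         echo "tree-pb" ;;
    *.pb)
      # A bare *.pb is a MAT only when a reference sequence is supplied.
      if [ -n "$refseq" ]; then echo "tree-pb"; else echo "dag-pb"; fi
      ;;
    *.json_dag|*.json) echo "dag-json" ;;
    *)                 echo "unknown"; return 1 ;;
  esac
}

infer_larch_format sample.pb               # prints dag-pb
infer_larch_format sample.pb refseq.json   # prints tree-pb
```

When the helper reports unknown, passing the format explicitly with --input-format or --output-format is the safer route.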
From the larch/build/bin directory:
ln -s ../../data
./larch-test
Passing nocatch to the test executable will allow exceptions to escape, which is useful for debugging. A gdb session can be started with gdb --args build/larch-test nocatch.
larch-test options:
- nocatch allows test exceptions to escape, which is useful for debugging. A gdb session can be started with gdb --args build/larch-test nocatch.
- --list produces a list of all available tests, along with an ID number.
- --range runs tests by ID with a string of comma-separated range or single ID arguments [e.g. 1-5,7,9,12-13].
- -tag excludes tests with a given tag.
- +tag includes tests with a given tag.
  - For example, -tag "slow" excludes tests that require a long runtime to complete.
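To make the --range syntax concrete, the sketch below expands a range string into the individual test IDs it names. It assumes that ranges are inclusive on both ends, which the option description does not state explicitly, and the function name expand_test_range is hypothetical; larch-test performs its own parsing.

```shell
# Hypothetical helper (not part of Larch): expand "1-5,7" into "1 2 3 4 5 7",
# treating each A-B range as inclusive (an assumption).
expand_test_range() {
  spec="$1"
  out=""
  old_ifs="$IFS"
  IFS=','
  for part in $spec; do
    case "$part" in
      *-*)
        start="${part%-*}"
        end="${part#*-}"
        i="$start"
        while [ "$i" -le "$end" ]; do
          out="$out $i"
          i=$((i + 1))
        done
        ;;
      *)
        out="$out $part"
        ;;
    esac
  done
  IFS="$old_ifs"
  echo "${out# }"
}

expand_test_range "1-5,7,9,12-13"
```

Comparing the expanded IDs against the output of --list is a quick way to check which tests a range string would select.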
From the larch/build/bin directory:
./larch-usher -i ../data/testcase/tree_1.pb.gz -o output_dag.pb -c 10
This command runs 10 iterations of larch-usher on the provided tree and writes the final result to the file output_dag.pb.
larch-usher options:
- -i,--input [REQUIRED] Filepath to the input tree/DAG (accepted file formats are: MADAG protobuf, MAT protobuf, JSON, Dagbin).
- -o,--output [REQUIRED] Filepath to the output tree/DAG (accepted file formats are: MADAG protobuf, Dagbin).
- -c,--count [Default: 1] Number of larch-usher iterations to run.
- -r,--MAT-refseq-file [REQUIRED if provided input file is a MAT protobuf] Filepath to json reference sequence.
- -v,--VCF-input-file Filepath to VCF containing ambiguous sequence data.
- -l,--logpath [Default: optimization_log] Filepath to write summary log.
- -s,--switch-subtrees [Default: never] Switch to optimizing subtrees after the specified number of iterations.
- --min-subtree-clade-size [Default: 100] The minimum number of leaves in a subtree sampled for optimization (ignored without option -s).
- --max-subtree-clade-size [Default: 1000] The maximum number of leaves in a subtree sampled for optimization (ignored without option -s).
- --move-coeff-nodes [Default: 1] New node coefficient for scoring moves. Set to 0 to apply only parsimony-optimal SPR moves.
- --move-coeff-pscore [Default: 1] Parsimony score coefficient for scoring moves. Set to 0 to apply only topologically novel SPR moves.
- --sample-method [Default: parsimony] Select method for sampling optimization tree from the DAG. Options are: (parsimony, random, rf-minsum, rf-maxsum).
- --sample-uniformly [Default: use natural distribution] Use a uniform distribution to sample trees for optimization.
  - For example, if the sampling method is parsimony and --sample-uniformly is provided, then a uniform distribution on parsimony-optimal trees is sampled from.
- --callback-option [Default: best-moves] Specify which SPR moves are chosen and applied. Options are: (all-moves, best-moves-fixed-tree, best-moves-treebased, best-moves).
- --trim [Default: do not trim] Trim optimized dag to contain only parsimony-optimal trees before writing to protobuf.
- --keep-fragment-uncollapsed [Default: collapse] Do not collapse empty (non-mutation-bearing) edges in the optimization tree.
- --quiet [Default: write intermediate files] Do not write intermediate protobuf file at each iteration.
- --input-format [Default: format inferred by file extension] Specify the format of the input file. Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
- --output-format [Default: format inferred by file extension] Specify the format of the output file. Options are: (dagbin, pb, dag-pb).
- -S Enable smart stopping: larch-usher will terminate when parsimony improvement ceases to occur.
- -T Specify a hard time limit after which larch-usher will terminate.
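As an illustration of how several of these options fit together, the sketch below assembles one larch-usher call that switches to subtree optimization partway through a run. The iteration counts and clade sizes are arbitrary example values, the paths are placeholders, and the variable name usher_cmd is introduced only for this example. The command is printed first and executed only if the binary is present in the current directory.

```shell
# Placeholder paths; adjust them to your checkout.
input="../data/testcase/tree_1.pb.gz"
output="output_dag.pb"

# Request 20 iterations, switch to subtree optimization after 10, sample
# subtrees with 200 to 500 leaves, and trim the result to parsimony-optimal
# trees before writing.
set -- ./larch-usher -i "$input" -o "$output" -c 20 \
  -s 10 --min-subtree-clade-size 200 --max-subtree-clade-size 500 --trim

usher_cmd="$*"
echo "$usher_cmd"                 # show the assembled command

if [ -x "$1" ]; then "$@"; fi     # run it only when the binary exists here
```

Because the two clade-size options are ignored without -s, they only make sense alongside it as shown.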
From the larch/build/bin directory:
./larch-dagutil -i ../data/testcase/tree_1.pb.gz -i ../data/testcase/tree_2.pb.gz -o merged_trees.pb
This executable takes a list of protobuf files and merges the resulting DAGs together into one.
larch-dagutil options:
- -i,--input Filepath to the input Tree/DAG (accepted file formats are: MADAG protobuf, MAT protobuf, JSON, Dagbin).
- -o,--output [Default: does not print output] Filepath to the output Tree/DAG (accepted file formats are: MADAG protobuf, Dagbin).
- -r,--MAT-refseq-file [REQUIRED if input protobufs are MAT protobuf format] Filepath to json reference sequence.
- -t,--trim Trim output (Default trimming method is trim to best parsimony).
- --rf Trim output to minimize RF distance to the provided DAG file (Ignored if -t flag is not provided).
- -s,--sample Write a sampled single tree from DAG to file, rather than the whole DAG.
- --dag-info Print stats about the DAG (tree count, all parsimony scores, all RF distances).
- --parsimony Print all parsimony scores.
- --sum-rf-distance Print all sum RF distances.
- --input-format [Default: format inferred by file extension] Specify the format of the input file(s). Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
- --output-format [Default: format inferred by file extension] Specify the format of the output file. Options are: (dagbin, pb, dag-pb).
- --rf-format [Default: format inferred by file extension] Specify the format of the RF file. Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
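Since the merge example repeats -i once per input, a merge over many files can be assembled in a loop. The sketch below prints such a command rather than running it; the function name dagutil_merge_cmd is hypothetical, and paths containing spaces would need additional quoting.

```shell
# Hypothetical helper (not part of Larch): print a larch-dagutil command that
# merges every listed input into one output file.
# $1 = output filepath, remaining arguments = input filepaths
dagutil_merge_cmd() {
  output="$1"
  shift
  cmd="./larch-dagutil"
  for f in "$@"; do
    cmd="$cmd -i $f"        # one -i per input, as in the example above
  done
  printf '%s\n' "$cmd -o $output"
}

dagutil_merge_cmd merged_trees.pb \
  ../data/testcase/tree_1.pb.gz ../data/testcase/tree_2.pb.gz
```

With the two test-case files, the printed command matches the merge example shown earlier in this section.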
From the larch/build/bin directory:
./larch-dag2dot -i ../data/testcase/full_dag.pb
This command writes the provided DAG in dot format to stdout.
larch-dag2dot options:
- -i,--input Filepath to the input Tree/DAG (accepted file formats are: MADAG protobuf, MAT protobuf, JSON, Dagbin).
- -o,--output [Default: DOT written to stdout] Filepath to the output DOT file.
- --input-format [Default: format inferred by file extension] Specify the format of the input file. Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
- --dag/--tree [REQUIRED if file extension is *.pb] Specify whether input file is a DAG or a Tree.
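The --dag/--tree rule can be sketched as a helper that adds the flag only for *.pb inputs, following the option description above. The function name dag2dot_cmd and its arguments are hypothetical, and the command is printed rather than executed.

```shell
# Hypothetical helper (not part of Larch): print a larch-dag2dot command,
# adding --dag or --tree only when the input has a bare *.pb extension.
# $1 = input filepath, $2 = "dag" (default) or "tree"
dag2dot_cmd() {
  input="$1"
  kind="${2:-dag}"
  cmd="./larch-dag2dot -i $input"
  case "$input" in
    *.pb) cmd="$cmd --$kind" ;;
  esac
  printf '%s\n' "$cmd"
}

dag2dot_cmd ../data/testcase/full_dag.pb dag
```

Assuming Graphviz is installed, the DOT text that larch-dag2dot writes to stdout can be piped into dot -Tsvg -o dag.svg to render an image.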
- Lohmann, N. (2022). JSON for Modern C++ (Version 3.10.5) [Computer software]. https://github.com/nlohmann
- Niebler, E. Range library for C++14/17/20 [Computer software]. https://github.com/ericniebler/range-v3