Fast and lightweight read aligner (experimental)
Nimbliner indexes the reference with Bloom filters instead of suffix arrays, so the index has a memory cost close to n for a reference sequence of length n (instead of 2-4n for suffix arrays or the BWT). It also skips full alignment, which removes much of the computational cost. Nimbliner does not yet produce CIGAR strings, but nothing in the approach prevents it from doing so.
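The sketch below shows why a Bloom filter over reference k-mers stays small: it stores only a bit array, not sequence or positions. It is a minimal, self-contained illustration, not Nimbliner's code. The project itself uses libbf, while this sketch uses Python's hashlib with double hashing. The class name KmerBloomFilter, the helper kmers(), and the 1% false-positive target are hypothetical choices made for illustration. With the standard sizing formula, a 1% false-positive rate costs about 9.6 bits (roughly 1.2 bytes) per distinct k-mer, regardless of k.

```python
import hashlib
import math


class KmerBloomFilter:
    """Hypothetical minimal Bloom filter over DNA k-mers (illustration only)."""

    def __init__(self, n_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, h = (m / n) ln 2 hash functions.
        self.m = max(8, math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.h = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, kmer: str):
        # Double hashing: derive h bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(kmer.encode()).digest()
        a = int.from_bytes(digest[:8], "little")
        b = int.from_bytes(digest[8:16], "little") | 1  # odd step
        for i in range(self.h):
            yield (a + i * b) % self.m

    def add(self, kmer: str) -> None:
        for p in self._positions(kmer):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, kmer: str) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(kmer))


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


# Usage: index a toy reference, then look up the k-mers of a read taken from it.
reference = "ACGTACGTTGCAAGCTTACGGATCCA"
k = 5
bf = KmerBloomFilter(n_items=len(reference) - k + 1)
for km in kmers(reference, k):
    bf.add(km)

read = "TTGCAAGCT"
hits = [km in bf for km in kmers(read, k)]
print(hits)
```

Because there are no false negatives, every k-mer of a read drawn exactly from the reference is reported as present. A k-mer absent from the reference is usually, but not always, reported as absent.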
You can build a Docker image and then run Nimbliner inside the container:
<clone the repo>
docker build -t nimbliner-dev:0.1 -f docker/Dockerfile docker/
# this will produce all_kmers.txt and anchors.txt in the current directory
docker run -v `pwd`:/nimbliner nimbliner-dev:0.1 indexer 20 <path to your reference, single fasta file>
docker run -v `pwd`:/nimbliner nimbliner-dev:0.1 mapper 20 <path to your reads, single fasta file> all_kmers.txt anchors.txt
Benchmarking is set up with Snakemake. To run the benchmarks, you can do:
cd benchmarking
snakemake --configfile experiments.json run_pipeline
You can also compile from source. The dependencies are libbf and TCLAP. Because libbf installs to /usr/local/lib by default, you may need to add that directory to LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on macOS), e.g. export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH.
You can generate synthetic reads with mismatches and indels. For example, to sample one million reads from the chromosome in chromo.fa with a 1.5% error rate:
docker run -v `pwd`:/nimbliner nimbliner-dev:0.1 sample 20 1000000 chromo.fa 1.5 > sampled_reads.fa
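The following is a hypothetical sketch of how such a sampler can introduce errors. It is not the implementation behind the sample command, and it does not assume how that command's positional arguments map onto its parameters. The assumptions are that each base is independently erroneous with probability error_pct/100, and that an error is equally likely to be a substitution, an insertion, or a deletion. The function names mutate() and sample_reads(), the read_len parameter, and the header format are illustrative.

```python
import random

BASES = "ACGT"


def mutate(read: str, error_pct: float, rng: random.Random) -> str:
    """Hypothetical error model: each base is erroneous with probability error_pct / 100."""
    p = error_pct / 100.0
    out = []
    for base in read:
        if rng.random() >= p:
            out.append(base)  # no error at this position
            continue
        kind = rng.choice(("sub", "ins", "del"))  # assumed equal split
        if kind == "sub":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice(BASES))  # insert one random base after it
        # "del": the base is dropped
    return "".join(out)


def sample_reads(reference: str, n_reads: int, read_len: int, error_pct: float, seed: int = 0):
    """Yield (FASTA header, sequence) pairs sampled uniformly from the reference."""
    rng = random.Random(seed)
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        yield f">read_{i}_pos_{start}", mutate(reference[start:start + read_len], error_pct, rng)


# Usage: write three 20-base reads with a 1.5% error rate as FASTA to stdout.
ref = "ACGTTGCAAGCTTACGGATCCA" * 10
for header, seq in sample_reads(ref, n_reads=3, read_len=20, error_pct=1.5):
    print(header)
    print(seq)
```

Recording the true start position in the header makes it easy to check a mapper's output against ground truth.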
To do:
- provide benchmark data
- prepare indices for the whole human genome
- integrate with TravisCI
Aligners to benchmark against:
- DALIGNER
- STAR
- BWA
- RapMap (speed-only, RapMap does not generate full alignments)