SPEAQeasy: a Scalable Pipeline for Expression Analysis and Quantification that is easy to install and share
SPEAQeasy is a Scalable RNA-seq Pipeline for Expression Analysis and Quantification based on the RNAseq-pipeline. Built on Nextflow, and capable of using Docker containers and common resource managers (e.g. SLURM), this port of the RNAseq-pipeline can be used in different computing environments. It is described in the manuscript here.
The main function of this pipeline is to produce files comparable to those used in recount2, a tool that provides gene, exon, exon-exon junction and base-pair level data.
This pipeline allows researchers to contribute data to the recount2 project even from outside the JHPCE.
SPEAQeasy takes raw RNA-seq reads and produces analysis-ready R objects, providing a "bridge to the Bioconductor universe", where researchers can utilize the powerful existing set of tools to quickly perform desired analyses.
Beginning with a set of FASTQ files (optionally gzipped), SPEAQeasy ultimately produces RangedSummarizedExperiment objects to store gene, exon, and exon-exon junction counts for an experiment. Optionally, expressed regions data is generated, enabling easy computation of differentially expressed regions (DERs).
Our vignette demonstrates how genotype calls made by SPEAQeasy can be coupled with user-provided genotype and phenotype data to easily resolve identity issues that arise during sequencing. We then walk through an example differential expression analysis and explore data visualization options.
- Automatically merge samples split across multiple FASTQ files, using the samples.manifest input
- Trivially select any GENCODE annotation release for "hg38", "hg19", or "mm10" references (Ensembl for "rat" reference) and adjust other annotation settings with simple configuration
- Generates a single VCF file for experiments using a human reference, which can be used to resolve sample identity issues and salvage problematic samples
- Supports Docker to manage software dependencies and is preconfigured for execution locally or on SLURM or SGE clusters
- Multiple users can share a single SPEAQeasy installation with minimal work
- Detailed, user-friendly logging for transparency and identifying potential issues
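The sample-merging feature listed above can be pictured with a small Python sketch. It is only an illustration and not SPEAQeasy's actual code: it assumes a simplified, hypothetical manifest in which each tab-separated line starts with a FASTQ path and ends with a sample ID, and it only groups paths by ID rather than concatenating files. The function name and the example paths are made up for this sketch.

```python
import csv
import io
from collections import defaultdict


def group_fastq_by_sample(manifest_text):
    """Map each sample ID to the FASTQ paths listed for it, in manifest order.

    Assumed (hypothetical) layout: tab-separated lines whose first field is
    a FASTQ path and whose last field is the sample ID.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(manifest_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        path, sample_id = row[0], row[-1]
        groups[sample_id].append(path)
    return dict(groups)


# Hypothetical manifest: sample_A is split across two FASTQ files.
example = (
    "/data/sample_A_L001.fastq.gz\t0\tsample_A\n"
    "/data/sample_A_L002.fastq.gz\t0\tsample_A\n"
    "/data/sample_B.fastq.gz\t0\tsample_B\n"
)
merged = group_fastq_by_sample(example)
for sample_id, paths in merged.items():
    print(sample_id, len(paths))
```

Rows that share a sample ID end up in one group, which is the set of files a merge step would combine for that sample.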
The SPEAQeasy documentation website describes the pipeline in full detail. To get started quickly, check out the quick start guide.
Because SPEAQeasy is based on the Nextflow workflow manager, it supports execution on computing clusters managed by SLURM or SGE without any configuration (local execution is also possible). Users with access to Docker can use Docker containers to manage SPEAQeasy's software dependencies. For users without Docker, or even without root privileges, we provide a script that installs the dependencies.
Original Pipeline
Emily Burke, Leonardo Collado-Torres, Andrew Jaffe, BaDoi Phan
Nextflow Port
Nick Eagles, Brianna Barry, Jacob Leonard, Israel Aguilar, Violeta Larios, Everardo Gutierrez
We hope that SPEAQeasy will be useful for your research. Please use the following bibtex information to cite the software and overall approach. Thank you!
@article {Eagles2021,
author = {Eagles, Nicholas J. and Burke, Emily E. and Leonard, Jacob and Barry, Brianna K. and Stolz, Joshua M. and Huuki, Louise and Phan, BaDoi N. and Larios Serrato, Violeta and Guti{\'e}rrez-Mill{\'a}n, Everardo and Aguilar-Ordo{\~n}ez, Israel and Jaffe, Andrew E. and Collado-Torres, Leonardo},
title = {SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses},
year = {2021},
doi = {10.1186/s12859-021-04142-3},
publisher = {Springer Science and Business Media LLC},
URL = {https://doi.org/10.1186/s12859-021-04142-3},
journal = {BMC Bioinformatics}
}