Automation of RNA-seq Workflow
To get started, clone the RNA-seq automation repository. Navigate to the directory where you would like to place the repository and run `git clone https://github.com/rpolicastro/RNAseq.git`.
This workflow takes advantage of the conda package manager and its virtual environments. Conda installs the main software and all of its dependencies into a virtual environment to ensure compatibility. Furthermore, the provided environment.yml file reproduces the software environment used when developing the workflow, which ensures long-term compatibility and reproducibility.
Before creating the environment, you must first install miniconda.
- Install miniconda, and make sure that conda is in your PATH.
- Update conda to the latest version: `conda update conda`.
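Before moving on, it can help to confirm that the shell can actually find conda. The snippet below is a minimal sketch of such a check; the variable name `conda_status` is only illustrative and is not part of the workflow.

```shell
# Check whether the conda executable is reachable through PATH.
if command -v conda >/dev/null 2>&1; then
    # Report the installed version so you can tell whether an update is needed.
    echo "conda found: $(conda --version)"
    conda_status="found"
else
    echo "conda not found in PATH; check your miniconda installation" >&2
    conda_status="missing"
fi
```

If conda is reported as missing, opening a new terminal after the miniconda installation is often enough for the updated PATH to take effect.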
You are now ready to create the virtual software environment and download all software and dependencies.
If you would like to recreate the environment used when writing the original workflow, you can do so with the provided environment.yml file. This will install the main software versions used in the development of the workflow and automatically download the dependencies for those versions.
conda env create -f environment.yml -p ~/miniconda3/envs/rnaseq-automation
The `-p` option sets the full path of the new environment. It should point inside the environments folder of your conda installation, which is usually `~/miniconda3/envs`.
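If you are unsure where your environments folder is, conda can report the base of its installation. The sketch below builds the prefix from `conda info --base` and falls back to `~/miniconda3` when conda is not on the PATH; the fallback and the variable names `conda_base` and `env_prefix` are assumptions made for illustration.

```shell
# Locate the base directory of the conda installation.
if command -v conda >/dev/null 2>&1; then
    conda_base="$(conda info --base)"
else
    # Assumed default install location; adjust if miniconda lives elsewhere.
    conda_base="${HOME}/miniconda3"
fi

# Environments normally live in the envs folder under the base directory.
env_prefix="${conda_base}/envs/rnaseq-automation"
echo "Environment prefix: ${env_prefix}"

# The environment could then be created at that prefix with:
# conda env create -f environment.yml -p "${env_prefix}"
```

The creation command is left commented out so that the snippet only prints the path and never modifies your installation by itself.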
If you would like to create your own environment with the latest software versions, follow the steps below.
- Create the new environment and specify the software to include in it.
conda create -n rnaseq-automation -y -c conda-forge -c bioconda \
pandas fastqc star samtools subread
- Update the software to the latest compatible versions.
conda update -n rnaseq-automation -y -c conda-forge -c bioconda --all
If you wish to use any of the software in the environment outside of the workflow, you can run `conda activate rnaseq-automation`. You can deactivate the environment by closing your terminal or running `conda deactivate`.
To keep track of samples, this workflow requires a sample sheet. An example sheet, samples.tsv, is provided in the examples directory. It is important to follow the exact formatting of this sheet, as the information in it is used at various stages of the workflow.
Column | Description |
---|---|
sample_ID | Short sample identifier (e.g. A001). |
condition | Experimental condition (e.g. EWSR1_KD). |
replicate | Sample replicate number (e.g. 1). |
R1 | Name of R1 fastq file of experimental condition. |
R2 | Name of R2 fastq file of experimental condition (put NA if single end). |
paired | Put 'paired' or 'unpaired' depending on the run. |
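To make the layout concrete, the sketch below writes a small tab-separated sheet and checks it against the column descriptions above. The sample rows, the fastq file names, the `Control` condition, and the `check_sheet` helper are all made up for illustration, and the column order is assumed to follow the table; the samples.tsv in the examples directory remains the authoritative template.

```shell
# Write a small, made-up sample sheet (file names are hypothetical).
sheet="$(mktemp)"
printf 'sample_ID\tcondition\treplicate\tR1\tR2\tpaired\n' > "$sheet"
printf 'A001\tEWSR1_KD\t1\tA001_R1.fastq.gz\tA001_R2.fastq.gz\tpaired\n' >> "$sheet"
printf 'A002\tControl\t1\tA002_R1.fastq.gz\tNA\tunpaired\n' >> "$sheet"

# Hypothetical helper: check the header and the per-row rules from the table.
check_sheet() {
    awk -F'\t' '
        NR == 1 {
            # Column order is assumed to match the table above.
            if ($0 != "sample_ID\tcondition\treplicate\tR1\tR2\tpaired") {
                print "unexpected header"; bad = 1
            }
            next
        }
        NF != 6 { print "line " NR ": expected 6 columns, found " NF; bad = 1; next }
        $6 != "paired" && $6 != "unpaired" { print "line " NR ": bad paired value: " $6; bad = 1 }
        $6 == "unpaired" && $5 != "NA" { print "line " NR ": R2 should be NA for single end"; bad = 1 }
        END { if (bad) exit 1; print "sample sheet OK" }
    ' "$1"
}

check_sheet "$sheet"
```

Running the same helper on your own sheet, for example `check_sheet samples.tsv`, gives a quick sanity check before starting the workflow, although it does not verify that the fastq files actually exist.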
After getting the conda environment ready and the sample sheet prepared, you are ready to run the workflow.
This workflow would not be possible without the great software listed below.