gzentner/RNAseq


RNAseq

Automation of RNA-seq Workflow

Getting Started

Cloning Repository

To get started, clone the RNAseq automation repository. Navigate to the directory where you want the repository to live and enter:

git clone https://github.com/rpolicastro/RNAseq.git

Preparing Conda Environment

This workflow uses the conda package manager and its virtual environments. Conda installs the main software and all of its dependencies into a 'virtual environment' to ensure compatibility. The provided 'environment.yml' file reproduces the software environment used when developing the workflow, which keeps the workflow compatible and reproducible over time.

Before creating the environment, you must first install Miniconda.

  1. Install Miniconda, and make sure that conda is in your PATH.
  2. Update conda to the latest version: conda update conda
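
Before moving on, it can help to confirm that conda really is reachable from your PATH. The snippet below is a minimal sketch using only the Python standard library; the function name conda_status is hypothetical and not part of this repository.

```python
import shutil


def conda_status(which=shutil.which):
    """Describe whether a conda executable is reachable from PATH.

    `which` defaults to shutil.which and can be swapped out for testing.
    """
    path = which("conda")
    if path is None:
        return "conda not found on PATH; check your Miniconda installation"
    return f"conda found at {path}"


# Report the result for the shell this script was started from.
print(conda_status())
```

If the message says conda was not found, revisit step 1 before creating the environment.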

You are now ready to create the virtual software environment and download all software and dependencies.

Recreating Original Environment

If you would like to recreate the environment used when writing the original workflow, you can do so with the provided environment.yml file. This will install the main software versions used in the development of the workflow and automatically download the dependencies for those versions.

conda env create -f environment.yml -p ~/miniconda3/envs/rnaseq-automation

-p sets the full path of the new environment. It should sit inside the environments folder of your conda installation, which is usually ~/miniconda3/envs.

Creating Updated Environment

If you would like to create your own environment with the latest software versions, follow the steps below.

  1. Create the new environment and specify the software to include in it.
conda create -n rnaseq-automation -y -c conda-forge -c bioconda \
pandas fastqc star samtools subread
  2. Update the software to the latest compatible versions.
conda update -n rnaseq-automation -y -c conda-forge -c bioconda --all

If you wish to use any of the software in the environment outside of the workflow, you can type conda activate rnaseq-automation. You can deactivate the environment by closing your terminal or entering conda deactivate.
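
Once the environment is activated, a quick check can confirm that the command-line tools are visible. The sketch below is not part of the workflow. It assumes the packages expose the executables fastqc, STAR, samtools, and featureCounts (the counting tool shipped with Subread), and the helper name missing_tools is hypothetical.

```python
import shutil

# Executables assumed to be provided by the packages in the environment.
# featureCounts ships with the subread package.
EXPECTED_TOOLS = ["fastqc", "STAR", "samtools", "featureCounts"]


def missing_tools(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


absent = missing_tools()
if absent:
    print("Not found on PATH: " + ", ".join(absent))
else:
    print("All expected tools are available.")
```

Run it after conda activate rnaseq-automation. Any tool reported as missing suggests the environment was not created or activated correctly.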

Creating Sample Sheet

To keep track of samples, this workflow requires a sample sheet. An example sheet, samples.tsv, is provided in the examples directory. It is important to follow the exact formatting of this sheet, as the information within it is used in various stages of the workflow.

| Column | Description |
| --- | --- |
| sample_ID | Short sample identifier (e.g. A001). |
| condition | Experimental condition (e.g. EWSR1_KD). |
| replicate | Sample replicate number (e.g. 1). |
| R1 | Name of R1 fastq file of experimental condition. |
| R2 | Name of R2 fastq file of experimental condition (put NA if single end). |
| paired | Put 'paired' or 'unpaired' depending on the run. |
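
Because the sheet's formatting matters, a small pre-flight check can catch mistakes before the workflow runs. The following is a minimal sketch, not the workflow's own validation. It assumes the sheet is tab-separated (suggested by the .tsv extension), that sample_ID values are unique, and that replicate is a whole number. The function name validate_sample_sheet and the fastq file names in the example are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample_ID", "condition", "replicate", "R1", "R2", "paired"]


def validate_sample_sheet(handle):
    """Return a list of problems found in a tab-separated sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    for row in reader:
        line_no = reader.line_num  # line of the file this row came from
        cell = {col: (row.get(col) or "").strip() for col in REQUIRED_COLUMNS}
        sample_id = cell["sample_ID"]
        if not sample_id:
            problems.append(f"line {line_no}: sample_ID is empty")
        elif sample_id in seen_ids:  # uniqueness is an assumption
            problems.append(f"line {line_no}: duplicate sample_ID {sample_id}")
        seen_ids.add(sample_id)
        if not cell["replicate"].isdigit():  # whole number is an assumption
            problems.append(f"line {line_no}: replicate must be a number")
        if not cell["R1"]:
            problems.append(f"line {line_no}: R1 is empty")
        if cell["paired"] not in ("paired", "unpaired"):
            problems.append(
                f"line {line_no}: paired must be 'paired' or 'unpaired', "
                f"got '{cell['paired']}'"
            )
        elif cell["paired"] == "paired" and cell["R2"] in ("", "NA"):
            problems.append(f"line {line_no}: paired run needs an R2 file")
        elif cell["paired"] == "unpaired" and cell["R2"] != "NA":
            problems.append(f"line {line_no}: unpaired run should have R2 set to NA")
    return problems


# Example sheet with one deliberate mistake in the 'paired' column.
example = (
    "sample_ID\tcondition\treplicate\tR1\tR2\tpaired\n"
    "A001\tEWSR1_KD\t1\tA001_R1.fastq\tA001_R2.fastq\tpaired\n"
    "A002\tEWSR1_KD\t2\tA002_R1.fastq\tNA\tsingle\n"
)
for problem in validate_sample_sheet(io.StringIO(example)):
    print(problem)
```

To check a real sheet, open the file and pass the handle, for example with open("samples.tsv", newline="") as handle. The standard csv module is used here so the check runs without the conda environment; the same checks could be written with pandas.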

Running the Workflow

After getting the conda environment ready and the sample sheet prepared, you are ready to run the workflow.

Built With

This workflow would not be possible without the great software listed below.

  • FastQC - Read quality control.
  • STAR - Read aligner.
  • Samtools - SAM/BAM manipulation.
  • Subread - Read annotation and counting.
  • Pandas - Dataframe manipulation.
