snsxt

Extension to the sns pipeline

Overview

This program is an extension to the sns-wes pipeline for bioinformatic analysis of whole or targeted exome sequencing data.

snsxt is a BYOC framework (Bring Your Own Code) for running downstream analysis tasks on sns-wes pipeline output.

Use this framework to run any extra analysis tasks you like after an sns pipeline analysis has finished.

Usage

  • Create a new directory for your analysis
mkdir /path/to/analysis
cd /path/to/analysis
  • Clone this repository and navigate to its directory
git clone --recursive https://github.com/NYU-Molecular-Pathology/snsxt.git
cd snsxt
  • Run the run.py script
snsxt/run.py -d .

Arguments

Required

  • -d, --analysis_dir: Path to the directory to use for the analysis. For a new sns analysis, this will become the output directory. For an existing sns analysis output, this will become the input directory.

Optional

  • -f, --fastq_dir: Directories containing .fastq.gz files (required for a new sns analysis)

  • -a, --analysis_id: An identifier for the analysis (e.g. NextSeq run ID)

  • -r, --results_id: A sub-identifier for the analysis (e.g. a timestamp)

  • -t, --task-list: A YAML formatted list of downstream analysis tasks for snsxt, defaults to task_lists/default.yml

  • --targets: A .bed file with genomic regions for the analysis, defaults to the included targets.bed file

  • --probes: Probes .bed file with regions for CNV analysis, defaults to the included probes.bed file

  • --pairs_sheet: "samples.pairs.csv" samplesheet to use for paired analysis
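
The options listed above map naturally onto Python's argparse module. The following is a minimal sketch of such a parser, not the actual code in run.py: the function name build_parser, the nargs setting for --fastq_dir, and the bare file names used as defaults for --targets and --probes are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options; run.py may differ."""
    parser = argparse.ArgumentParser(description="Extension to the sns pipeline")
    # Required: directory used as output (new analysis) or input (existing analysis)
    parser.add_argument("-d", "--analysis_dir", required=True)
    # Optional: one or more directories holding .fastq.gz files (assumed nargs="+")
    parser.add_argument("-f", "--fastq_dir", nargs="+", default=None)
    parser.add_argument("-a", "--analysis_id", default=None)
    parser.add_argument("-r", "--results_id", default=None)
    # The documented flag uses a hyphen, so an explicit dest keeps attribute access simple
    parser.add_argument("-t", "--task-list", dest="task_list",
                        default="task_lists/default.yml")
    parser.add_argument("--targets", default="targets.bed")
    parser.add_argument("--probes", default="probes.bed")
    parser.add_argument("--pairs_sheet", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["-d", ".", "-a", "run01"])
print(args.analysis_dir)
print(args.task_list)
```

Passing an explicit list to parse_args() makes it easy to check how a given command line would be interpreted without launching an analysis.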

Program Components

Names and locations of these items may change with development

Paths are relative to the repository root (the top-level snsxt directory):

  • snsxt: main directory containing all code for the program

  • snsxt/config: configuration module for the main program

  • snsxt/fixtures: dummy analysis output files and directories for unit testing

  • snsxt/logs/: default program log output directory

  • snsxt/sns_classes: submodule with Python classes for interacting with sns pipeline output

  • snsxt/sns_tasks: submodule containing additional analysis tasks to be performed in the program

  • snsxt/util: submodule with utility functions and classes for usage in the program

  • snsxt/report: directory containing files and configuration for the parent analysis report

  • snsxt/logging.yml: configurations for program logging

  • snsxt/test.py: script to run all unit tests in the program and its submodules

  • snsxt/run.py: main script used to run the program

Analysis Tasks

The sns_tasks submodule contains code for the various analysis tasks to be run in the program, which are derived from the AnalysisTask custom class. Examples of other analysis task classes can be seen here and here, and a class template has also been included. Task classes must be imported into the sns_tasks/__init__.py file in order to be made accessible to the rest of the program.

Task Types

Tasks can come in a few flavors:

  • tasks that operate on the entire analysis at once

  • tasks that operate on a single sample at a time

Additionally, tasks can be run a few different ways:

  • run in the current program session

  • submitted as a compute job to the HPC cluster with qsub

Each combination of task type and run type utilizes a separate 'run' function, which should be wrapped by the task's run() method.
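
The relationship between task types and 'run' functions can be sketched as follows. This is an illustration only: the AnalysisTask class here is a simplified stand-in, the method names run_analysis_task and run_sample_task are hypothetical, and the analysis is represented as a plain dict rather than the real sns_classes objects. The qsub variants are omitted because they need an HPC cluster.

```python
class AnalysisTask(object):
    """Simplified stand-in for the real base class; method names are illustrative."""

    def __init__(self, analysis):
        # Here the analysis is just a dict: {"id": ..., "samples": [...]}
        self.analysis = analysis

    def main(self, item):
        raise NotImplementedError("subclasses put their task logic here")

    def run_analysis_task(self):
        """Run main() once on the entire analysis."""
        return self.main(self.analysis)

    def run_sample_task(self):
        """Run main() once for each sample in the analysis."""
        return [self.main(sample) for sample in self.analysis["samples"]]


class CountSamples(AnalysisTask):
    """Operates on the entire analysis at once."""

    def main(self, analysis):
        return len(analysis["samples"])

    def run(self):
        # run() only selects which 'run' function applies to this task
        return self.run_analysis_task()


class UppercaseSampleID(AnalysisTask):
    """Operates on a single sample at a time."""

    def main(self, sample):
        return sample.upper()

    def run(self):
        return self.run_sample_task()


analysis = {"id": "demo", "samples": ["s1", "s2", "s3"]}
count = CountSamples(analysis).run()
names = UppercaseSampleID(analysis).run()
print(count)
print(names)
```

Keeping the task logic in main() and the dispatch choice in run() means a task can switch between per-analysis and per-sample execution by changing a single line.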

Task Lists

The snsxt program uses a YAML formatted 'task list' file to determine which tasks should be run, and in what order. By default, the task_lists/default.yml file is used. Task names listed should correspond to the name of the Python class for each analysis task, and extra parameters to be passed to the task's run() function can be included.
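
As a rough illustration of how a task list can drive execution, the sketch below looks up each task name in a registry of classes and passes any extra parameters to run(). The list-of-mappings structure stands in for what a YAML parser would return; the actual schema of task_lists/default.yml is not described here and may differ. The names DemoTaskA, DemoTaskB, TASK_CLASSES, and run_task_list are hypothetical.

```python
class DemoTaskA(object):
    def run(self, **kwargs):
        return "DemoTaskA ran with {0}".format(sorted(kwargs.items()))


class DemoTaskB(object):
    def run(self, **kwargs):
        return "DemoTaskB ran with {0}".format(sorted(kwargs.items()))


# Registry keyed by class name, analogous to the names importable from sns_tasks
TASK_CLASSES = {cls.__name__: cls for cls in (DemoTaskA, DemoTaskB)}

# Stand-in for a parsed task list: a list preserves the order in which tasks run,
# and each entry maps a class name to its optional run() parameters
task_list = [
    {"DemoTaskB": {"threads": 4}},
    {"DemoTaskA": None},
]


def run_task_list(task_list, registry):
    """Run each listed task in order, passing its parameters to run()."""
    results = []
    for entry in task_list:
        for name, params in entry.items():
            task_class = registry[name]  # raises KeyError for an unknown task name
            results.append(task_class().run(**(params or {})))
    return results


results = run_task_list(task_list, TASK_CLASSES)
for line in results:
    print(line)
```

Looking tasks up by class name is why a task must be importable from one central place before it can appear in a task list.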

Adding New Tasks

You can add new analysis task modules to snsxt by following this workflow:

  • enter the sns_tasks subdirectory and make a copy of the template file:
cd snsxt/sns_tasks
cp _template.py _MyNewTask.py
  • edit the new task's custom Python class following the template shown, putting the main logic to run the task in the class's main() method, and setting the run() method as a wrapper around the required parent run method.

  • make a copy of the config file for the new module:

cp config/template.yml config/MyNewTask.yml
  • edit the new YAML config file with the corresponding info for the task (recommended to use Sublime Text or Atom)

  • import the module in the sns_tasks/__init__.py file

  • add the new module to a task list to be run
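
The workflow above might yield a task module along these lines. This is a sketch under stated assumptions rather than the contents of the real template: the AnalysisTask stand-in, its run_analysis_task method, the output_suffix config key, and the dict-based analysis and config objects are all hypothetical, chosen so that the example runs on its own.

```python
class AnalysisTask(object):
    """Minimal stand-in for the real parent class so this sketch is self-contained."""

    def __init__(self, analysis, config):
        self.analysis = analysis
        # Stands in for the values loaded from config/MyNewTask.yml
        self.config = config

    def run_analysis_task(self, **kwargs):
        # Hypothetical parent run method: call main() once for the whole analysis
        return self.main(self.analysis, **kwargs)


class MyNewTask(AnalysisTask):
    """Example task: build one output file name per sample."""

    def main(self, analysis, suffix=None):
        # Main task logic; parameters from the task list can override the config
        suffix = suffix or self.config["output_suffix"]
        return ["{0}{1}".format(sample, suffix) for sample in analysis["samples"]]

    def run(self, **kwargs):
        # run() is only a thin wrapper around the parent run method
        return self.run_analysis_task(**kwargs)


# In sns_tasks/__init__.py the class would then be imported, for example:
#   from _MyNewTask import MyNewTask

config = {"output_suffix": ".mynewtask.txt"}
task = MyNewTask({"samples": ["s1", "s2"]}, config)
default_outputs = task.run()
custom_outputs = task.run(suffix=".custom.txt")
print(default_outputs)
print(custom_outputs)
```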

Adding Task Reports

Analysis task modules can have associated report files. These should be R Markdown formatted documents designed to be imported as child documents into the parent report included in snsxt/report. A module-specific report can be added like this:

The new report should now be detected by the parent reporting R Markdown document and included in the final report output.

Tests

Unit tests for the various modules included in the program can be run with the test.py script. Individual modules can be tested with their corresponding test_*.py scripts.
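
One common way to collect every test_*.py script in a directory tree is the discovery feature of the standard unittest module. The sketch below shows that approach as an assumption about how a script like test.py could work, not as a description of its actual contents; run_all_tests and the temporary demo test file are hypothetical.

```python
import os
import shutil
import tempfile
import unittest


def run_all_tests(start_dir, pattern="test_*.py"):
    """Discover and run every test module under start_dir matching pattern."""
    suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    return unittest.TextTestRunner(verbosity=1).run(suite)


# Build a throwaway directory with one test script so the sketch runs anywhere
demo_dir = tempfile.mkdtemp()
try:
    with open(os.path.join(demo_dir, "test_demo_module.py"), "w") as handle:
        handle.write(
            "import unittest\n"
            "\n"
            "class TestDemo(unittest.TestCase):\n"
            "    def test_addition(self):\n"
            "        self.assertEqual(1 + 1, 2)\n"
        )
    result = run_all_tests(demo_dir)
finally:
    shutil.rmtree(demo_dir)

print(result.testsRun, result.wasSuccessful())
```

An individual module can still be tested on its own by running its test_*.py script directly, as described above.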

Software

Designed and tested in Python 2.7

Designed to run on Linux systems, tested under CentOS 6

Requires pandoc version 1.13+ for reporting

Credits

sh.py is used as an included dependency.

sns pipeline output is required to run this.

snsxt uses the util and sns_classes libraries as dependencies.
