shart

Scripts for High-throughput Analysis of Replication Timing

A Dockerfile and scripts for executing a semi-automated Repli-seq analysis pipeline

what

This repository contains a Dockerfile and scripts that generate replication timing profiles from raw sequencing reads of either early- and late-replicating DNA, or DNA extracted from cells sorted for S or G1 DNA content.

The scripts for executing the pipeline are in the scripts directory, which is added from this repository to the docker image at build time.

You can build the docker image for executing these scripts yourself or pull it from Docker Hub (vera/docker-4dn-repliseq).

how

example usage

# execute a step on data in the current directory
docker run -u $UID -w $PWD -v $PWD:$PWD:rw vera/shart <name_of_script> <args>

step-by-step workflow

setup

# pull the pre-built image, create and enter a container inside the directory with your data
docker run --rm -it -h d4r -u $UID:$(id -g) -w $PWD -v $PWD:$PWD:rw vera/shart

# define number of CPU threads to use for the pipeline
export NUMTHREADS=8

define your input files

# download example data
wget -cbre robots=off -np -nH --cut-dirs=3 -A 'g*' http://www.bio.fsu.edu/~dvera/share/repliseq/

# define early and late fastq files, here using sample data
E=$(ls *early*.fq.gz)
L=$(ls *late*.fq.gz)

# define the prefix of the bwa index for the reference genome
index=bwaIndex_hg38/genome
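
Because `E` and `L` are filled from shell globs, a pattern that matches nothing leaves the variable empty and the later steps receive no input. A minimal sketch of a pre-flight check follows, assuming a hypothetical helper named `check_inputs` that is not one of the shart scripts:

```shell
# check_inputs LABEL FILE...  (hypothetical helper, not part of shart)
# succeeds only if at least one file is given and every file exists and is non-empty
check_inputs() {
  ci_label=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "no $ci_label files found" >&2
    return 1
  fi
  for ci_file in "$@"; do
    if [ ! -s "$ci_file" ]; then
      echo "missing or empty $ci_label file: $ci_file" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_inputs early $E && check_inputs late $L` could be run before the first workflow step. The variables are left unquoted on purpose so that each file name becomes its own argument, which assumes the file names contain no spaces.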

execute workflow step by step

# clip adapters from reads with cutadapt
cfq=$(clip $E $L)

# align reads to genome with bwa
bam=$(align -i $index $cfq)
bstat=$(samstats $bam)

# filter bams by alignment quality and sort by position
sbam=$(filtersort $bam)
fbstat=$(samstats $sbam)

# remove duplicate reads
rbam=$(dedup $sbam)

# calculate RPKM bedGraphs for each set of alignments
bg=$(count $rbam)

# filter windows with a low average RPKM among libraries
fbg=$(filter $bg)

# calculate log2 ratios between early and late
l2r=$(log2ratio $fbg)

# quantile-normalize RT profiles to the average distribution
l2rn=$(normalize $l2r)

# loess-smooth profiles using a 300kb span size
l2rs=$(smooth -l 300000 -t $NUMTHREADS $l2rn)

organize
multiqc -f .
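
To make the log2-ratio step concrete, here is a toy calculation on two made-up bedGraph files that share the same windows. It sketches the arithmetic only, assuming the ratio is taken as early over late. It is not the `log2ratio` script itself, which may treat zeros or pseudocounts differently. The file names and RPKM values are invented for illustration.

```shell
# work in a scratch directory so nothing in the data directory is touched
tmp=$(mktemp -d)

# toy RPKM bedGraphs (chrom, start, end, RPKM); values are made up
printf 'chr1\t0\t5000\t8\nchr1\t5000\t10000\t2\n' > "$tmp/early.toy.bg"
printf 'chr1\t0\t5000\t2\nchr1\t5000\t10000\t8\n' > "$tmp/late.toy.bg"

# paste the files side by side: early RPKM is column 4, late RPKM is column 8
# windows with a zero in either library are skipped to avoid log(0) and division by zero
paste "$tmp/early.toy.bg" "$tmp/late.toy.bg" |
  awk '$4 > 0 && $8 > 0 { printf "%s\t%s\t%s\t%.2f\n", $1, $2, $3, log($4 / $8) / log(2) }' \
  > "$tmp/l2r.toy.bg"

cat "$tmp/l2r.toy.bg"
```

The first window has four times more early than late signal, so its log2 ratio is 2.00. The second window has the reverse and comes out at -2.00.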

or use pipes

clip $E $L | align -i $index | filtersort | dedup | count | filter | log2ratio | normalize | smooth
organize
multiqc -f .
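
When steps are chained with pipes, the shell by default reports only the exit status of the last command, so a failure in an early step can go unnoticed. The sketch below shows the difference that `set -o pipefail` makes. It assumes bash, and uses `false` and `cat` as stand-ins for a failing step and a later step that succeeds; whether the shart scripts need this is not something the README states.

```shell
# requires bash; `false` stands in for a failing step, `cat` for a later step that succeeds

# start from the shell default: the pipeline reports the status of the last command only
set +o pipefail
if false | cat; then default_status=0; else default_status=$?; fi
echo "without pipefail: $default_status"

# with pipefail: the pipeline fails if any command in it fails
set -o pipefail
if false | cat; then strict_status=0; else strict_status=$?; fi
echo "with pipefail: $strict_status"
```

Running `set -o pipefail` once in the container session, before the piped command, would make a failed step visible in `$?` after the pipeline finishes.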
