This pipeline was set up to process CLIP-seq data from aligned sequencing reads. It includes two parts: (i) read processing (cleaning, mapping, peak calling) and (ii) differential analysis.

bioinfo-pf-curie/clip-seq

CLIP-seq Quick Start Guide

See NEWS for information about changes in this and previous versions

What is the CLIP-seq pipeline?

The CLIP-seq pipeline was set up to process CLIP sequencing data from sequencing reads. It is developed for single-end data and covers trimming, mapping, removal of duplicated reads, peak identification, annotation and differential analysis.

Contact

For any questions about the pipeline, please contact mandy.cadix@curie.fr

How to install it?

The following dependencies are required:

Annotation Files

The pipeline uses annotation files that contain gene annotation information. These files are based on UCSC RefSeq genes. WINDOW_SIZE selects a region around the end of each gene (upstream or downstream of the gene end); its value is given in bp. To generate all required annotation files, set ANNOTATION_DIR, ORG, UCSC_EXPORT and WINDOW_SIZE in the configuration file.

BUILD_ANNOT=1
ORG=hg19
UCSC_EXPORT=refseq_export_hg19.csv
WINDOW_SIZE=100
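To illustrate what a window around the gene end could look like, here is a minimal sketch. It is not the pipeline's own code: the helper name `gene_end_window`, the BED6 input layout and the choice of a symmetric window (WINDOW_SIZE bp on each side of the gene end) are assumptions made for illustration.

```shell
# Hypothetical helper, not part of the pipeline.
# Reads BED6 lines (chrom, start, end, name, score, strand) on stdin and
# prints a window of <window_size> bp on each side of the gene end.
# Assumption: the gene end is the "end" column on the + strand and the
# "start" column on the - strand.
gene_end_window() {
    awk -v w="$1" 'BEGIN { FS = OFS = "\t" }
    {
        gene_end = ($6 == "-") ? $2 : $3
        start = gene_end - w
        if (start < 0) start = 0   # do not run past the chromosome start
        print $1, start, gene_end + w, $4, $5, $6
    }'
}
```

With WINDOW_SIZE=100, a + strand gene spanning 1000 to 2000 yields the window 1900 to 2100, and the same gene on the - strand yields 900 to 1100.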

Trimming

To process reads that contain adapters, you need to provide the 5' and 3' adapter sequences:

ADAPTER5="GTTCAGAGTTCTACAGTCCGACGATC"
ADAPTER3="TGGAATTCTCGGGTGCCAAGG"
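The trimming tool used by the pipeline is not named here. As a rough sketch only, the snippet below checks that the adapter sequences are plausible and prints a cutadapt command that uses them. cutadapt is an assumption rather than the confirmed trimmer, and the helper names `is_valid_adapter` and `build_trim_command` are hypothetical.

```shell
# Hypothetical helpers, not part of the pipeline.

# Returns 0 if the sequence is non-empty and uses only A, C, G, T or N.
is_valid_adapter() {
    case "$1" in
        ''|*[!ACGTN]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print (do not run) a cutadapt command for the given adapters and files.
# Arguments: 5' adapter, 3' adapter, input FASTQ, output FASTQ.
# Assumption: cutadapt is the trimmer; -g is its 5' adapter option,
# -a its 3' adapter option and -o its output file option.
build_trim_command() {
    if ! is_valid_adapter "$1" || ! is_valid_adapter "$2"; then
        echo "adapter sequences must contain only A, C, G, T or N" >&2
        return 1
    fi
    printf 'cutadapt -g %s -a %s -o %s %s\n' "$1" "$2" "$4" "$3"
}
```

For example, `build_trim_command "$ADAPTER5" "$ADAPTER3" sample.fastq.gz sample_trimmed.fastq.gz` prints the command so it can be reviewed before running it by hand.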

How to use it?

PART 1: reads processing

The pipeline processes one sample at a time. Set up the configuration file according to your analysis, then run the following command to process the reads:

./script/pipeline_clip-seq.bash -c CONFIG -i INPUT_FILE -s STEP -n SAMPLE -o OUTPUT_DIR

-c: The configuration file

-i: The FASTQ file of the CLIP-seq sample. CAUTION: only files in fastq.gz format are accepted

-s: Steps of this pipeline:
    trimming: Remove reads that had problems on the sequencer and remove adapters
    fastqc: Quality control of the FASTQ file
    mapping: Map reads with tophat2 and select them according to mapping quality
    rm_dup: Remove or only mark the duplicated reads. This step creates a sorted and indexed file
    peak_calling: Detect peaks
    annotation: Annotate genes and peaks
    annot_spe: Annotate peaks with a specific window (last exon, downstream & upstream region of peaks, peaks with downstream region)
    all: Launch all previous steps

-n: The sample identifier (letters and digits only; NO symbols)

-o: The output directory

-h: Help

-v: Version
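Because PART 1 handles a single sample per run, several samples need one command each. The sketch below prints one command per FASTQ file after checking the two constraints stated above (fastq.gz input, sample identifier made of letters and digits only). The helper names are hypothetical, and deriving the sample identifier from the file name is an assumption for illustration.

```shell
# Hypothetical helpers, not part of the pipeline.

# Returns 0 if the identifier contains only letters and digits.
is_valid_sample_id() {
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Returns 0 if the file name ends in .fastq.gz.
is_fastq_gz() {
    case "$1" in
        *.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) one pipeline command per input file.
# Arguments: configuration file, output directory, then the FASTQ files.
# Assumption: the sample identifier is the file name without .fastq.gz.
print_clip_commands() {
    config="$1"; output_dir="$2"; shift 2
    for fastq in "$@"; do
        sample="$(basename "$fastq" .fastq.gz)"
        if ! is_fastq_gz "$fastq" || ! is_valid_sample_id "$sample"; then
            echo "skipping $fastq: needs .fastq.gz and an alphanumeric sample name" >&2
            continue
        fi
        echo "./script/pipeline_clip-seq.bash -c $config -i $fastq -s all -n $sample -o $output_dir"
    done
}
```

Printing the commands first makes it easy to review them, then run them one by one or pipe them to a shell.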

PART 2: differential analysis

The normalization step of the differential analysis normalizes the number of reads in a peak (in CLIP-seq samples) by the number of reads in the gene on which the peak is located (in RNA-seq samples).
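As a worked illustration of this idea only, the sketch below divides each peak's read count by the read count of its gene. This plain ratio is an assumption about the simplest reading of the description; the statistical model actually used by the differential_analysis step is not specified here. The helper name `normalize_peaks` and the four-column input layout are hypothetical.

```shell
# Hypothetical helper, not part of the pipeline.
# Input (tab-separated, assumed layout): peak_id, gene_id, peak_reads, gene_reads.
# Output: the same columns plus peak_reads / gene_reads,
# or NA when the gene has no RNA-seq reads.
normalize_peaks() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        if ($4 > 0) ratio = sprintf("%.4f", $3 / $4)
        else ratio = "NA"
        print $1, $2, $3, $4, ratio
    }'
}
```

A peak with 50 CLIP-seq reads on a gene with 200 RNA-seq reads gets a ratio of 0.2500, so a peak on a highly expressed gene is not ranked above one on a weakly expressed gene merely because more RNA was available.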

To compare treated vs. untreated samples, list the treated samples before the untreated samples (for RNA-seq, CLIP-seq and group), then run the following command to perform the differential analysis:

./script/pipeline_differential_analysis.bash -c CONFIG -l INPUT_LIST -b BAM_RNA_FILES -s STEP -o OUTPUT_DIR

-c: The configuration file

-l: The list of BED files obtained from PART 1 (see input_bed_list.txt)

-b: The list of sorted BAM files of the RNA-seq samples (see input_bam_list.txt)

-s: Steps of this pipeline:
    merge_peaks: Concatenate, sort and merge all peaks present in different samples
    toc_peaks: Create a table of counts for CLIP experiment
    toc_genes: Create a table of counts for RNA experiment
    cat_peaks_genes: Concatenation of tables of counts (CLIP and RNA)
    differential_analysis: Make a differential analysis
    color_peaks: Create a color information file. Up- and down-regulated peaks (padj < 0.05) are colored in red and blue, respectively
    all: Launch all previous steps

-o: The output directory

-h: Help

-v: Version

Example input list of BED files obtained from PART 1, used in PART 2 (input_bed_list.txt):

Sample2  /PATH/SAMPLE2/SAMPLE2_intersectBED_500_bp_DownstreamEnd.bed  treated 1
Sample4  /PATH/SAMPLE4/SAMPLE4_intersectBED_500_bp_DownstreamEnd.bed  treated 1
Sample1  /PATH/SAMPLE1/SAMPLE1_intersectBED_500_bp_DownstreamEnd.bed  untreated 0
Sample3  /PATH/SAMPLE3/SAMPLE3_intersectBED_500_bp_DownstreamEnd.bed  untreated 0

Example input list of RNA-seq BAM files, used in PART 2 (input_bam_list.txt):

Sample2  /PATH/BAM_RNA-seq_FILE/SAMPLE2_MAPQ_sort.bam treated
Sample4  /PATH/BAM_RNA-seq_FILE/SAMPLE4_MAPQ_sort.bam treated
Sample1  /PATH/BAM_RNA-seq_FILE/SAMPLE1_MAPQ_sort.bam untreated
Sample3  /PATH/BAM_RNA-seq_FILE/SAMPLE3_MAPQ_sort.bam untreated
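Since PART 2 expects treated samples to be listed before untreated ones, a quick check of both lists before launching the analysis can catch ordering mistakes. The sketch below is a hypothetical pre-flight check, not part of the pipeline. That the two lists must name the samples in the same order is an assumption drawn from the examples above.

```shell
# Hypothetical helpers, not part of the pipeline.

# Succeeds if every "treated" row precedes every "untreated" row (column 3).
treated_first() {
    awk '$3 == "untreated" { seen = 1 }
         $3 == "treated" && seen { bad = 1 }
         END { exit (bad ? 1 : 0) }' "$1"
}

# Succeeds if both lists name the same samples in the same order (column 1).
# Assumption: the BED list and the BAM list must be aligned row by row.
same_sample_order() {
    bed_ids="$(awk 'NF { print $1 }' "$1")"
    bam_ids="$(awk 'NF { print $1 }' "$2")"
    [ "$bed_ids" = "$bam_ids" ]
}
```

For example, `treated_first input_bed_list.txt && treated_first input_bam_list.txt && same_sample_order input_bed_list.txt input_bam_list.txt` returns a non-zero status if any of the three checks fails.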
