Detection of neoantigens from WES and RNA sequencing data

bioinfo-pf-curie/nf-neoant

nf-neoant

nf-neoAnt pipeline

Introduction

The pipeline is built using Nextflow, a workflow manager that runs tasks across multiple compute infrastructures in a portable manner. It supports the conda package manager as well as Singularity and Docker containers, which makes installation easier and results highly reproducible.

Pipeline summary

The objective of the pipeline is to predict tumor-specific neoantigens from both DNA and RNA next-generation sequencing data from patients.

  • HLA typing is performed by seq2HLA (v2.2) for both MHCI and MHCII, based on the paired RNA FASTQ files.

  • Detection of neoantigens is performed by the pVACtools suite (v4.1.1). The pipeline is divided into two parts: one focuses on DNA-based analysis (pVACseq) and the other on fusion events derived from RNAseq data (pVACfuse).

  • MiXCR (v4.5.0) is included to provide a fast analysis of raw T- or B-cell receptor repertoires.

pVACseq

  • Paired RNAseq reads are aligned with STAR (v2.7.6a) against the STAR index, using the --quantMode TranscriptomeSAM option to obtain a transcriptome-based alignment BAM file. Per-gene and per-transcript TPM (transcripts per million) values are then estimated using Salmon (v1.10.2) with the matching Gencode GFF3 and transcript FASTA files.

  • Small somatic variants (SNVs, indels) are first called using GATK Mutect2 (v4.1.8.0).

    • Variants are annotated using VEP (ENSEMBL v110.1).
    • Both gene (GX) and transcript (TX) expression values are then added using vatools (v5.1.0) and the previously computed expression files.
    • RNA depth (RDP) and RNA allelic ratio (RAF) are then added using a combination of bcftools (v1.15.1), GATK SelectVariants (v4.1.9.0) and bam-readcount (v0.8).
  • pVACseq is then run on the resulting variant file, using the HLA typing files (for MHCI & MHCII).

pVACfuse

  • Arriba (v2.4.0) is run on a subset of the original STAR-aligned file containing only reads of putative relevance to fusion detection, such as unmapped and clipped reads.
  • pVACfuse is then run on the list of filtered fusions of interest, using both HLA typing files.

Workflow

[Workflow diagram: HLAtyping, DNAseq, RNAseq, Fusion]

Run the pipeline from a sample plan

Arguments & Parameters

  • sample_plan: CSV file with one sample per row

  • assembly: the genome assembly for the analysis (example: hg38)

  • genomePath: path containing the different files described in "conf/genomes.config"

  • singularityImagePath: path to singularity images

  • vep_dir_cache: path to the VEP cache, downloaded following the VEP instructions (here: species="homo_sapiens" & version="110_GRCh38")

  • vep_plugin_repo: path to the VEP_plugins repository into which the Frameshift.pm plugin was downloaded

  • blacklist_tsv: file from the Arriba release archive (in its /database folder) named "blacklist_${assembly}*.tsv.gz"

  • proteinGff: file from the Arriba release archive (in its /database folder) named "protein_domain_${assembly}*.gff3"

  • mi_license: path to the "mi.license" file needed for MiXCR (free for academic use)

  • tmpdir: path to temporary folder

nextflow run main.nf --samplePlan ${sample_plan} \
                     --genome ${assembly} \
                     --genomeAnnotationPath ${genomePath} \
                     --outDir ${outputDir} \
                     --singularityImagePath ${sif} \
                     --vepDirCache ${vep_dir_cache} \
                     --vepPluginRepo ${vep_plugin_repo} \
                     --miLicense ${mi_license} \
                     --tmpdir ${tmpdir} \
                     -profile singularity,cluster \
                     -w ${tmp_dir} \
                     -resume
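
Before launching, it can help to confirm that the shell variables referenced by the command are actually set. The following is only a minimal sketch and not part of the pipeline: `require_vars` is a hypothetical helper, and every value shown is a placeholder.

```shell
# Hypothetical helper (not part of nf-neoant): fail early if a variable
# referenced by the command line is empty or unset.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      return 1
    fi
  done
}

# Placeholder values; replace them with your own paths.
sample_plan=/path/to/sample_plan.csv
assembly=hg38
genomePath=/path/to/annotations
outputDir=/path/to/results
sif=/path/to/singularity_images

require_vars sample_plan assembly genomePath outputDir sif \
  && echo "all variables set"
```

The same call can be extended with the remaining variable names used in the command above.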

Sample plan

A sample plan is a CSV (comma-separated) file that lists all the samples with their biological IDs. The sample plan is expected to contain the following fields (with no header):

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex
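
A quick way to catch a malformed sample plan is to count the comma-separated fields on each row. The sketch below assumes the 12 fields listed above are the minimum per row; `check_sample_plan` is a hypothetical helper, and the sample names and paths are placeholders.

```shell
# Hypothetical one-row sample plan; every path below is a placeholder.
plan=$(mktemp)
printf '%s\n' 'S1,tumor1,normal1,/data/dna_R1.fastq.gz,/data/dna_R2.fastq.gz,/data/dna.bam,/data/dna.bam.bai,/data/somatic.vcf.gz,/data/rna_R1.fastq.gz,/data/rna_R2.fastq.gz,/data/rna.bam,/data/rna.bam.bai' > "$plan"

# Report rows with fewer than 12 fields; exit non-zero if any is found.
check_sample_plan() {
  awk -F',' 'NF < 12 { printf "line %d: %d fields\n", NR, NF; bad = 1 } END { exit bad }' "$1"
}

check_sample_plan "$plan" && echo "sample plan OK"
```

Empty fields still count as fields, so rows that leave some columns blank pass this check as long as the commas are present.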

Steps

The basic steps are HLAtyping, RNAquant, pVacseq, pVacfuse and mixcr. Using the --step option, they can be run separately (e.g. --step HLAtyping, --step RNAquant or --step mixcr), partially combined (e.g. --step HLAtyping,RNAquant,pVacseq or --step HLAtyping,pVacfuse), or all together (default mode; --step HLAtyping,RNAquant,pVacseq,pVacfuse,mixcr).
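
Because the step list is a single comma-separated argument, a typo in one name is easy to miss. The sketch below checks a candidate value against the five names given above; `check_steps` is a hypothetical helper, and exact, case-sensitive matching is an assumption rather than documented pipeline behaviour.

```shell
# Step names as listed in this README; exact matching is an assumption.
valid_steps="HLAtyping RNAquant pVacseq pVacfuse mixcr"

# Hypothetical helper: return non-zero if any comma-separated entry is unknown.
check_steps() {
  for s in $(printf '%s' "$1" | tr ',' ' '); do
    case " $valid_steps " in
      *" $s "*) ;;
      *) echo "unknown step: $s"; return 1 ;;
    esac
  done
}

check_steps "HLAtyping,RNAquant,pVacseq" && echo "steps OK"
check_steps "HLAtyping,pVacFuse" || echo "fix the step list before running"
```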

HLA typing

If you only want the HLA alleles (MHCI & MHCII), add "--step HLAtyping" to your command line. If you already have the two HLA allele files (MHCI & MHCII), add their full paths to the sample plan as follows:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex, path_to_HLAI_file, path_to_HLAII_file

RNA expression

If you only want the transcript- and gene-based expression files (TPM), add "--step RNAquant" to your command line. If you already have the gene-based and transcript-based expression files, add their full paths to the sample plan as follows:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex, path_to_HLAI_file, path_to_HLAII_file, path_to_gene_tpm_file, path_to_transcript_tpm_file

or, if you also want to run the HLAtyping step (--step HLAtyping,RNAquant,pVacseq), leave the two HLA columns empty:

sampleID, sampleName, normalName, path_to_fastqDnaR1, path_to_fastqDnaR2, path_to_sampleDnaBam, path_to_sampleDnaBamIndex, path_to_vcf, path_to_fastqRnaR1, path_to_fastqRnaR2, path_to_sampleRnaBam, path_to_sampleRnaBamIndex,,,path_to_gene_tpm_file,path_to_transcript_tpm_file
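
The optional columns are positional, so it is worth checking that the empty commas land where intended. The sketch below reports whether columns 13-14 (HLA files) and 15-16 (TPM files) of a row are filled in; `describe_row` is a hypothetical helper and the row contents are placeholders.

```shell
# Hypothetical helper: report which optional columns (13-16) of a
# sample plan row are filled in.
describe_row() {
  printf '%s\n' "$1" | awk -F',' '{
    printf "HLA files: %s\n", (($13 != "" && $14 != "") ? "provided" : "empty")
    printf "TPM files: %s\n", (($15 != "" && $16 != "") ? "provided" : "empty")
  }'
}

# Placeholder row: 12 mandatory fields, two empty HLA columns, then the two TPM files.
row='S1,tumor1,normal1,r1.fq.gz,r2.fq.gz,dna.bam,dna.bam.bai,calls.vcf.gz,rna_r1.fq.gz,rna_r2.fq.gz,rna.bam,rna.bam.bai,,,gene_tpm.tsv,transcript_tpm.tsv'
describe_row "$row"
```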

Test

Run the pipeline on the test dataset, which launches the HLAtyping step:

nextflow run main.nf -profile test,singularity --outDir ${outputDir} --singularityImagePath ${sif} -w ${work_dir}

Credits

This pipeline was written by the Institut Curie bioinformatics platform CUBIC (E.Girard, N.Servant). The project was funded by IMMUcan, the integrated European immuno-oncology profiling platform.

Contacts

For any question, bug or suggestion, please use the issue system or contact the bioinformatics core facility.