alignment-variants

A pipeline to align (map) reads against a reference genome, and call variants based on the alignment.

Pipeline

Reads are trimmed and filtered using fastp, then mapped against the reference genome using bwa-mem (short reads) or minimap2 (long reads). Alignment metrics are generated by samtools and qualimap.

Variants are called using freebayes.

flowchart TD
  ref[ref.fa]
  fastq_short[fastq_short]
  fastq_long[fastq_long]
  fastq_short --> fastp(fastp)
  fastq_long --> nanoq_pre_filter("nanoq (pre-filter)")
  fastq_long --> filtlong(filtlong)
  filtlong --> nanoq_post_filter("nanoq (post-filter)")
  ref --> bwa(bwa_mem)
  fastp --> bwa
  bwa --> alignments[alignments]
  ref --> minimap2(minimap2)
  filtlong --> minimap2
  minimap2 --> alignments
  alignments --> qualimap(qualimap_bamqc)
  alignments --> samtools_stats(samtools_stats)
  alignments --> freebayes(freebayes)
  alignments --> samtools_mpileup(samtools_mpileup)
  samtools_mpileup --> generate_low_coverage_bed(generate_low_coverage_bed)
  samtools_mpileup --> percent_coverage_by_depth(percent_coverage_by_depth)
  qualimap --> combined_alignment_qc(combined_alignment_qc)
  samtools_stats --> combined_alignment_qc

Usage

nextflow run BCCDC-PHL/alignment-variants \
  --ref /path/to/ref.fa \
  --fastq_input /path/to/fastqs \
  --outdir /path/to/outputs

Including the --collect_outputs flag will add the collected_fastp.csv and collected_qualimap_bamqc.csv outputs, which include results for all samples collected into a single file.

nextflow run BCCDC-PHL/alignment-variants \
  --ref /path/to/ref.fa \
  --fastq_input /path/to/fastqs \
  --collect_outputs \
  --outdir /path/to/outputs

An alternative filename prefix can be set for the collected outputs using the --collected_outputs_prefix flag:

nextflow run BCCDC-PHL/alignment-variants \
  --ref /path/to/ref.fa \
  --fastq_input /path/to/fastqs \
  --collect_outputs \
  --collected_outputs_prefix 'test' \
  --outdir /path/to/outputs

...which would produce files named: test_fastp.csv and test_qualimap_bamqc.csv.

If long reads are available, they can be included with the --fastq_input_long and --align_long_reads flags:

nextflow run BCCDC-PHL/alignment-variants \
  --ref /path/to/ref.fa \
  --fastq_input /path/to/fastqs \
  --fastq_input_long /path/to/long_fastqs \
  --align_long_reads \
  --outdir /path/to/outputs

Alternatively, a samplesheet.csv file can be provided:

nextflow run BCCDC-PHL/alignment-variants \
  --ref /path/to/ref.fa \
  --samplesheet_input /path/to/samplesheet.csv \
  --outdir /path/to/outputs

The fields should include:

ID
R1
R2

...and if long reads are included, use the field:

LONG

...if separate reference sequences are to be used for each sample, a REF field can be included, with a path to the reference genome for each sample. When including the REF field in the samplesheet, the --ref flag can be omitted.

Alignment Cleaning

By default, alignments will be cleaned as described below. Alignment cleaning is optional, and can be skipped using the --skip_alignment_cleaning flag.

Immediately after alignment, the following filters are applied using samtools view with the -F 1548 flag.

This translates to removing reads that meet these criteria:

read unmapped (0x4)
mate unmapped (0x8)*
read fails platform/vendor quality checks (0x200)
read is PCR or optical duplicate (0x400)

See the explain-sam-flags site for more details on these flags.

After that, secondary and unmapped reads are removed using the -r flag of samtools fixmate

And duplicates are removed using the -r flag of samtools markdup

Outputs

output
├── SAMPLE-1
│   ├── SAMPLE-1_20231123140142_provenance.yml
│   ├── SAMPLE-1_fastp.csv
│   ├── SAMPLE-1_fastp.json
│   ├── SAMPLE-1_long.bam
│   ├── SAMPLE-1_long.bam.bai
│   ├── SAMPLE-1_short.bam
│   ├── SAMPLE-1_short.bam.bai
│   ├── SAMPLE-1_short_depths.tsv
│   ├── SAMPLE-1_short_freebayes.vcf
│   ├── SAMPLE-1_short_low_coverage_regions.bed
│   ├── SAMPLE-1_short_percent_coverage_by_depth.csv
│   ├── SAMPLE-1_short_qualimap_alignment_qc.csv
│   ├── SAMPLE-1_short_qualimap_genome_results.txt
│   └── SAMPLE-1_short_qualimap_report.pdf
├── SAMPLE-2
│   ├── SAMPLE-2_20231123140112_provenance.yml
│   ├── SAMPLE-2_fastp.csv
│   ├── SAMPLE-2_fastp.json
│   ├── SAMPLE-2_long.bam
│   ├── SAMPLE-2_long.bam.bai
│   ├── SAMPLE-2_short.bam
│   ├── SAMPLE-2_short.bam.bai
│   ├── SAMPLE-2_short_depths.tsv
│   ├── SAMPLE-2_short_freebayes.vcf
│   ├── SAMPLE-2_short_low_coverage_regions.bed
│   ├── SAMPLE-2_short_percent_coverage_by_depth.csv
│   ├── SAMPLE-2_short_qualimap_alignment_qc.csv
│   ├── SAMPLE-2_short_qualimap_genome_results.txt
│   └── SAMPLE-2_short_qualimap_report.pdf
├── collected_fastp.csv
└── collected_qualimap_bamqc.csv

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
.github		.github
bin		bin
environments		environments
modules		modules
.gitignore		.gitignore
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

alignment-variants

Pipeline

Usage

Alignment Cleaning

Outputs

About

Releases 12

Packages

Languages

BCCDC-PHL/alignment-variants

Folders and files

Latest commit

History

Repository files navigation

alignment-variants

Pipeline

Usage

Alignment Cleaning

Outputs

About

Topics

Resources

Stars

Watchers

Forks

Releases 12

Packages 0

Languages

Packages