Adds fq/lint for early validation of FASTQs #67

Open: wants to merge 9 commits into base branch dev
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#50](https://github.com/nf-core/seqinspector/pull/50) Add an optional subsampling step.
- [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.
- [#63](https://github.com/nf-core/seqinspector/pull/63) Contribution guidelines added about displaying results for new tools
- [#67](https://github.com/nf-core/seqinspector/pull/67) Add FASTQ linting for early validation

### `Fixed`

2 changes: 2 additions & 0 deletions CITATIONS.md
@@ -10,6 +10,8 @@

## Pipeline tools

- [FQ](https://github.com/stjude-rust-labs/fq)

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
5 changes: 3 additions & 2 deletions README.md
@@ -31,9 +31,10 @@
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

1. Lint FASTQs ([`fq`](https://github.com/stjude-rust-labs/fq))
1. Subsample reads ([`Seqtk`](https://github.com/lh3/seqtk))
2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
1. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

## Usage

9 changes: 9 additions & 0 deletions conf/modules.config
@@ -18,6 +18,15 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: 'FQ_LINT' {
ext.args = { params.fq_lint_args }
errorStrategy = {
task.exitStatus in ((130..145) + 104) ? 'retry' :
params.continue_with_lint_fail ? 'ignore' :
'finish'
}
}

withName: SEQTK_SAMPLE {
ext.args = '-s100'
}
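The `errorStrategy` closure for `FQ_LINT` picks one of three outcomes, and the order of its branches matters. A minimal Python sketch of the same decision, written only to make that order explicit; `fq_lint_error_strategy` is a hypothetical name, not part of the pipeline, and Groovy's inclusive range `130..145` is translated to `range(130, 146)`:

```python
# Exit codes that the FQ_LINT errorStrategy closure treats as retryable:
# the inclusive Groovy range 130..145 plus 104.
RETRY_EXIT_CODES = set(range(130, 146)) | {104}


def fq_lint_error_strategy(exit_status: int, continue_with_lint_fail: bool) -> str:
    """Hypothetical Python mirror of the FQ_LINT errorStrategy closure."""
    if exit_status in RETRY_EXIT_CODES:
        # Checked first, so these exits are retried regardless of the
        # continue_with_lint_fail setting.
        return "retry"
    if continue_with_lint_fail:
        # Ignore the failed task and let the rest of the run continue.
        return "ignore"
    # Wait for already submitted tasks to complete, then stop the pipeline.
    return "finish"


if __name__ == "__main__":
    print(fq_lint_error_strategy(137, False))  # retry
    print(fq_lint_error_strategy(1, True))     # ignore
    print(fq_lint_error_strategy(1, False))    # finish
```

Because the exit-code check comes first, a lint task killed by a signal (for example exit 137, which is 128 + SIGKILL) is retried even when `continue_with_lint_fail` is set; only failures with other exit codes reach the `ignore` or `finish` branch.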
5 changes: 5 additions & 0 deletions modules.json
@@ -10,6 +10,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"fq/lint": {
"branch": "master",
"git_sha": "a1abf90966a2a4016d3c3e41e228bfcbd4811ccc",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
"git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
5 changes: 5 additions & 0 deletions modules/nf-core/fq/lint/environment.yml

33 changes: 33 additions & 0 deletions modules/nf-core/fq/lint/main.nf

43 changes: 43 additions & 0 deletions modules/nf-core/fq/lint/meta.yml

63 changes: 63 additions & 0 deletions modules/nf-core/fq/lint/tests/main.nf.test

25 changes: 25 additions & 0 deletions modules/nf-core/fq/lint/tests/main.nf.test.snap

2 changes: 2 additions & 0 deletions modules/nf-core/fq/lint/tests/tags.yml

7 changes: 7 additions & 0 deletions nextflow.config
@@ -13,6 +13,13 @@ params {
// Input options
input = null
sample_size = 0

// Options
skip_linting = false
fq_lint_args = ""
continue_with_lint_fail = false


// References
genome = null
fasta = null
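The three new params default to running the linter, passing no extra arguments, and stopping on a lint failure. A hedged Python sketch of how they could translate into an `fq lint` invocation for one sample. It assumes, since neither is shown in this diff, that the workflow skips `FQ_LINT` when `skip_linting` is true and that the module places `ext.args` before the FASTQ paths; `fq_lint_command` is a hypothetical helper:

```python
import shlex
from typing import Dict, List, Optional

# Defaults mirroring the params added to nextflow.config.
DEFAULT_PARAMS: Dict[str, object] = {
    "skip_linting": False,
    "fq_lint_args": "",
    "continue_with_lint_fail": False,
}


def fq_lint_command(params: Dict[str, object], reads: List[str]) -> Optional[List[str]]:
    """Hypothetical: build the fq lint argv for one sample, or None if skipped.

    Assumption: ext.args (here params["fq_lint_args"]) is inserted before the
    FASTQ paths; the real command line lives in modules/nf-core/fq/lint/main.nf.
    """
    merged = {**DEFAULT_PARAMS, **params}  # user overrides win over defaults
    if merged["skip_linting"]:
        return None
    # Split the free-form argument string the way a POSIX shell would.
    extra_args = shlex.split(str(merged["fq_lint_args"]))
    return ["fq", "lint", *extra_args, *reads]


if __name__ == "__main__":
    print(fq_lint_command({}, ["r1.fq.gz", "r2.fq.gz"]))
    print(fq_lint_command({"fq_lint_args": "--disable-validator P001"}, ["r1.fq.gz"]))
    print(fq_lint_command({"skip_linting": True}, ["r1.fq.gz"]))
```

With `shlex.split`, `--disable-validator P001` becomes a flag and its value as separate tokens, and the default empty string adds no arguments at all.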
28 changes: 27 additions & 1 deletion nextflow_schema.json
@@ -31,7 +31,6 @@
},
"outdir": {
"type": "string",
"default": null,
"format": "directory-path",
"description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
"fa_icon": "fas fa-folder-open"
@@ -50,6 +49,30 @@
}
}
},
"validation_options": {
"title": "Validation options",
"type": "object",
"description": "Options for validating and screening FASTQ files.",
"default": "",
"properties": {
"skip_linting": {
"type": "boolean",
"default": false,
"description": "Skip linting the FASTQs with fq before performing QC on the sequences.",
"help_text": "FASTQ files are linted with fq early in the pipeline. If any file fails validation, the pipeline stops, so that expensive quality control steps are not run on the other samples. If `continue_with_lint_fail` is enabled, quality control is performed on the remaining samples instead."
},
"fq_lint_args": {
"type": "string",
"description": "Arguments to pass to FQ lint",
"help_text": "Arguments to pass to FQ lint. This can be used to disable overly strict linting. See https://github.com/stjude-rust-labs/fq?tab=readme-ov-file#lint for more information."
},
"continue_with_lint_fail": {
"type": "boolean",
"description": "Whether to continue with the pipeline if linting fails for a single sample.",
"help_text": "If set to true, the pipeline will continue with the remaining samples if linting fails for a single sample. If set to false, the pipeline will terminate if linting fails for a single sample."
}
}
},
"reference_genome_options": {
"title": "Reference genome options",
"type": "object",
@@ -233,6 +256,9 @@
{
"$ref": "#/$defs/input_output_options"
},
{
"$ref": "#/$defs/validation_options"
},
{
"$ref": "#/$defs/reference_genome_options"
},
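The `validation_options` group declares `skip_linting` and `continue_with_lint_fail` as booleans and `fq_lint_args` as a string. Parameter validation itself is done by nf-core's schema tooling; the sketch below is only a hand-rolled illustration of the type check these declarations imply. It is neither nf-schema nor a full JSON Schema validator, and `type_errors` is hypothetical:

```python
from typing import Dict, List

# The "type" of each property in the validation_options group.
VALIDATION_OPTIONS: Dict[str, str] = {
    "skip_linting": "boolean",
    "fq_lint_args": "string",
    "continue_with_lint_fail": "boolean",
}

# Mapping from JSON Schema type names to Python types (only those used above).
JSON_TYPES = {"boolean": bool, "string": str}


def type_errors(params: Dict[str, object]) -> List[str]:
    """Hypothetical: report params whose Python type does not match the schema."""
    errors = []
    for name, json_type in VALIDATION_OPTIONS.items():
        if name not in params:
            continue  # absent params fall back to their defaults
        value = params[name]
        if not isinstance(value, JSON_TYPES[json_type]):
            errors.append(f"{name}: expected {json_type}, got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    print(type_errors({"skip_linting": True, "fq_lint_args": "--disable-validator P001"}))
    print(type_errors({"skip_linting": "true"}))
```

For example, `type_errors({"skip_linting": "true"})` reports one error, because a quoted `"true"` is a string rather than a boolean.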
87 changes: 87 additions & 0 deletions tests/rnaseq.main.nf.test
@@ -0,0 +1,87 @@
nextflow_pipeline {

name "Test Workflow main.nf on NovaSeq6000 data"
script "../main.nf"
tag "seqinspector"
tag "PIPELINE"

test("rnaseq data test fail linting") {

when {
config "./rnaseq.main.nf.test.config"
params {
outdir = "$outputDir"
}
}

then {
assertAll(
// Linting should fail!
{ assert workflow.failed }
)
}
}

test("rnaseq data test skip linting") {

when {
config "./rnaseq.main.nf.test.config"
params {
outdir = "$outputDir"
skip_linting = true
}
}

then {
assertAll(
{ assert workflow.success }
)
}
}

test("rnaseq data test ignore linting") {

when {
config "./rnaseq.main.nf.test.config"
params {
outdir = "$outputDir"
continue_with_lint_fail = true
}
}

then {
assertAll(
{ assert workflow.success },
{ assert snapshot(
path("$outputDir/multiqc/global_report/multiqc_data/multiqc_citations.txt"),
path("$outputDir/multiqc/global_report/multiqc_data/multiqc_fastqc.txt"),
path("$outputDir/multiqc/global_report/multiqc_data/multiqc_general_stats.txt")
)
},
)
}
}

test("rnaseq data test add args to fq/lint") {

when {
config "./rnaseq.main.nf.test.config"
params {
outdir = "$outputDir"
fq_lint_args = "--disable-validator P001"
}
}

then {
assertAll(
{ assert workflow.success },
{ assert snapshot(
path("$outputDir/multiqc/global_report/multiqc_data/multiqc_citations.txt"),
path("$outputDir/multiqc/global_report/multiqc_data/multiqc_fastqc.txt"),
path("$outputDir/multiqc/global_report/multiqc_data/multiqc_general_stats.txt")
)
},
)
}
}
}
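The four nf-test cases form a small truth table over the new params. A toy Python model of the outcomes they assert. The assumption that the test data fails only validator `P001` is inferred from the fourth test passing with `--disable-validator P001`; the PR does not state it. `expected_outcome` is hypothetical:

```python
from typing import Dict, List, Tuple

# (test name suffix, param overrides, workflow outcome the test asserts)
TEST_CASES: List[Tuple[str, Dict[str, object], str]] = [
    ("fail linting", {}, "failed"),
    ("skip linting", {"skip_linting": True}, "success"),
    ("ignore linting", {"continue_with_lint_fail": True}, "success"),
    ("add args to fq/lint", {"fq_lint_args": "--disable-validator P001"}, "success"),
]


def expected_outcome(params: Dict[str, object]) -> str:
    """Toy model: the workflow outcome implied by the param overrides.

    Assumption (inferred from the tests): the test data fails fq lint only
    through validator P001, so disabling P001 lets linting pass.
    """
    if params.get("skip_linting", False):
        return "success"  # linting never runs
    lint_passes = "--disable-validator P001" in str(params.get("fq_lint_args", ""))
    if lint_passes or params.get("continue_with_lint_fail", False):
        return "success"
    return "failed"


if __name__ == "__main__":
    for name, overrides, expected in TEST_CASES:
        print(f"{name}: model says {expected_outcome(overrides)}, test expects {expected}")
```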
8 changes: 8 additions & 0 deletions tests/rnaseq.main.nf.test.config
@@ -0,0 +1,8 @@
// Load the basic test config
includeConfig 'nextflow.config'

// Load the correct samplesheet for that test
params {
input = params.pipelines_testdata_base_path + '626c8fab639062eade4b10747e919341cbf9b41a/samplesheet/v3.10/samplesheet_test.csv'

}