
Add FastQ-Screen database multiplexing #53

Open · wants to merge 36 commits into base: dev
Conversation

@edmundmiller commented Oct 29, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • Add fastqscreen module
  • Limit scope of nf-test CI
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • [ ] If necessary, also make a PR on the nf-core/seqinspector branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@edmundmiller edmundmiller added this to the Essential functionality milestone Oct 29, 2024
@edmundmiller edmundmiller changed the base branch from master to dev October 29, 2024 13:43

github-actions bot commented Oct 29, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit d021ed0

✅ 192 tests passed
❔   1 test ignored
❗  21 tests had warnings

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in README.md: TODO nf-core:
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-11-14 19:26:24

@FranBonath (Member)

I am currently working on this to update the docs and add the missing parameter.

@nf-core nf-core deleted a comment from github-actions bot Oct 30, 2024
@kedhammar

843e2c6 closes #61

@kedhammar

We need to resolve an issue with the nf-test snapshots of multiqc_fastq_screen.txt being inconsistent between runs.
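
One way around unstable snapshot content would be to assert only on stable properties of the file rather than snapshotting its full contents. A hedged sketch, not what's in the PR; the output path and assertions are assumptions:

```groovy
// Hypothetical pipeline-level nf-test snippet: instead of snapshotting the
// full contents of multiqc_fastq_screen.txt (which differ between runs),
// check that the file exists and snapshot only stable values.
then {
    assertAll(
        { assert workflow.success },
        { assert path("$outputDir/multiqc/multiqc_data/multiqc_fastq_screen.txt").exists() },
        { assert snapshot(workflow.trace.succeeded().size()).match() }
    )
}
```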

@kedhammar kedhammar added enhancement New feature or request help wanted Extra attention is needed labels Nov 7, 2024
@kedhammar

Rough summary of the current status after some investigation by @FranBonath and me:

  • Running the pipeline with FastQ Screen for multiple samples and references in the test profile runs all sample-reference combinations in the work directory, but only a one-per-sample subset of those combinations reaches the publishDir, because the processes send identically named output files to the same publishDir.
  • As a result, MultiQC will not pull in all the relevant information.

We think the output files of the process should contain the names of both the sample and the reference used to generate them, and that they should all end up in the publishDir.
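
For illustration only, a rough sketch of that idea. The process name, inputs and fastq_screen invocation are simplified assumptions, not the current module code:

```groovy
process FASTQSCREEN_SCREEN {
    tag "${meta.id}:${db_name}"
    // Publish into a per-database subdirectory so that identically named
    // reports from different references no longer overwrite each other.
    publishDir "${params.outdir}/fastq_screen/${db_name}", mode: 'copy'

    input:
    tuple val(meta), path(reads)
    tuple val(db_name), path(db)

    output:
    tuple val(meta), path("${meta.id}.${db_name}*_screen.txt"), emit: txt

    script:
    """
    echo "DATABASE\t${db_name}\t${db}" > fastq_screen.conf
    fastq_screen --conf fastq_screen.conf ${reads}

    # Prefix the default FastQ Screen output names with sample and database
    for f in *_screen.txt; do
        mv "\$f" "${meta.id}.${db_name}.\$f"
    done
    """
}
```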

NOTE
Personally, I think having one job for each sample-reference combination for hundreds of samples and dozens of references is going to leave us with thousands of tiny SLURM jobs, work dirs, outdirs, etc., which might be excessive. My 2 cents is to consider decreasing the parallelization, e.g. parallelize by sample or by reference, but not both.

@edmundmiller (Author)

  • Running the pipeline with FastQ Screen for multiple samples and references in the test profile runs all sample-reference combinations in the work directory, but only a one-per-sample subset of those combinations reaches the publishDir, because the processes send identically named output files to the same publishDir.

Okay, I figured out a way around this, and it works pretty well with MultiQC. We'll probably want to use https://seqera.io/blog/multiqc-grouped-samples/

We think the output files of the process should contain the names of both the sample and the reference used to generate them, and that they should all end up in the publishDir.

Also got this for free, but IMO publishing them should just be skipped if you're only going to use the results inside MultiQC.
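
If we go that route, a minimal sketch of what it could look like in conf/modules.config; the process selector name is an assumption:

```groovy
// Hypothetical conf/modules.config entry: keep the per-database FastQ Screen
// reports out of the publishDir and rely on the MultiQC report instead.
process {
    withName: 'FASTQSCREEN_SCREEN' {
        publishDir = [ enabled: false ]
    }
}
```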

NOTE Personally, I think having one job for each sample-reference combination for hundreds of samples and dozens of references is going to leave us with thousands of tiny SLURM jobs, work dirs, outdirs, etc., which might be excessive. My 2 cents is to consider decreasing the parallelization, e.g. parallelize by sample or by reference, but not both.

May I suggest an array job? I think that would make your HPC admins even happier: https://www.nextflow.io/docs/latest/reference/process.html
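
For example, something along these lines could be set per process in the config. A sketch assuming Nextflow >= 24.04 and the same hypothetical process name as above:

```groovy
// Hypothetical config snippet: submit FastQ Screen tasks as SLURM array jobs
// of up to 100 tasks each, instead of thousands of individual jobs.
process {
    withName: 'FASTQSCREEN_SCREEN' {
        array = 100
    }
}
```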

@edmundmiller (Author)

Ah, okay, looking at the expected FastQ Screen data now: https://github.com/MultiQC/test-data/blob/main/data/modules/fastq_screen/v0.14.0/scRNAseq_HISAT_example1_screen.txt

It's probably easier to handle all of the databases in one run per sample.

So two options:

  1. Combine the TSVs after seqinspector runs
  2. Have a separate "create seqinspector config" process, or some Nextflow magic to pull in all the databases (sounding more complicated, actually); see the sketch after this list.
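
A rough sketch of what option 2 might look like; the process names, channel shapes and conf format are illustrative assumptions, not working pipeline code:

```groovy
// Hypothetical process: collect every [name, path] database pair into a single
// fastq_screen.conf so FastQ Screen runs once per sample across all databases.
process BUILD_FASTQSCREEN_CONF {
    input:
    val databases               // e.g. [ ['Human', '/refs/GRCh38'], ['Mouse', '/refs/GRCm39'] ]

    output:
    path 'fastq_screen.conf'

    exec:
    task.workDir.resolve('fastq_screen.conf').text =
        databases.collect { name, db -> "DATABASE\t${name}\t${db}" }.join('\n') + '\n'
}

workflow {
    // Hypothetical database channel collected into one list, then one
    // FastQ Screen task per sample using the combined config.
    ch_dbs  = Channel.of( ['Human', '/refs/GRCh38'], ['Mouse', '/refs/GRCm39'] )
    ch_conf = BUILD_FASTQSCREEN_CONF( ch_dbs.collect(flat: false) )
    // FASTQSCREEN( ch_reads, ch_conf )  // one run per sample, all databases
}
```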

@edmundmiller (Author)

Okay, I'm stumped on both; see the commits for my attempts if anyone has time for this: 46d1bfd, d021ed0
