
Add FastQ-Screen database multiplexing #53

Open · wants to merge 36 commits into base: dev
Conversation

@edmundmiller commented Oct 29, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • Add fastqscreen module
  • Limit scope of nf-test CI
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • [ ] If necessary, also make a PR on the nf-core/seqinspector branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@edmundmiller edmundmiller added this to the Essential functionality milestone Oct 29, 2024
@edmundmiller edmundmiller changed the base branch from master to dev October 29, 2024 13:43

github-actions bot commented Oct 29, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit d021ed0

✅ 192 tests passed
❔   1 test ignored
❗  21 tests had warnings

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in README.md: TODO nf-core:
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-11-14 19:26:24

@FranBonath (Member)

I am currently working on this to update the docs and add the missing parameter.

@nf-core nf-core deleted a comment from github-actions bot Oct 30, 2024
@kedhammar

843e2c6 closes #61

@kedhammar

We need to resolve an issue with the nf-test snapshots of multiqc_fastq_screen.txt being inconsistent between runs.
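
One way around unstable snapshot content would be to assert only on stable properties of the file rather than snapshotting its full contents. A hedged sketch, not what's in the PR; the output path and assertions are assumptions:

```groovy
// Hypothetical pipeline-level nf-test snippet: instead of snapshotting the
// full contents of multiqc_fastq_screen.txt (which differ between runs),
// check that the file exists and snapshot only stable values.
then {
    assertAll(
        { assert workflow.success },
        { assert path("$outputDir/multiqc/multiqc_data/multiqc_fastq_screen.txt").exists() },
        { assert snapshot(workflow.trace.succeeded().size()).match() }
    )
}
```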

@kedhammar kedhammar added enhancement New feature or request help wanted Extra attention is needed labels Nov 7, 2024
@kedhammar

Rough summary of the current status after some investigation by @FranBonath and me:

  • Running the pipeline with FastQ Screen for multiple samples and references in the test profile runs all sample-reference combinations in the work directory, but only a one-per-sample subset of those combinations reaches the publishDir, because the processes send identically named output files to the same publishDir.
  • As a result, MultiQC will not pull in all the relevant information.

We think the output files of the process should contain the names of both the sample and the reference used to generate them, and that they should all end up in the publishDir.
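
For illustration only, a rough sketch of that idea. The process name, inputs and fastq_screen invocation are simplified assumptions, not the current module code:

```groovy
process FASTQSCREEN_SCREEN {
    tag "${meta.id}:${db_name}"
    // Publish into a per-database subdirectory so that identically named
    // reports from different references no longer overwrite each other.
    publishDir "${params.outdir}/fastq_screen/${db_name}", mode: 'copy'

    input:
    tuple val(meta), path(reads)
    tuple val(db_name), path(db)

    output:
    tuple val(meta), path("${meta.id}.${db_name}*_screen.txt"), emit: txt

    script:
    """
    echo "DATABASE\t${db_name}\t${db}" > fastq_screen.conf
    fastq_screen --conf fastq_screen.conf ${reads}

    # Prefix the default FastQ Screen output names with sample and database
    for f in *_screen.txt; do
        mv "\$f" "${meta.id}.${db_name}.\$f"
    done
    """
}
```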

NOTE
Personally, I think having one job for each sample-reference combination for hundreds of samples and dozens of references is going to leave us with thousands of tiny SLURM jobs, work dirs, outdirs, etc., which might be excessive. My 2 cents is to consider decreasing the parallelization, e.g. parallelize by sample or by reference, but not both.

@edmundmiller (Author)

  • Running the pipeline with FastQ Screen for multiple samples and references in the test profile runs all sample-reference combinations in the work directory, but only a one-per-sample subset of those combinations reaches the publishDir, because the processes send identically named output files to the same publishDir.

Okay, I figured out a way around this, and it works pretty well with MultiQC. We'll probably want to use https://seqera.io/blog/multiqc-grouped-samples/

We think the output files of the process should contain the names of both the sample and the reference used to generate them, and that they should all end up in the publishDir.

Also got this for free, but IMO publishing them should just be skipped if you're only going to use the results inside MultiQC.
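
If we go that route, a minimal sketch of what it could look like in conf/modules.config; the process selector name is an assumption:

```groovy
// Hypothetical conf/modules.config entry: keep the per-database FastQ Screen
// reports out of the publishDir and rely on the MultiQC report instead.
process {
    withName: 'FASTQSCREEN_SCREEN' {
        publishDir = [ enabled: false ]
    }
}
```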

NOTE Personally, I think having one job for each sample-reference combination for hundreds of samples and dozens of references is going to leave us with thousands of tiny SLURM jobs, work dirs, outdirs, etc., which might be excessive. My 2 cents is to consider decreasing the parallelization, e.g. parallelize by sample or by reference, but not both.

May I suggest an array job? I think that would make your HPC admins even happier: https://www.nextflow.io/docs/latest/reference/process.html
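
For example, something along these lines could be set per process in the config. A sketch assuming Nextflow >= 24.04 and the same hypothetical process name as above:

```groovy
// Hypothetical config snippet: submit FastQ Screen tasks as SLURM array jobs
// of up to 100 tasks each, instead of thousands of individual jobs.
process {
    withName: 'FASTQSCREEN_SCREEN' {
        array = 100
    }
}
```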

@edmundmiller (Author)

Ah, okay, looking at the expected FastQ Screen data now: https://github.com/MultiQC/test-data/blob/main/data/modules/fastq_screen/v0.14.0/scRNAseq_HISAT_example1_screen.txt

It's probably easier to handle all of the databases in one run per sample.

So two options:

  1. Combine the TSVs after seqinspector runs
  2. Have a separate "create seqinspector config" process, or some Nextflow magic to pull in all the databases (sounding more complicated, actually); see the sketch after this list.
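
A rough sketch of what option 2 might look like; the process names, channel shapes and conf format are illustrative assumptions, not working pipeline code:

```groovy
// Hypothetical process: collect every [name, path] database pair into a single
// fastq_screen.conf so FastQ Screen runs once per sample across all databases.
process BUILD_FASTQSCREEN_CONF {
    input:
    val databases               // e.g. [ ['Human', '/refs/GRCh38'], ['Mouse', '/refs/GRCm39'] ]

    output:
    path 'fastq_screen.conf'

    exec:
    task.workDir.resolve('fastq_screen.conf').text =
        databases.collect { name, db -> "DATABASE\t${name}\t${db}" }.join('\n') + '\n'
}

workflow {
    // Hypothetical database channel collected into one list, then one
    // FastQ Screen task per sample using the combined config.
    ch_dbs  = Channel.of( ['Human', '/refs/GRCh38'], ['Mouse', '/refs/GRCm39'] )
    ch_conf = BUILD_FASTQSCREEN_CONF( ch_dbs.collect(flat: false) )
    // FASTQSCREEN( ch_reads, ch_conf )  // one run per sample, all databases
}
```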

@edmundmiller (Author)

Okay, I'm stumped on both; see the commits for my attempts if anyone has time for this: 46d1bfd, d021ed0
