Snakemake genome assembly

This is a Snakemake pipeline that downloads reads from SRA, assembles them using Unicycler, and outputs various quality metric files and plots. The steps in the pipeline are:

Download raw reads from SRA using the SRA Toolkit
Quality control with fastp
PhiX spike-in removal with BBDuk
Assemble genomes with Unicycler
Get genome assembly metrics with CheckM2 and QUAST

Running the pipeline

Edit the config file in the indicated places
Install Snakemake. A bare conda/mamba environment is recommended (e.g., one created with mamba create -c conda-forge -c bioconda -n snakemake snakemake).
Edit config/config.yml.
- sra_list should be the path to a newline-separated file of SRA accessions.
- Enter the path to the CheckM2 database on your system. If you don't have it installed, you can download it directly from here (source) and enter the path into the config file.
- By default, the pipeline writes everything into the output folder. Change the path if you'd like the results to go somewhere else.
Edit slurm/config.yaml.
- In particular, you'll need to edit the default-resources entry to set the default partition that Slurm jobs are submitted to.
Run the pipeline with snakemake --use-conda -c
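
Because sra_list must point to a newline-separated file of SRA accessions, it can help to sanity-check that file before launching a run. The following is a minimal Python sketch and is not part of the pipeline. The function name read_accessions and the accession pattern (SRR, ERR, or DRR followed by digits) are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed pattern for run accessions: SRR/ERR/DRR followed by digits.
ACCESSION_RE = re.compile(r"^[SED]RR\d+$")


def read_accessions(path):
    """Return the accessions listed one per line in `path`.

    Blank lines are skipped. Any entry that does not look like a run
    accession raises ValueError, so typos surface before any job starts.
    """
    accessions = []
    lines = Path(path).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry:
            continue
        if not ACCESSION_RE.match(entry):
            raise ValueError(f"{path}:{lineno}: not a run accession: {entry!r}")
        accessions.append(entry)
    return accessions
```

Calling read_accessions on the same path given as sra_list in config/config.yml returns the list of accessions the pipeline would be asked to process.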

Assumptions

The FASTQ files dumped from SRA are paired-end (i.e., after dumping, they'll be named something like SRRXXXXX_pass_1.fastq.gz and SRRXXXXX_pass_2.fastq.gz).
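
The paired-end assumption can be checked for a given accession with a short Python sketch such as the one below. It takes the _pass_1/_pass_2 naming from the example above at face value. The helper names and the reads_dir argument are hypothetical, since the directory the pipeline dumps reads into is not stated here.

```python
from pathlib import Path


def expected_pair(accession, reads_dir):
    """Return the two mate files assumed to exist for one accession."""
    reads_dir = Path(reads_dir)
    return (
        reads_dir / f"{accession}_pass_1.fastq.gz",
        reads_dir / f"{accession}_pass_2.fastq.gz",
    )


def is_paired(accession, reads_dir):
    """True only if both mate files are present for the accession."""
    return all(p.is_file() for p in expected_pair(accession, reads_dir))
```

An accession for which is_paired returns False after dumping is likely single-end and would not fit the assumption above.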

Workflow

To do

Reports
- SPAdes runtime reports
- Metric plots
  - See here for parameterizing R scripts with Snakemake: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#r-and-r-markdown
Clean up names in the CheckM2 report
Scale the requested running time for CheckM2 and for the overall workflow with the number of inputs
- https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
Add a rule to download the CheckM2 database if it doesn't exist
Allow user to supply paired-end reads
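
For the running-time item above, Snakemake accepts a callable as a resource value, so the request can grow with the number of input files. The sketch below is one possible shape for that, not something the pipeline currently does. The function name and both constants are placeholders chosen for illustration and would need tuning on real data.

```python
# Placeholder values, not measured: tune against real CheckM2 runs.
BASE_MINUTES = 30        # assumed fixed start-up cost
MINUTES_PER_GENOME = 2   # assumed extra cost per input assembly


def checkm2_runtime(wildcards, input, attempt):
    """Minutes to request: base plus per-input cost, scaled by retry attempt."""
    return (BASE_MINUTES + MINUTES_PER_GENOME * len(input)) * attempt
```

In a rule, this would be referenced as resources: runtime=checkm2_runtime, so that a retried job asks for proportionally more time than the first attempt.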
