Annotation preprocessing pipeline

The annotation preprocessing workflow cleans the contigs in a genome assembly (parameter: genome), calculates assembly statistics before and after cleaning, and computes BUSCO scores for the cleaned assembly (parameter: busco_lineage).

Quick start

Run the workflow using the singularity profile:

params.yml:

subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome/assembly.fasta'
busco_lineage:
  - 'eukaryota_odb10'
  - 'bacteria_odb10'
outdir: '/path/to/save/results'

Command line:

nextflow run NBISweden/pipelines-nextflow \
    -profile singularity \
    -params-file params.yml

Parameters

  • General:
    • genome: The path to the genome assembly, in quotes.
    • outdir: The path to the results folder.
  • BUSCO:
    • busco_lineage: The BUSCO lineages to compare against (default: [ 'eukaryota_odb10', 'bacteria_odb10' ]).
    • busco_lineages_path: The folder where BUSCO lineages have been downloaded for shared use (default: unset, in which case each process downloads the selected BUSCO lineage itself).

Tool specific parameters

In these workflows, the Nextflow process directive ext.args is used to pass command line tool parameters directly into the shell script of a process. To change these parameters, override the ext.args variable for the respective process in a configuration file.

nextflow.config:

process {
    withName: 'ASSEMBLY_PURIFY' {
        ext.args   = '--size 1000'
    }
}

See Annotation preprocessing modules config for the default tool configuration.

Workflow stages

  1. Filter: Remove FASTA sequences shorter than min_length bases.
  2. Summarise and plot assembly metrics.
  3. Run BUSCO on the filtered assembly.
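
To make the first two stages concrete, here is a minimal Python sketch of length filtering and a few summary metrics. It is an illustration only and not the code the workflow runs. The function names (parse_fasta, filter_by_length, assembly_metrics), the choice of metrics, and the threshold in the demo are all assumptions made for this example. It also assumes that a sequence of exactly min_length bases is kept.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_length: int) -> List[Record]:
    """Drop sequences shorter than min_length bases (assumed semantics)."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def assembly_metrics(records: Iterable[Record]) -> Dict[str, int]:
    """Return sequence count, total bases, and N50 for the given records."""
    lengths = sorted((len(s) for _, s in records), reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # reached at least half of the total bases
            n50 = length
            break
    return {"sequences": len(lengths), "total_bases": total, "n50": n50}


if __name__ == "__main__":
    # Tiny hypothetical assembly; 5 is an illustrative threshold only.
    demo = ">contig1\nACGTACGTAC\n>contig2\nACG\n>contig3\nACGTAC\n"
    records = list(parse_fasta(demo.splitlines()))
    print("before:", assembly_metrics(records))
    print("after: ", assembly_metrics(filter_by_length(records, min_length=5)))
```

Comparing the metrics before and after filtering mirrors the pre and post cleaning statistics the workflow reports, although the actual tools and the exact metrics they produce may differ.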

Known issues

  1. The BUSCO conda package fails to resolve its dependencies when conda is configured with channel_priority: strict.