The annotation preprocessing workflow cleans the contigs in a genome assembly (parameter: genome), calculates assembly statistics before and after cleaning, and computes BUSCO scores for the cleaned assembly (parameter: busco_lineage).
Run the workflow using the singularity profile:
params.yml:
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome/assembly.fasta'
busco_lineage:
- 'eukaryota_odb10'
- 'bacteria_odb10'
outdir: '/path/to/save/results'
Command line:
nextflow run NBISweden/pipelines-nextflow \
-profile singularity \
-params-file params.yml
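If you prefer to generate the parameter file from a script, the following is a minimal Python sketch, assuming paths that contain no single quotes. The helpers render_params_yml and nextflow_command are hypothetical and not part of the pipeline; the YAML is written by hand so that no third-party library is needed.

```python
from __future__ import annotations

import shlex


def render_params_yml(genome: str, outdir: str, busco_lineages: list[str]) -> str:
    """Render a params.yml equivalent to the example above.

    Values are wrapped in single quotes, so this sketch assumes the
    paths themselves contain no single quotes.
    """
    lines = [
        "subworkflow: 'annotation_preprocessing'",
        f"genome: '{genome}'",
        "busco_lineage:",
        *(f"  - '{lineage}'" for lineage in busco_lineages),
        f"outdir: '{outdir}'",
    ]
    return "\n".join(lines) + "\n"


def nextflow_command(params_file: str, profile: str = "singularity") -> list[str]:
    """Build the argument list for the command-line invocation above."""
    return [
        "nextflow", "run", "NBISweden/pipelines-nextflow",
        "-profile", profile,
        "-params-file", params_file,
    ]


if __name__ == "__main__":
    print(render_params_yml(
        "/path/to/genome/assembly.fasta",
        "/path/to/save/results",
        ["eukaryota_odb10", "bacteria_odb10"],
    ))
    # Only prints the command; pass the list to subprocess.run(..., check=True) to launch it.
    print(shlex.join(nextflow_command("params.yml")))
```

Writing the YAML text to params.yml and passing the argument list to subprocess.run would reproduce the manual steps above.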
- General:
  - genome: The path to the genome assembly, in quotes.
  - outdir: The name of the results folder.
- BUSCO:
  - busco_lineage: The BUSCO lineages to compare against (default: [ 'eukaryota_odb10', 'bacteria_odb10' ]).
  - busco_lineages_path: The folder where BUSCO lineages have been downloaded for shared use (default: unset, in which case each process downloads the selected BUSCO lineage itself).
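The defaults listed above can be summarised in a small Python sketch. PreprocessingParams and uses_shared_lineages are hypothetical names chosen for illustration; the pipeline itself resolves these parameters in Nextflow, not in Python.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Default lineages as documented for busco_lineage.
DEFAULT_BUSCO_LINEAGES = ["eukaryota_odb10", "bacteria_odb10"]


@dataclass
class PreprocessingParams:
    """Hypothetical mirror of the annotation preprocessing parameters."""

    genome: str  # path to the genome assembly
    outdir: str  # name of the results folder
    # Copy the default list so instances never share one mutable object.
    busco_lineage: list[str] = field(default_factory=lambda: list(DEFAULT_BUSCO_LINEAGES))
    # None means unset: each BUSCO process downloads its lineage itself.
    busco_lineages_path: Optional[str] = None

    def uses_shared_lineages(self) -> bool:
        """True when lineages come from a pre-downloaded shared folder."""
        return self.busco_lineages_path is not None


if __name__ == "__main__":
    p = PreprocessingParams(genome="/path/to/genome/assembly.fasta", outdir="results")
    print(p.busco_lineage, p.uses_shared_lineages())
```

Setting busco_lineages_path is what switches the BUSCO step from per-process downloads to a shared, pre-downloaded copy.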
In these workflows, the Nextflow process directive ext.args is used to inject command-line tool parameters directly into the shell script. These parameters can be changed by overriding the ext.args variable for the respective process in a configuration file.
nextflow.config:
process {
    withName: 'ASSEMBLY_PURIFY' {
        ext.args = '--size 1000'
    }
}
See Annotation preprocessing modules config for the default tool configuration.
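The override mechanism can be illustrated with a Python sketch of how a module's script might splice ext.args into its command line, modelled on the common Nextflow module idiom def args = task.ext.args ?: ''. The tool name purify_tool and the render_script helper are hypothetical; consult the module source for the real command run by ASSEMBLY_PURIFY.

```python
from __future__ import annotations

import shlex
from typing import Optional


def render_script(tool: str, inputs: list[str], ext_args: Optional[str] = None) -> str:
    """Build the shell command a process would run.

    ext_args is inserted verbatim, as ext.args is in a Nextflow script
    block, so '--size 1000' becomes two shell words. Input paths are quoted.
    """
    args = ext_args or ""  # unset ext.args falls back to an empty string
    parts = [tool, args, *(shlex.quote(path) for path in inputs)]
    # Drop empty pieces so an unset ext.args leaves no double space.
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    # With the nextflow.config override shown above:
    print(render_script("purify_tool", ["assembly.fasta"], "--size 1000"))
    # Without an override:
    print(render_script("purify_tool", ["assembly.fasta"]))
```

Splicing the string verbatim is what lets a single ext.args value carry several flags at once.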
- Filter: Remove FASTA sequences shorter than min_length bases.
- Summarise and plot assembly metrics.
- Run BUSCO on the filtered assembly.
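The filter and summary steps can be sketched in Python as below. This is an illustration, not the pipeline's implementation: read_fasta, filter_by_length and assembly_stats are hypothetical helpers, and sequence count, total length and N50 are example metrics that may differ from what the pipeline actually reports. Sequences of at least min_length bases are kept.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def read_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[tuple[str, str]], min_length: int) -> list[tuple[str, str]]:
    """Drop sequences shorter than min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def assembly_stats(records: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Sequence count, total length and N50 of a set of sequences."""
    lengths = sorted((len(s) for _, s in records), reverse=True)
    total = sum(lengths)
    n50, running = 0, 0
    for length in lengths:
        running += length
        # N50: length of the sequence at which half the assembly is covered.
        if running * 2 >= total:
            n50 = length
            break
    return {"sequences": len(lengths), "total_length": total, "n50": n50}


if __name__ == "__main__":
    fasta = ">ctg1\nAAAA\nAAAA\n>ctg2\nCCCCCC\n>ctg3\nGGGG\n>ctg4\nTT\n"
    records = list(read_fasta(fasta))
    print("pre: ", assembly_stats(records))  # {'sequences': 4, 'total_length': 20, 'n50': 6}
    print("post:", assembly_stats(filter_by_length(records, min_length=5)))  # {'sequences': 2, 'total_length': 14, 'n50': 8}
```

Computing the same metrics on the input and on the filtered set mirrors the before-and-after comparison the workflow performs.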
- The BUSCO conda package does not resolve its dependencies when channel_priority: strict is used.