Preparing Your Data

This pipeline requires certain file naming conventions and information from the primer_data.csv file.

primer_data.csv

Many of the scripts in this pipeline read from a file named primer_data.csv. This file contains information about your primer and the parameters dada2 uses to infer ASVs. If you're using primers other than Dloop, Mifish, or C16, you will need to fill out this information for your primer.

Sample Primer Data:

name	locus_shorthand	seq_f	primer_length_f	seq_r	primer_length_r	F_qual	R_qual	tapestation_amplicon_length_F	tapestation_amplicon_length_R	max_amplicon_length	min_amplicon_length	max_trim	overlap	db_name	known_hashes_name
MFU	MFU	GCCGGTAAAACTCGTGCCAGC	21	CATAGTGGGGTATCTAATCCCAGTTTG	27	30	30	222	222	185	163	100	20	MURI_MFU.fasta	MFU_prev_hashes.csv
DL	DL	TCACCCAAAGCTGRARTTCTA	21	GCGGGTTGCTGGTTTCACG	19	29	20	429	429	475	200	450	12	Cetacean_Dloop_Baker_NWFSC.fasta	DL_prev_hashes.csv
C16	C16	GACGAGAAGACCCTAWTGAGCT	22	AAATTACGCTGTTATCCCT	19	NA	NA	249	249	320	NA	NA	NA	ceph_C16_sanger.fasta	C16_prev_hashes.csv

Guide to Primer Data Fields:


name	long primer name
locus_shorthand	short primer name
seq_f	forward primer sequence
primer_length_f	length of the forward primer
seq_r	reverse primer sequence
primer_length_r	length of the reverse primer
F_qual	quality to trim to when determining TruncLen (dada2) for forward sequences
R_qual	quality to trim to when determining TruncLen (dada2) for reverse sequences
tapestation_amplicon_length_F	amplicon length of forward reads before sequencing (with primers, adapters, barcodes, etc.)
tapestation_amplicon_length_R	amplicon length of reverse reads before sequencing (with primers, adapters, barcodes, etc.)
max_amplicon_length	maximum amplicon length to keep when filtering by length after merging forward and reverse reads
min_amplicon_length	minimum amplicon length to keep when filtering by length after merging forward and reverse reads
max_trim	maximum quality to trim to for TruncLen (dada2)
overlap	minimum overlap when merging forward and reverse reads (minOverlap)
db_name	name of taxonomy database to use for the primer
known_hashes_name	name of database of known hashes to use for the primer
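
Before running the pipeline on a new primer, it can help to sanity-check the row you added. The sketch below is not part of the pipeline; it assumes primer_data.csv is comma-separated with the header shown above, and the helper names (check_primer_row, check_primer_table) are hypothetical. It checks that each stated primer length matches its sequence and that min_amplicon_length does not exceed max_amplicon_length, skipping NA values.

```python
import csv
import io

# Two rows copied from the sample table above, trimmed to the columns checked here.
# Assumption: the real primer_data.csv is comma-separated and uses these header names.
SAMPLE = """name,locus_shorthand,seq_f,primer_length_f,seq_r,primer_length_r,max_amplicon_length,min_amplicon_length
MFU,MFU,GCCGGTAAAACTCGTGCCAGC,21,CATAGTGGGGTATCTAATCCCAGTTTG,27,185,163
C16,C16,GACGAGAAGACCCTAWTGAGCT,22,AAATTACGCTGTTATCCCT,19,320,NA
"""


def check_primer_row(row):
    """Return a list of problems found in one row (hypothetical helper)."""
    problems = []
    # Each stated primer length should equal the length of its sequence.
    for seq_col, len_col in (("seq_f", "primer_length_f"), ("seq_r", "primer_length_r")):
        if len(row[seq_col]) != int(row[len_col]):
            problems.append(f"{len_col} does not match the length of {seq_col}")
    # Length bounds may be NA; compare them only when both are present.
    lo, hi = row["min_amplicon_length"], row["max_amplicon_length"]
    if lo != "NA" and hi != "NA" and int(lo) > int(hi):
        problems.append("min_amplicon_length is greater than max_amplicon_length")
    return problems


def check_primer_table(text):
    """Map each locus_shorthand to the list of problems found in its row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["locus_shorthand"]: check_primer_row(row) for row in reader}


report = check_primer_table(SAMPLE)
for locus, problems in report.items():
    print(locus, "OK" if not problems else problems)
```

To check a real file, pass its contents instead of SAMPLE, for example check_primer_table(open("primer_data.csv").read()).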

Sample Naming Conventions

The cutadapt portion of the metabarcoding pipeline (primer_trimming.sh) depends on the naming of each sample. Below is an example of how the fastq files must be named.

photo of sample naming scheme


Primer	Primer used in sample. Must match primer_data.csv
SampleID	SampleID of sample. Can be formatted differently but must match Hake_2019_metadata.csv to successfully create .html file.
Dilution	Amount of dilution performed on sample due to environmental contamination
Replicate	Technical Replicate number
Miseq Sample Data	Sample data created during Illumina sequencing. This is appended to the output file names by the Illumina sequencer.
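
To check file names before running primer_trimming.sh, a small script along these lines may help. It is a sketch under stated assumptions, not the pipeline's own logic: it relies on the standard suffix that Illumina software appends (_S<number>_L<lane>_R<read>_001.fastq.gz), and it assumes the primer name is the first part of the file name. The separator between the remaining fields is not assumed, the function split_fastq_name is hypothetical, and the example file name is invented for illustration.

```python
import re

# Standard Illumina FASTQ suffix: <sample name>_S<n>_L<lane>_R<read>_001.fastq.gz
ILLUMINA_SUFFIX = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_001\.fastq(?:\.gz)?$"
)


def split_fastq_name(filename, known_primers):
    """Split off the Illumina suffix and find the primer prefix (hypothetical helper).

    Returns None when the name does not end with the Illumina suffix.
    The "primer" entry is None when no known primer starts the sample name.
    """
    match = ILLUMINA_SUFFIX.match(filename)
    if match is None:
        return None
    sample = match.group("sample")
    # Assumption: the primer is the first field. Try longer names first so a
    # short primer name cannot shadow a longer one sharing the same prefix.
    candidates = sorted(known_primers, key=len, reverse=True)
    primer = next((p for p in candidates if sample.startswith(p)), None)
    return {
        "sample": sample,
        "primer": primer,
        "lane": match.group("lane"),
        "read": int(match.group("read")),
    }


# Primer names from the locus_shorthand column of the sample primer data.
primers = {"MFU", "DL", "C16"}

# Invented file name; the separators in your own files may differ.
info = split_fastq_name("MFU-52193-5-1_S1_L001_R1_001.fastq.gz", primers)
print(info)
```

A file whose "primer" comes back as None does not start with any name listed in primer_data.csv and is worth renaming or checking by hand.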
