Preparing Your Data
alexandriai168 edited this page Aug 14, 2023 · 11 revisions
This pipeline requires certain file naming conventions and information from the primer_data.csv file.
Many of the scripts in this pipeline read from a file named primer_data.csv. This file contains information about your primer and the parameters dada2 uses to infer ASVs. If you're using primers other than Dloop, Mifish, or C16, you will need to fill out this information for your primer.
name | locus_shorthand | seq_f | primer_length_f | seq_r | primer_length_r | F_qual | R_qual | tapestation_amplicon_length_F | tapestation_amplicon_length_R | max_amplicon_length | min_amplicon_length | max_trim | overlap | db_name | known_hashes_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MFU | MFU | GCCGGTAAAACTCGTGCCAGC | 21 | CATAGTGGGGTATCTAATCCCAGTTTG | 27 | 30 | 30 | 222 | 222 | 185 | 163 | 100 | 20 | MURI_MFU.fasta | MFU_prev_hashes.csv |
DL | DL | TCACCCAAAGCTGRARTTCTA | 21 | GCGGGTTGCTGGTTTCACG | 19 | 29 | 20 | 429 | 429 | 475 | 200 | 450 | 12 | Cetacean_Dloop_Baker_NWFSC.fasta | DL_prev_hashes.csv |
C16 | C16 | GACGAGAAGACCCTAWTGAGCT | 22 | AAATTACGCTGTTATCCCT | 19 | NA | NA | 249 | 249 | 320 | NA | NA | NA | ceph_C16_sanger.fasta | C16_prev_hashes.csv |

Column | Description |
---|---|
name | long primer name |
locus_shorthand | short primer name |
seq_f | forward primer sequence |
primer_length_f | length of the forward primer |
seq_r | reverse primer sequence |
primer_length_r | length of the reverse primer |
F_qual | quality to trim to when determining TruncLen (dada2) for forward sequences |
R_qual | quality to trim to when determining TruncLen (dada2) for reverse sequences |
tapestation_amplicon_length_F | amplicon length of forward reads before sequencing (with primers, adapters, barcodes, etc.) |
tapestation_amplicon_length_R | amplicon length of reverse reads before sequencing (with primers, adapters, barcodes, etc.) |
max_amplicon_length | maximum amplicon length to keep when filtering by length after merging forward and reverse reads |
min_amplicon_length | minimum amplicon length to keep when filtering by length after merging forward and reverse reads |
max_trim | maximum quality to trim to for TruncLen (dada2) |
overlap | minimum overlap when merging forward and reverse reads (minOverlap) |
db_name | name of taxonomy database to use for the primer |
known_hashes_name | name of database of known hashes to use for the primer |
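
Before adding a row for a new primer, it can help to check it for simple mistakes. The sketch below is not part of the pipeline. It assumes primer_data.csv is comma-delimited with a header row that uses the column names listed above, and the function name `check_primer_rows` is hypothetical. It checks that every expected column is present and that each stated primer length matches the length of its sequence.

```python
import csv
import io

# Column names taken from the primer_data.csv table on this page.
REQUIRED_COLUMNS = [
    "name", "locus_shorthand", "seq_f", "primer_length_f", "seq_r",
    "primer_length_r", "F_qual", "R_qual", "tapestation_amplicon_length_F",
    "tapestation_amplicon_length_R", "max_amplicon_length",
    "min_amplicon_length", "max_trim", "overlap", "db_name",
    "known_hashes_name",
]


def check_primer_rows(handle):
    """Return a list of human-readable problems found in a primer CSV.

    Hypothetical helper: assumes a comma-delimited file with a header row.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems  # cannot check rows without the expected columns

    pairs = (("seq_f", "primer_length_f"), ("seq_r", "primer_length_r"))
    for row in reader:
        for seq_col, len_col in pairs:
            actual = len(row[seq_col].strip())
            stated = row[len_col].strip()
            if stated != str(actual):
                problems.append(
                    f"{row['name']}: {len_col} is {stated} "
                    f"but {seq_col} has {actual} bases"
                )
    return problems


# The MFU row from the table above, written as comma-delimited text.
EXAMPLE_CSV = (
    ",".join(REQUIRED_COLUMNS) + "\n"
    "MFU,MFU,GCCGGTAAAACTCGTGCCAGC,21,CATAGTGGGGTATCTAATCCCAGTTTG,27,"
    "30,30,222,222,185,163,100,20,MURI_MFU.fasta,MFU_prev_hashes.csv\n"
)

if __name__ == "__main__":
    for problem in check_primer_rows(io.StringIO(EXAMPLE_CSV)):
        print(problem)
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `check_primer_rows(open("primer_data.csv", newline=""))`.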
The cutadapt portion of the metabarcoding pipeline (primer_trimming.sh) is dependent on the naming of each sample. Below is an example of how the fastq files must be named.

Component | Description |
---|---|
Primer | Primer used in the sample. Must match primer_data.csv. |
SampleID | SampleID of the sample. It can be formatted differently, but it must match Hake_2019_metadata.csv for the .html file to be created successfully. |
Dilution | Amount of dilution performed on sample due to environmental contamination |
Replicate | Technical Replicate number |
MiSeq Sample Data | Sample data created by Illumina sequencing. This is appended to the output file names by the Illumina sequencer. |
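
The components above can be checked before running primer_trimming.sh. The following is a rough sketch only, and several things in it are assumptions rather than facts from this page: the separator between Primer, SampleID, Dilution, and Replicate (a hyphen here), the example file name, and the function name `split_fastq_name` are all hypothetical. The trailing `_S#_L###_R#_001.fastq.gz` part follows the usual Illumina file naming. Adjust the separator to whatever your files actually use.

```python
import re

# Usual Illumina suffix, e.g. "_S14_L001_R1_001.fastq.gz".
ILLUMINA_SUFFIX = re.compile(r"_S\d+_L\d{3}_R[12]_001\.fastq(\.gz)?$")


def split_fastq_name(filename, sep="-"):
    """Split a fastq file name into the components described above.

    ASSUMPTION: the four leading fields are joined by `sep`; this page
    does not state the real separator, so "-" is only a placeholder.
    """
    match = ILLUMINA_SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"no Illumina suffix found in {filename!r}")

    fields = filename[:match.start()].split(sep)
    if len(fields) != 4:
        raise ValueError(
            f"expected Primer{sep}SampleID{sep}Dilution{sep}Replicate, "
            f"got {len(fields)} field(s) in {filename!r}"
        )

    primer, sample_id, dilution, replicate = fields
    return {
        "primer": primer,
        "sample_id": sample_id,
        "dilution": dilution,
        "replicate": replicate,
        # Everything the sequencer appended, without the leading underscore.
        "miseq_sample_data": filename[match.start() + 1:],
    }


if __name__ == "__main__":
    # Hypothetical file name, used only to exercise the function.
    print(split_fastq_name("MFU-52193-1-2_S14_L001_R1_001.fastq.gz"))
```

The returned `primer` value can then be compared against the locus_shorthand column of primer_data.csv, since the two must match.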