Warning: This is a pre-release version, we are actively developing this repository. Issues, bugs and features will happen, rise and change.
We foster the openness, integrity, and reproducibility of scientific research.
Scripts and tools used to develop a pipeline to analyse RAD-tags and generate the Genetic Map with GWAS.
This repository host both the scripts and tools developed by this study. Feel free to adapt the scripts and tools, but remember to cite their authors!
To look at our scripts and raw results, browse through this repository. If you want to reproduce our results you will need to clone this repository, build the docker, and the run all the scripts. If you want to use our data for our own research, fork this repository and cite the authors.
All required files and tools run in a self-contained docker image.
git clone --recursive https://github.com/pseudogene/radmap.git
cd radmap
docker build --rm=true -t radmap .
To import and export the results of your analysis, you need to link a folder to the docker. In this example your data will be store in results
(current filesystem) which will be seem as been /map
from within the docker by using -v <USERFOLDER>:/map
.
mkdir results
docker run -i -t --rm -v $(pwd)/results:/map radmap /bin/bash
plink classic file PED and MAP are required as well as a pedigree file.
The pedigree file consists of on columns 1-4+. The columns are separated by tabs. The columns 1-4 are individual name, father, mother and sex; the next columns are for extra phenotypes: phenotype_1 to phenotype_n. The phenotypes are not required, but will be helpful for the GWAS analysis.
sample father mother sex phenotype_1
F1_C2_070 P0_sir P0_dam M 30
F1_C2_120 P0_sir P0_dam F 1
P0_dam 0 0 F -
P0_sir 0 0 M -
If you ran stacks, you can generate the plink files using populations command:
populations -P dir -b batch_id [-M popmap] (filters) --write_single_snp -k --plink
or
populations -V vcf -O dir [-M popmap] (filters) --write_single_snp -k --plink
e.g.:
populations -P ./stakcs/ -M map.pop -b 1 -p 2 -r 0.75 --min_maf 0.01 --write_single_snp -k --plink
This should generate the ped and map file in ./stakcs/batch_1.plink.ped
and ./stakcs/batch_1.plink.map
.
If you used dDocent, you can convert your Final.recode.vcf
with vcftools
vcftools --vcf Final.recode.vcf --plink --maf 0.01 --out plink
See the official documentation at https://sourceforge.net/p/lepmap2/wiki/Home/
plinktomap.pl --ped plink.ped --meta meta_parents.txt --lepmap >input.linkage
#Filter dataset
java -cp /usr/local/bin/lepmap2 Filtering data=input.linkage dataTolerance=0.001 MAFLimit=0.01 >input_f.linkage
#Test the best LOD limit (form 0.5 to 15)
for I in $(seq 0.5 0.5 15)
do
java -cp /usr/local/bin/lepmap2 SeparateChromosomes data=input_f.linkage sizeLimit=10 lodLimit=${I} >/dev/null 2>>lod.log
done
grep "Number of LGs" lod.log >lod.txt
cat lod.txt
#For the best LOD limit (e.g. 5.5)
java -cp /usr/local/bin/lepmap2 SeparateChromosomes data=input_f.linkage sizeLimit=10 lodLimit=5.5 >input.map
java -cp /usr/local/bin/lepmap2 JoinSingles input.map data=input_f.linkage lodLimit=0.5 >input.jsmap
#Calculate a genetic map (see LepMap2 documentation for more details)
java -cp /usr/local/bin/lepmap2 OrderMarkers useKosambi=1 maxDistance=50 map=input.jsmap data=input_f.linkage >lepmap.ordered
See the official documentation at https://sourceforge.net/p/lep-map3/wiki/Home/
plinktomap.pl --ped plink.ped --meta meta_parents.txt --lepmap3 >input.linkage
linkage2post.pl -in input.linkage | java -cp /usr/local/bin/lepmap2 Transpose > input.post
#Filter dataset
java -cp /usr/local/bin/lepmap3 Filtering2 data=input.post dataTolerance=0.001 MAFLimit=0.01 >input_f.call
#Test the best LOD limit (form 5 to 30)
for I in $(seq 5 1 30)
do
java -cp /usr/local/bin/lepmap3 SeparateChromosomes2 data=input_f.call lodLimit="${I}" sizeLimit=10 >/dev/null 2>>lod.log
done
grep "Number of LGs" lod.log >lod.txt
cat lod.txt
#For the best LOD limit (e.g. 8)
java -cp /usr/local/bin/lepmap3 SeparateChromosomes2 data=input_f.call lodLimit=8 sizeLimit=10 >input.map
java -cp /usr/local/bin/lepmap3 JoinSingles2All map=input.map data=input_f.call lodLimit=0.5 iterate=1 >input.jsmap
#Calculate a genetic map (see LepMap3 documentation for more details)
java -cp /usr/local/bin/lepmap3 OrderMarkers2 useKosambi=1 map=input.jsmap data=input_f.call >lepmap.ordered
See the official documentation at https://CRAN.R-project.org/package=SNPassoc
plinktomap.pl --plink plink.map --ped plink.ped --meta meta_parents.txt --snpassoc >input.all.snp
plinktomap.pl --plink plink.map --ped plink.ped --meta meta_parents.txt --map input.jsmap --snpassoc >input.mappable.snp
plinktomap.pl --plink plink.map --ped plink.ped --meta meta_parents.txt --map input.jsmap --gmap lepmap.ordered --female --snpassoc >input.snp 2>input.gmap
library(SNPassoc)
SNP <- read.delim("input.all.snp");
SNPAssoc<-setupSNP(data=SNP,colSNPs=4:length(SNP), sep="/");
SNPAssocSex<-WGassociation(sex~1, data=SNPAssoc, model="codominant")
BonSNPAssocSex<-Bonferroni.sig(SNPAssocSex, model = "codominant", alpha = 0.05, include.all.SNPs=FALSE);
summary(SNPAssocSex)
plot(SNPAssocSex, whole=FALSE, print.label.SNPs = FALSE)
library(SNPassoc)
SNP <- read.delim("input.snp",header=TRUE);
order <- read.delim("input.gmap");
order$Marker <- paste('X',order$Marker,sep=""); #If Stacks output
#order$Marker<-gsub(":", ".", order$Marker); #If dDocent output
SNPAssoc<-setupSNP(data=SNP,colSNPs=4:length(SNP), sort=TRUE, info=order,sep="/");
SNPAssocSex<-WGassociation(sex, data=SNPAssoc, model="codominant");
BonSNPAssocSex<-Bonferroni.sig(SNPAssocSex, model = "codominant", alpha = 0.05, include.all.SNPs=FALSE);
summary(SNPAssocSex)
png("AssocSex.png");
plot(SNPAssocSex, whole=FALSE, print.label.SNPs = FALSE, sort.chromosome=TRUE);
dev.off();
write.csv(SNPAssocSex,file="AssocSex.csv");
plinktomap.pl --genetic input.gmap --extra AssocSex.csv >genetic_map.tsv
Create the final genetic map with the marker sequences and locations (Stacks only) (recommended for publication)
plinktomap.pl --genetic input.gmap --extra AssocSex.csv --markers batch_<n>.catalog.tags.tsv --loc >genetic_map.tsv
If you have any problems with or questions about the scripts, please contact us through a GitHub issue. Any issue related to the scientific results themselves must be done directly with the authors.
You are invited to contribute new features, fixes, or updates, large or small; we are always thrilled to receive pull requests, and do our best to process them as fast as we can.
The content of this project itself including the raw data and work are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, and the source code presented is licensed under the GPLv3 license.