SeleDiff

NOTE: This project is no longer actively maintained.

Introduction

SeleDiff implements a probabilistic method for estimating and testing differences in selection coefficients between populations¹.
If you run into any problems, please contact xinhuang.res@gmail.com or open an issue in this repository.
If you would like to reproduce our simulations, please see the code in ./appendix.
If you are interested in contributing to SeleDiff, feel free to clone and modify it. Please include unit tests for any code you change. You can edit build.gradle to add new dependencies. When you are done, send a GitHub Pull Request with a clear list of what you have changed.
For more details, please see the manual in ./docs.

Installation

To install SeleDiff, first install the Java SE Development Kit 8 or OpenJDK 8.

Linux/Mac

In Linux/Mac, you can open the terminal and clone SeleDiff using git:

> git clone https://github.com/xin-huang/SeleDiff

Then you can enter the SeleDiff directory and use gradlew to install SeleDiff:

> cd ./SeleDiff
> ./gradlew build
> ./gradlew install

The SeleDiff executable is in ./build/install/SeleDiff/bin/. You can add this directory to your PATH environment variable with:

> export PATH="/path/to/SeleDiff/build/install/SeleDiff/bin/":$PATH

You can get help information by typing:

> SeleDiff

You can use gradlew to remove SeleDiff:

> ./gradlew clean

Windows

In Windows, you can download the latest release. Please make sure your JAVA_HOME environment variable points to your JDK directory. After downloading and extracting the archive, open cmd and change to the SeleDiff directory. Then use gradlew.bat to build and install SeleDiff:

> cd /path/to/SeleDiff
> gradlew.bat build
> gradlew.bat install

Then run SeleDiff.bat in ./build/install/SeleDiff/bin/:

> cd ./build/install/SeleDiff/bin/
> SeleDiff.bat

You can use gradlew.bat to remove SeleDiff:

> cd /path/to/SeleDiff
> gradlew.bat clean

Commands

SeleDiff contains two sub-commands:

compute-var for estimating variances of Ω¹, which are required for the compute-diff command;
compute-diff for estimating selection differences among loci.

Input Files

SeleDiff assumes bi-allelic genetic data and will not perform any checks on this assumption. All input files can be compressed by gzip.

EIGENSTRAT

SeleDiff accepts genetic data in EIGENSTRAT format as input. EIGENSOFT provides several functions for converting other formats to EIGENSTRAT format.

VCF

SeleDiff also accepts genetic data in VCF format as input, and assumes that the genotypes of each individual are encoded with 0 and 1. Because the VCF format carries no population information for individuals, users should provide an additional file in EIGENSTRAT IND format.
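As a minimal sketch of preparing that additional file, the Python snippet below writes IND lines from a list of samples. It assumes the usual three-column EIGENSTRAT IND layout (sample ID, sex coded as M, F or U, and population label). The sample IDs and the `write_ind` helper are hypothetical; in practice the IDs would presumably need to correspond to the samples in your VCF header, so check the manual in ./docs before relying on it.

```python
import io

def write_ind(samples, handle):
    """Write one IND line per sample: sample ID, sex, population label."""
    for sample_id, sex, population in samples:
        if sex not in ("M", "F", "U"):
            raise ValueError(f"unexpected sex code {sex!r} for {sample_id}")
        handle.write(f"{sample_id} {sex} {population}\n")

# Hypothetical samples; replace them with the IDs from your VCF header.
samples = [
    ("sample1", "F", "YRI"),
    ("sample2", "M", "CEU"),
    ("sample3", "U", "CHS"),
]

# An in-memory buffer keeps the sketch self-contained;
# use open("my.ind", "w") to write a real file instead.
buffer = io.StringIO()
write_ind(samples, buffer)
ind_text = buffer.getvalue()
print(ind_text, end="")
```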

Var File

The Var file is the output of the first sub-command, compute-var, and stores the variances of pairwise Ω. Unlike He et al. (2015), SeleDiff does not divide Ω by generation times, in order to reduce floating-point rounding errors. When estimating Ω, SeleDiff uses only SNPs that are not fixed in any population. When using the sub-command compute-diff to estimate selection differences, SeleDiff uses the --var option to accept a SPACE delimited file without a header that specifies the variances of Ω between populations.

    YRI CEU 1.547660
    YRI CHS 1.639591
    CEU CHS 0.989241

The first two columns are the population IDs, and the third column is the variance of Ω between the two populations.
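If you want to inspect a Var file outside SeleDiff, the layout above is simple to read. The following Python sketch parses the three example lines into a dictionary keyed by population pair; `parse_pair_file` is a hypothetical helper written for this illustration, not part of SeleDiff.

```python
def parse_pair_file(lines):
    """Parse 'POP1 POP2 VALUE' lines into {(POP1, POP2): float}."""
    values = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(fields)}")
        pop1, pop2, value = fields
        values[(pop1, pop2)] = float(value)
    return values

# The example lines shown above; a real file could be passed as open(path).
var_lines = [
    "YRI CEU 1.547660",
    "YRI CHS 1.639591",
    "CEU CHS 0.989241",
]

variances = parse_pair_file(var_lines)
for (pop1, pop2), variance in variances.items():
    print(pop1, pop2, variance)
```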

Divergence Time File

When using sub-command compute-diff to estimate selection differences, SeleDiff uses --time option to accept a SPACE delimited file without header that specifies divergence times between two populations.

    YRI CEU 5000
    YRI CHS 5000
    CEU CHS 3000

The first two columns are the population IDs, and the third column is the divergence time between the two populations, given in generations.
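Because compute-diff takes both a Var file and a divergence time file, it can be useful to confirm beforehand that every population pair appears in both. The Python sketch below is one possible pre-flight check. How SeleDiff itself treats missing or reordered pairs is not described here, so the order-insensitive matching and the helper names are assumptions made for this illustration.

```python
def read_pairs(lines):
    """Return {frozenset({pop1, pop2}): value} from 'POP1 POP2 VALUE' lines."""
    pairs = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            pairs[frozenset(fields[:2])] = float(fields[2])
    return pairs

def missing_pairs(var_lines, time_lines):
    """List population pairs that have a variance but no divergence time."""
    missing = read_pairs(var_lines).keys() - read_pairs(time_lines).keys()
    return sorted(tuple(sorted(pair)) for pair in missing)

var_lines = ["YRI CEU 1.547660", "YRI CHS 1.639591", "CEU CHS 0.989241"]
time_lines = ["YRI CEU 5000", "YRI CHS 5000", "CEU CHS 3000"]

complete = missing_pairs(var_lines, time_lines)
incomplete = missing_pairs(var_lines, time_lines[:2])
print("missing with all three time lines:", complete)
print("missing with the last time line dropped:", incomplete)
```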

Output File

The output file from SeleDiff is TAB delimited. The first row is a header that describes the meaning of each column.

Column	Column Name	Description
1	SNP ID	The name of a SNP
2	Ref	The reference allele
3	Alt	The alternative allele
4	Population1	The first population ID
5	Population2	The second population ID
6	Selection difference	The selection difference between the first and second populations
7	Std	The standard deviation of the selection difference
8	Lower bound of 95% CI	Lower bound of 95% confidence interval of the selection difference
9	Upper bound of 95% CI	Upper bound of 95% confidence interval of the selection difference
10	Delta	The delta statistic for selection difference
11	p-value	The p-value of the delta statistic
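A minimal Python sketch for post-processing this file is shown below. It relies only on the column positions listed above and skips the header row. The sample row is made up purely for illustration, the header text is assembled from the column names in the table and may not match the real file exactly, and `significant_rows` is a hypothetical helper.

```python
import csv
import io

# 0-based positions of the columns described in the table above.
SNP_ID, POP1, POP2, DIFF, P_VALUE = 0, 3, 4, 5, 10

def significant_rows(handle, alpha=0.05):
    """Return (SNP, pop1, pop2, difference) for rows with p-value below alpha."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    hits = []
    for row in reader:
        if row and float(row[P_VALUE]) < alpha:
            hits.append((row[SNP_ID], row[POP1], row[POP2], float(row[DIFF])))
    return hits

# Made-up rows, not real SeleDiff output.
header = ["SNP ID", "Ref", "Alt", "Population1", "Population2",
          "Selection difference", "Std", "Lower bound of 95% CI",
          "Upper bound of 95% CI", "Delta", "p-value"]
rows = [
    ["snp1", "A", "G", "POP1", "POP2", "0.0010", "0.0004", "0.0002", "0.0018", "6.25", "0.0124"],
    ["snp2", "C", "T", "POP1", "POP2", "0.0001", "0.0004", "-0.0007", "0.0009", "0.0625", "0.8026"],
]
text = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"

hits = significant_rows(io.StringIO(text))
print(hits)
```

For a real run, pass `open(path, newline="")` instead of the in-memory buffer.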

An Example

This example shows how SeleDiff estimates and tests selection differences between populations. Four populations (YRI, CEU, CHB, CHD) were extracted from HapMap3 (release 3). CHB and CHD were merged into one population called CHS. PLINK 1.7 was used to remove correlated individuals, SNPs with minor allele frequencies below 0.05, and SNPs in strong linkage disequilibrium. These genome-wide data are stored in ./examples/data/example.geno and are used for estimating the variances of Ω.

Two alternative alleles (rs1800407 and rs12913832) associated with blue eyes were identified in genes HERC2 and OCA2². These candidate data are stored in ./examples/data/example.candidates.geno and used for estimating selection differences of these SNPs between populations.

The allele counts in our example data are summarized below.

SNP ID	Population	Reference Allele Count	Alternative Allele Count
rs1800407	YRI	290	0
rs1800407	CEU	207	17
rs1800407	CHS	486	4
rs12913832	YRI	294	0
rs12913832	CEU	47	177
rs12913832	CHS	491	1
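To make the contrast between populations easier to see, the counts in this table can be turned into alternative allele frequencies. The short Python sketch below does only that arithmetic; it is an illustration added here and is not part of how SeleDiff computes selection differences.

```python
# (reference allele count, alternative allele count) from the table above.
counts = {
    ("rs1800407", "YRI"): (290, 0),
    ("rs1800407", "CEU"): (207, 17),
    ("rs1800407", "CHS"): (486, 4),
    ("rs12913832", "YRI"): (294, 0),
    ("rs12913832", "CEU"): (47, 177),
    ("rs12913832", "CHS"): (491, 1),
}

def alt_frequency(ref_count, alt_count):
    """Alternative allele frequency = alt / (ref + alt)."""
    return alt_count / (ref_count + alt_count)

frequencies = {key: alt_frequency(*value) for key, value in counts.items()}
for (snp, population), frequency in frequencies.items():
    print(f"{snp}\t{population}\t{frequency:.3f}")
```

For rs12913832 the alternative allele is absent in YRI, nearly absent in CHS, and at a frequency of about 0.79 in CEU (177 of 224 alleles).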

We assume that the divergence times of YRI-CEU and YRI-CHS are both 5000 generations, while the divergence time of CEU-CHS is 3000 generations. This information is stored in ./examples/data/example.time.

First, we estimate variances of Ω using sub-command compute-var:

> SeleDiff compute-var --geno ./examples/data/example.geno \
                       --ind ./examples/data/example.ind \
                       --snp ./examples/data/example.snp \
                       --output ./examples/results/example.geno.var

To estimate selection differences of candidates, we use the sub-command compute-diff:

> SeleDiff compute-diff --geno ./examples/data/example.candidates.geno \
                        --ind ./examples/data/example.candidates.ind \
                        --snp ./examples/data/example.candidates.snp \
                        --var ./examples/results/example.geno.var \
                        --time ./examples/data/example.time \
                        --output ./examples/results/example.candidates.geno.results
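If you prefer to drive the two steps from a script, the Python sketch below assembles the same two commands as argument lists. It uses only the sub-commands, options and paths shown above; `build_command` and `run_pipeline` are hypothetical helpers, and `run_pipeline` assumes that SeleDiff is on your PATH and that you run it from the repository root.

```python
import shlex
import subprocess

def build_command(subcommand, options):
    """Build ['SeleDiff', subcommand, '--flag', value, ...] from a dict."""
    command = ["SeleDiff", subcommand]
    for flag, value in options.items():
        command += [f"--{flag}", value]
    return command

var_cmd = build_command("compute-var", {
    "geno": "./examples/data/example.geno",
    "ind": "./examples/data/example.ind",
    "snp": "./examples/data/example.snp",
    "output": "./examples/results/example.geno.var",
})

diff_cmd = build_command("compute-diff", {
    "geno": "./examples/data/example.candidates.geno",
    "ind": "./examples/data/example.candidates.ind",
    "snp": "./examples/data/example.candidates.snp",
    "var": "./examples/results/example.geno.var",
    "time": "./examples/data/example.time",
    "output": "./examples/results/example.candidates.geno.results",
})

def run_pipeline():
    """Run compute-var, then compute-diff; stop if either step fails."""
    for command in (var_cmd, diff_cmd):
        subprocess.run(command, check=True)

# Print the commands for review; call run_pipeline() to execute them.
for command in (var_cmd, diff_cmd):
    print(shlex.join(command))
```

Running compute-var first matters because compute-diff reads the Var file that compute-var writes.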

The results are stored in ./examples/results/example.candidates.geno.results. The main results are shown below.

SNP ID	Population1	Population2	Selection difference	Std	delta	p-value
rs1800407	YRI	CEU	-0.000773	0.000380	4.129	0.042154
rs1800407	YRI	CHS	-0.000336	0.000393	0.731	0.392559
rs1800407	CEU	CHS	0.000728	0.000377	3.730	0.053443
rs12913832	YRI	CEU	-0.001541	0.000378	16.583	0.000047
rs12913832	YRI	CHS	-0.000117	0.000415	0.080	0.777297
rs12913832	CEU	CHS	0.002372	0.000433	30.062	0.000000
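As a sanity check on how these columns relate, the Python sketch below recomputes two quantities from the table. It rests on an assumption inferred from the numbers themselves rather than stated in this README: that delta is roughly the squared ratio of the selection difference to Std, and that the p-value is the upper tail of a chi-square distribution with one degree of freedom. Small mismatches are expected because the table values are rounded.

```python
import math

# Rows copied from the table above:
# SNP, population 1, population 2, selection difference, Std, delta, p-value.
rows = [
    ("rs1800407", "YRI", "CEU", -0.000773, 0.000380, 4.129, 0.042154),
    ("rs1800407", "YRI", "CHS", -0.000336, 0.000393, 0.731, 0.392559),
    ("rs1800407", "CEU", "CHS", 0.000728, 0.000377, 3.730, 0.053443),
    ("rs12913832", "YRI", "CEU", -0.001541, 0.000378, 16.583, 0.000047),
    ("rs12913832", "YRI", "CHS", -0.000117, 0.000415, 0.080, 0.777297),
    ("rs12913832", "CEU", "CHS", 0.002372, 0.000433, 30.062, 0.000000),
]

def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

checks = []
for snp, pop1, pop2, diff, std, delta, p_value in rows:
    approx_delta = (diff / std) ** 2   # assumed relationship, see lead-in
    approx_p = chi2_sf_1df(delta)      # assumed reference distribution
    checks.append((delta, approx_delta, p_value, approx_p))
    print(f"{snp} {pop1}-{pop2}: delta {delta} vs {approx_delta:.3f}, "
          f"p {p_value} vs {approx_p:.6f}")
```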

These results show that the selection coefficient of rs12913832 in CEU is significantly larger than in YRI or CHS, which indicates that rs12913832 is under directional selection in CEU. The selection coefficient of rs1800407 in CEU is only marginally significantly larger than in YRI or CHS.

Please refer to our previous study¹ for a more comprehensive working example using the HapMap3 dataset.
