protTrace - A simulation-based framework to estimate the evolutionary traceability of proteins.

Scientific context

ProtTrace is a simulation-based approach that assesses, for a given protein (the seed), over what evolutionary distances its orthologs can still be found through significant sequence similarity. This helps to distinguish the true absence of an ortholog in a given species from its non-detection due to limited search sensitivity. ProtTrace was presented in 2018 at the German Conference on Bioinformatics (GCB). A high-resolution PDF of the corresponding poster is available HERE.

Workflow

The workflow of protTrace to infer the evolutionary traceability of a seed protein is shown in the figure below. It consists of three main steps:

Parameterization: The compilation of an orthologous group for the seed protein. In the standard setting, OMA orthologous groups are used. The sequences in the orthologous group are then used to infer the parameters of the substitution process and of the insertion and deletion process.
Traceability calculation: The in-silico evolution of the seed protein using the simulation software REvolver, and the determination of the traceability curve.
Visualization: The inference of the traceability index for the protein in 233 species from all domains of life, and the generation of a colored tree. A high resolution PDF of the image is available HERE.
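
To make the idea of a traceability curve more concrete, here is a minimal Python sketch. It is not protTrace's implementation (see the WIKI for the actual procedure); it only assumes that, for each simulated evolutionary distance, we know in how many simulation replicates the evolved sequence was still detected. The function name and the toy data are hypothetical.

```python
def detection_fractions(detected_by_distance):
    """Map each evolutionary distance to the fraction of simulation
    replicates in which the evolved sequence was still detected."""
    curve = {}
    for distance, outcomes in sorted(detected_by_distance.items()):
        if not outcomes:
            raise ValueError("no replicates for distance %r" % distance)
        hits = sum(1 for hit in outcomes if hit)
        curve[distance] = hits / float(len(outcomes))
    return curve


# Hypothetical toy data: True means the evolved sequence was still detected.
toy_outcomes = {
    0.5: [True, True, True, True],
    1.0: [True, True, False, True],
    2.0: [False, False, True, False],
}
curve = detection_fractions(toy_outcomes)
```

In this toy example the detected fraction drops as the distance grows, which is the kind of decay a traceability curve is meant to capture.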

Installation & Usage

Please refer to the protTrace WIKI for a full description of the installation and usage guidelines. The WIKI also explains how to set up a virtual machine running protTrace. Below, we provide a short excerpt.

protTrace is written in Python 2.7, with some helper scripts in Perl and R. To run protTrace you need the following resources:
Python v2.7.13 or a later 2.7 release. Note that ProtTrace will not run under Python 3.
- Also install the DendroPy module (can be done via Conda).
Perl v5 or higher, including the following modules:
- Getopt::Long
- List::Util
- LWP::Simple
Java v1.7 or higher
R v3 or higher
wget
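
Before configuring protTrace it can help to check that the command-line tools listed above are on your PATH. The following is a small standalone sketch, not part of protTrace. The executable names (`perl`, `java`, `Rscript`, `wget`) are assumptions about a typical Unix-like installation, so adjust them to your system. It does not check versions or Perl and Python modules.

```python
import os

# Assumed executable names on a Unix-like system; adjust as needed.
REQUIRED_TOOLS = ["perl", "java", "Rscript", "wget"]


def find_executable(name, search_path=None):
    """Return the full path of `name` if an executable file with that
    name exists in one of the PATH directories, otherwise None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, search_path=None):
    """Return the subset of `names` that could not be found."""
    return [n for n in names if find_executable(n, search_path) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print("missing: " + tool)
```

The search is done by hand rather than with a library helper so that the script runs under both Python 2.7 and Python 3.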

protTrace & Accessory Software

Program name	Version	Description	Mandatory	BioConda
MAFFT	v6 or higher	Multiple Sequence alignment	yes	yes
NCBI Blast	v2.7 or higher	Sequence similarity based search	yes	yes
HMMER	3.2 or higher	Sequence similarity based search using Hidden Markov Models	yes	yes
IQTREE	1.6.7.1 or higher	Phylogenetic tree reconstruction	yes	yes
HaMStR OneSeq	v1 or higher	targeted ortholog search	no	no

To start with, we suggest omitting the optional use of HaMStR, since this software comes with some strict naming conventions.

Once the dependencies are installed (we suggest using the conda package management system for this), clone this repository to get a copy of protTrace:

git clone https://github.com/BIONF/protTrace

Configuring protTrace

To configure protTrace, move into the protTrace directory and run the configure script:

perl bin/create_conf.pl -name=prog.conf -getOMA -getPfam

This checks that all dependencies are present, lets you set all parameters required for a protTrace run, and finally downloads the required data from the OMA and Pfam databases.
- If you are confident that you already have this data available, you can omit either or both of the options -getOMA and -getPfam. You will then have to tell protTrace via the create_conf.pl script where this data is located.
- Make sure to adhere to the formatting requirements for the OMA data, and make sure that you have run hmmpress on the Pfam database.
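
If you supply your own Pfam database, one quick way to see whether hmmpress has already been run is to look for the binary index files it writes next to the HMM file (`.h3f`, `.h3i`, `.h3m`, `.h3p`). The sketch below is a standalone helper, not part of protTrace. The function name is hypothetical and the path in the usage note is only an example.

```python
import os

# Index file suffixes that hmmpress writes next to the HMM file.
HMMPRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_hmmpress_files(hmm_path):
    """Return the hmmpress index files that are absent for `hmm_path`.

    An empty list suggests that hmmpress has been run on the database.
    """
    return [
        hmm_path + suffix
        for suffix in HMMPRESS_SUFFIXES
        if not os.path.isfile(hmm_path + suffix)
    ]
```

For example, `missing_hmmpress_files("/path/to/Pfam-A.hmm")` returns an empty list when all four index files exist. It does not verify that the index files are up to date with the HMM file.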

Once everything is set, you are ready to run protTrace.

Calling protTrace

Enter the protTrace directory and type

python bin/protTrace.py -h

This should print

USAGE:  protTrace.py -i <omaIdsFile> | -f <fastaSeqsFile> -c <configFile> [-h]
        -i              Text file containing protein OMA ids (1 id per line)
        -f              List of input protein sequences in fasta format
        -c              Configuration file for setting program's dependencies
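
When protTrace is called from another script, the command line can be assembled programmatically. The sketch below uses only the flags shown in the usage message and reads that message as requiring exactly one of `-i` or `-f` together with `-c`, which is an assumption. The function name and its parameters are hypothetical.

```python
def build_command(config_file, ids_file=None, fasta_file=None,
                  script="bin/protTrace.py"):
    """Build the argument list for a protTrace call.

    Exactly one of `ids_file` (-i) or `fasta_file` (-f) must be given.
    """
    if (ids_file is None) == (fasta_file is None):
        raise ValueError("give exactly one of ids_file (-i) or fasta_file (-f)")
    command = ["python", script]
    if ids_file is not None:
        command += ["-i", ids_file]
    else:
        command += ["-f", fasta_file]
    command += ["-c", config_file]
    return command
```

The returned list can be passed to `subprocess.call`, which avoids shell quoting problems with file names that contain spaces. Make sure that `python` resolves to a Python 2.7 interpreter.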

Input Data

protTrace can use either OMA protein ids or a protein sequence in fasta format as input.

In toy_example/ you can find two files, test.ids and test.fasta for performing a test run with protTrace.

We describe the input in the section Test Run of our WIKI.
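
As a rough illustration of the two input formats, the following sketch performs basic sanity checks on an id file (one id per line) and on a fasta file. It is a hypothetical pre-flight helper, not part of protTrace, and it does not check whether an id actually exists in OMA.

```python
def read_id_list(path):
    """Return the ids in `path`, expecting exactly one id per line."""
    with open(path) as handle:
        ids = [line.strip() for line in handle if line.strip()]
    for identifier in ids:
        if len(identifier.split()) != 1:
            raise ValueError("more than one token on a line: %r" % identifier)
    return ids


def count_fasta_records(path):
    """Count fasta records; fail if sequence data precedes any header."""
    records = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records += 1
            elif records == 0:
                raise ValueError("sequence data before the first '>' header")
    return records
```

Running these checks before a long protTrace run can catch simple formatting slips, such as a missing header line, early.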

Test Run

The directory toy_example contains two files for testing protTrace:

test.ids: This file contains the OMA protein id of the yeast protein DIM1. To run this test:
1. Create a config file prot.conf using the create_conf.pl script. We recommend leaving all values at their defaults to start with.
2. Place the config file into the directory toy_example.
3. Enter the directory toy_example and run protTrace by typing
```
python ../bin/protTrace.py -i test.ids -c prot.conf
```
The output that will be generated by this run is described in the WIKI
test.fasta: This file contains the protein sequence of human ZNT3. To run this test:
1. Create or modify the config file prot.conf using the create_conf.pl script. Make sure to set the entry species in the section General Options to HUMAN.
2. Place the config file into the directory toy_example.
3. Enter the directory toy_example and run protTrace by typing
```
python ../bin/protTrace.py -f test.fasta -c prot.conf
```
The output that will be generated by this run is described in the WIKI

WIKI

Read the WIKI to explore the functionality of protTrace.

Bugs

Bug reports, comments, and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

Acknowledgements

We would like to thank the members of the Ebersberger group for many valuable suggestions and... bug reports :)

Contributors

Arpit Jain
Ingo Ebersberger
Dominik Perisa

License

This tool is released under the GNU GPL v3.0 license.

How-To Cite

Arpit Jain, Arndt von Haeseler, Ingo Ebersberger. The evolutionary Traceability of protein. bioRxiv (2018).

Contact

Ingo Ebersberger ebersberger@bio.uni-frankfurt.de
