This is a repository for data from the manuscript "Improved prediction of site-specific evolutionary rates from structure by averaging across homologs".
The repo contains:
- The Python script to calculate site rates from ddG values, and an example command line
- Example command line for calculating ddG values from a protein structure
- Data from the manuscript
TMS calculations require the script calc_rates.py, which calls RateCalculator/RateCalculator.py
scripts/calc_rates.py --ranks ranks_xtal/1pga.txt --fasta fasta/1pga.fasta --score_style Rosetta
--ranks
specifies a file listing the ddG values, one substitution per line, in the format
position | amino acid type | relative ddG | wild-type amino acid flag (yes/no)
1 A -4.033 0
1 D 0.2 1
1 E -4.502 0
. . .
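As an illustration of the format above, a rank file could be read with a few lines of Python. This is a minimal sketch and not the parser used by calc_rates.py: the names RankEntry and parse_ranks are hypothetical, the columns are assumed to be whitespace-separated as in the example rows, and a flag of 1 is assumed to mark the wild-type amino acid.

```python
from typing import List, NamedTuple


class RankEntry(NamedTuple):
    pos: int      # sequence position
    aa: str       # one-letter amino acid type
    ddg: float    # relative ddG value
    is_wt: bool   # assumption: flag "1" marks the wild-type amino acid


def parse_ranks(text: str) -> List[RankEntry]:
    """Parse whitespace-separated rank lines, skipping anything malformed."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and placeholders such as ". . ."
        pos, aa, ddg, flag = fields
        try:
            entries.append(RankEntry(int(pos), aa, float(ddg), flag == "1"))
        except ValueError:
            continue  # skip lines whose numeric columns do not parse
    return entries


example = """1 A -4.033 0
1 D 0.2 1
1 E -4.502 0
. . .
"""
entries = parse_ranks(example)
for entry in entries:
    print(entry)
```

For a real file, the text would be read from disk first, for example from ranks_xtal/1pga.txt.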
--fasta
specifies the path to a FASTA file for the sequence (the script uses this sequence, not the sequence in the rank file)
--score_style
specifies the type of data that is provided. 'Rosetta' is for ddGs predicted by Rosetta (or other software), while 'Experimental' is for experimental data.
If other software or energy functions are used for the ddG calculations, the slope parameter (the R*T factor in the Boltzmann equation) may need to be adjusted.
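To see why the R*T factor acts as a slope, the generic Boltzmann weighting can be sketched as follows. This is not the formula implemented in calc_rates.py, whose exact form and sign convention are not described here. The function boltzmann_weights is hypothetical, and energies are assumed to be in kcal/mol with lower values being more favorable.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies, rt):
    """Return normalized weights proportional to exp(-E / RT)."""
    lowest = min(energies)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-(e - lowest) / rt) for e in energies]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


rt = R_KCAL * 298.0  # about 0.59 kcal/mol at 298 K
energies = [0.0, 1.0, 2.0]  # made-up values for illustration only

weights = boltzmann_weights(energies, rt)
flatter = boltzmann_weights(energies, 2.0 * rt)
print(weights)
print(flatter)
```

A larger R*T spreads the weights more evenly across states, while a smaller one concentrates them on the lowest energy. Energies on a different scale than kcal/mol would therefore call for a different value of this factor.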
A version of this program that takes an energy offset parameter, used for RosettaEvolve, is found in the repo for RosettaEvolve.
In this study we used a version of the Rosetta ddG prediction protocol to calculate the effect of mutations on the stability of proteins. The method is accessible through Rosetta Scripts as part of RosettaEvolve. More documentation on RosettaEvolve is found in the article "Atomistic simulation of protein evolution reveals sequence covariation and time-dependent fluctuations of site-specific substitution rates" and the corresponding repo. An example command line for calculating the rank files used in this study is shown below; the input files are found in the repo.
rosetta_scripts.default.linuxgccrelease -s $pdb -parser:protocol protocols/measure_ranks.xml @flags
Evolutionary trajectories were simulated using RosettaEvolve, which is distributed as part of the Rosetta macromolecular modeling package. Instructions on how to run RosettaEvolve can be found in the repo for the manuscript describing the method, "Atomistic simulation of protein evolution reveals sequence covariation and time-dependent fluctuations of site-specific substitution rates".