Biological processes have evolved into intricate systems in which proteins act as crucial components, guiding specific pathways. Proteins play a pivotal role in determining molecular mechanisms and cellular responses, so analyzing protein interaction networks is essential for understanding cellular processes and disease mechanisms and for identifying potential therapeutic targets.
In this repository, we focus on the analysis of a human Protein-Protein Interaction (PPI) network. The network is represented by a directed interactome file, PathLinker_2018_human-ppi-weighted-cap0_75.txt. This file records the UniProt IDs of the interacting proteins, the interaction confidence scores, and the methods used to identify the interactions. You can find this file in the Dataset folder.
- The file represents a directed interactome, where each interaction runs from the tail node to the head node.
- Each line of the file consists of four pieces of information:
  - Tail protein node (UniProt ID).
  - Head protein node (UniProt ID).
  - Interaction confidence (range 0 to 1).
  - Method used to identify the interaction.
To begin our analysis, we construct the biological network with the NetworkX Python package from the provided interactome file, "PathLinker_2018_human-ppi-weighted-cap0_75.txt". Each interaction becomes a directed edge between two proteins (nodes, identified by their UniProt IDs), annotated with the interaction confidence score (weight) and the method used for identification.
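The construction step could be sketched as follows. This is a minimal sketch rather than the notebook's actual code: it assumes the file is tab- or whitespace-separated and that header or comment lines start with `#`. The function name `load_interactome` and the edge attribute names `weight` and `method` are illustrative choices, not names taken from the repository.

```python
import networkx as nx


def load_interactome(lines):
    """Build a directed graph from interactome lines (hypothetical helper).

    Assumed line layout: tail, head, confidence, method.
    """
    graph = nx.DiGraph()
    for line in lines:
        line = line.strip()
        # Skip blank lines and (assumed) '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue
        tail, head = parts[0], parts[1]
        try:
            weight = float(parts[2])
        except ValueError:
            # Not a numeric confidence, e.g. an unmarked header row.
            continue
        method = parts[3] if len(parts) > 3 else ""
        graph.add_edge(tail, head, weight=weight, method=method)
    return graph


# Usage with the real file (path assumed from the "Dataset folder" note):
# with open("Dataset/PathLinker_2018_human-ppi-weighted-cap0_75.txt") as fh:
#     G = load_interactome(fh)
```

Because the function accepts any iterable of lines, it can be tried on a handful of hand-written rows before loading the full interactome.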
Given two proteins, we find all acyclic shortest paths between them and write them to a text file. For each path, the output reports the total path score and the weight of each interaction along the path.
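One way to approach this task is sketched below. How the path score is defined is not stated here, so the sketch makes a loud assumption: each edge gets a cost of `-log(weight)`, so that high-confidence interactions are "shorter", and the path score is the sum of those costs. The notebook may define the score differently. The names `add_costs` and `shortest_path_report` are hypothetical.

```python
import math

import networkx as nx


def add_costs(graph):
    """Copy the graph, adding cost = -log(weight) to each edge (assumption)."""
    costed = nx.DiGraph()
    costed.add_nodes_from(graph)
    for u, v, w in graph.edges(data="weight"):
        # Edges with a missing or non-positive confidence are left out.
        if w is not None and w > 0:
            costed.add_edge(u, v, weight=w, cost=-math.log(w))
    return costed


def shortest_path_report(graph, source, target):
    """Return one text line per shortest path from source to target."""
    costed = add_costs(graph)
    no_path = [f"No path from {source} to {target}"]
    if source not in costed or target not in costed:
        return no_path
    try:
        paths = sorted(nx.all_shortest_paths(costed, source, target, weight="cost"))
    except nx.NetworkXNoPath:
        return no_path
    lines = []
    for path in paths:
        steps = list(zip(path, path[1:]))
        weights = [costed[u][v]["weight"] for u, v in steps]
        score = sum(costed[u][v]["cost"] for u, v in steps)
        lines.append(" -> ".join(path) + f"\tscore={score:.4f}\tweights={weights}")
    return lines
```

The returned lines can then be written out with `open(path, "w").write("\n".join(lines))`. Shortest paths are simple paths, so no protein repeats within a reported path.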
For a given protein, we list all directly connected proteins in a text file. The degree (number of connections) of the selected protein is reported on a separate line, and each connected protein is listed with its corresponding interaction weight.
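A possible shape for this report is sketched below. Since the network is directed, the sketch assumes that "directly connected" covers both outgoing and incoming interactions and that the degree is the sum of the two. The function name `neighbor_report` and the tab-separated output layout are illustrative assumptions.

```python
import networkx as nx


def neighbor_report(graph, protein):
    """Return report lines: the degree first, then one line per interaction."""
    if protein not in graph:
        raise KeyError(f"{protein} is not in the network")
    # For a DiGraph, degree() counts incoming plus outgoing edges.
    lines = [f"{protein}\tdegree={graph.degree(protein)}"]
    for _, head, weight in graph.out_edges(protein, data="weight"):
        lines.append(f"{protein}\t{head}\t{weight}")
    for tail, _, weight in graph.in_edges(protein, data="weight"):
        lines.append(f"{tail}\t{protein}\t{weight}")
    return lines
```

Each interaction line keeps the tail-to-head order of the interactome file, so the direction of every listed connection stays visible.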
Given a set of proteins, we draw a histogram to visualize their degree distribution. We also rank these proteins from most to least connected in a text file, with each line giving a protein and its degree. This ranking is useful for identifying the hubs in the network.
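The ranking and histogram could look roughly like this. It is a sketch under assumptions: degree is taken as in-degree plus out-degree, proteins absent from the network are silently skipped, and Matplotlib is assumed to be available for the plot. The names `rank_by_degree`, `plot_degree_histogram`, and the default output file name are hypothetical.

```python
import networkx as nx


def rank_by_degree(graph, proteins):
    """Return (protein, degree) pairs, highest degree first, ties by ID."""
    present = [p for p in proteins if p in graph]
    pairs = [(p, graph.degree(p)) for p in present]
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


def plot_degree_histogram(ranked, out_png="degree_hist.png"):
    """Save a degree histogram for the ranked proteins (needs Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    degrees = [degree for _, degree in ranked]
    if not degrees:
        return
    plt.figure()
    # One bin per integer degree value.
    plt.hist(degrees, bins=range(min(degrees), max(degrees) + 2))
    plt.xlabel("Degree")
    plt.ylabel("Number of proteins")
    plt.savefig(out_png)
    plt.close()
```

Writing the ranking to a text file is then a matter of emitting one `protein<TAB>degree` line per pair.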
This analysis provides a conversion map between protein UniProt IDs and gene names. The script supports converting either a single protein ID or a set of protein IDs to the corresponding gene names.
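Where the mapping data comes from is not specified here, so the sketch below assumes a local two-column, tab-separated table (UniProt ID, then gene name), for instance one exported from UniProt's ID mapping service. The function names `load_id_map` and `to_gene_names` and the `"NA"` placeholder are hypothetical.

```python
def load_id_map(lines):
    """Parse 'UniProtID<TAB>GeneName' rows into a dict (assumed layout)."""
    id_map = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) >= 2:
            id_map[parts[0]] = parts[1]
    return id_map


def to_gene_names(ids, id_map, missing="NA"):
    """Map one UniProt ID or an iterable of IDs to gene names."""
    if isinstance(ids, str):
        ids = [ids]  # accept a single ID as well as a collection
    return {protein_id: id_map.get(protein_id, missing) for protein_id in ids}
```

For example, with a table containing the rows `P04637<TAB>TP53` and `P38398<TAB>BRCA1`, calling `to_gene_names("P04637", id_map)` returns a one-entry dictionary, and unknown IDs map to the placeholder instead of raising an error.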
The final step is to convert the existing graph into an unweighted graph using the adjacency matrix method. The unweighted graph will be saved for further analysis.
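A minimal sketch of the adjacency-matrix route is shown below, assuming NumPy is installed alongside NetworkX. The function name `to_unweighted` is hypothetical, and the dense matrix is an assumption of convenience: for the full interactome a dense array may be very large, so a sparse representation might be preferable.

```python
import networkx as nx
import numpy as np


def to_unweighted(graph):
    """Rebuild the directed graph from its binary adjacency matrix."""
    nodes = list(graph.nodes())
    # weight=None makes every existing edge count as 1 in the matrix.
    adjacency = nx.to_numpy_array(graph, nodelist=nodes, weight=None)
    rows, cols = np.nonzero(adjacency)
    unweighted = nx.DiGraph()
    unweighted.add_nodes_from(nodes)
    unweighted.add_edges_from((nodes[i], nodes[j]) for i, j in zip(rows, cols))
    return unweighted


# The result could be saved with, for example:
# nx.write_edgelist(to_unweighted(G), "unweighted_edgelist.txt", data=False)
```

The rebuilt graph keeps every node and edge direction but carries no edge attributes, so later analyses cannot pick up confidence scores by accident.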
The analysis indicates that the PPI network (PPIN) exhibits the following properties:
- Small World Effect
- Scale-free Property
- Transitivity
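These properties are usually judged from a few summary statistics: a short average path length combined with high clustering suggests a small-world network, and a heavy-tailed degree distribution suggests a scale-free one. The sketch below computes such statistics with NetworkX. It is not the notebook's procedure; it assumes a non-empty graph, treats the network as undirected, and restricts the path length to the largest connected component. The function name `network_properties` is hypothetical.

```python
import networkx as nx


def network_properties(graph):
    """Compute summary statistics on the undirected view of the network."""
    undirected = nx.Graph(graph)  # collapses edge direction
    undirected.remove_edges_from(list(nx.selfloop_edges(undirected)))
    # Average path length is only defined on a connected graph.
    largest = max(nx.connected_components(undirected), key=len)
    core = undirected.subgraph(largest)
    return {
        "transitivity": nx.transitivity(undirected),
        "average_clustering": nx.average_clustering(undirected),
        "average_path_length": nx.average_shortest_path_length(core),
        # Entry k is the number of nodes with degree k.
        "degree_histogram": nx.degree_histogram(undirected),
    }
```

On the full interactome the average path length is the expensive part, since it requires a shortest-path search from every node.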
For a more in-depth understanding of the methodology and analysis, please refer to the notebook and attached paper.
To set up the required environment for the Protein-Protein Interactions Analysis, follow these steps:
- Clone the Repository:

      git clone git@github.com:joyou159/Protein-Protein-Interactions-Analysis-.git
      cd Protein-Protein-Interactions-Analysis-

- Create a Virtual Environment (Optional but Recommended):

      python -m venv venv
      source venv/bin/activate

- Install Dependencies:

      pip install -r requirements.txt
Your contributions are welcome in unraveling the complexities of protein-protein interaction networks!
This project was supervised by Dr. Ibrahim Youssef, who provided invaluable guidance and expertise throughout this incredible journey as part of the bioinformatics course at Cairo University, Faculty of Engineering.