A pythonic library for analysing immunopeptidomic experiments

The immunopeptidomic toolkit library, IPTK

Introduction and Project Aim

IPTK is a Python library specialized in the analysis of HLA-peptidomes identified through an immunopeptidomic (IP) pipeline. The library provides a high-level API for analyzing and visualizing the identified peptides, integrating transcriptomics and protein structure information for a rich analysis of the identified immunopeptidomes. It also provides a toolbox for integrating and comparing different experiments and/or different runs.

Installation

Installation With pip

The library can be installed using pip as follows:

pip install iptkl --user

Notes and common troubleshooting

1- Please make sure that pip is installed on the system.

2- For macOS users please make sure Xcode is installed. This can be done using the following command

xcode-select --install 

For Debian/Ubuntu users, please make sure build-essential is installed.

sudo apt install build-essential

3- Make sure you run the dashboard with Python >= 3.6. pip cannot install the Python interpreter itself, so use conda to install a suitable version, as follows:

conda install "python>=3.6"
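
The version requirement in note 3 can also be checked from inside Python before launching the dashboard. The following is a minimal sketch using only the standard library; the helper name `meets_minimum_version` is hypothetical and is not part of IPTK.

```python
import sys


def meets_minimum_version(minimum=(3, 6), current=None):
    """Return True if the interpreter version is at least `minimum`.

    `current` defaults to the running interpreter; it is a parameter only so
    that the check can be exercised with other version tuples.
    """
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


if not meets_minimum_version():
    print("Python >= 3.6 is required, found", sys.version.split()[0])
```

Comparing version tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.6".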

Dependencies

The library requires the following libraries to be installed in order to function properly:

numpy, pandas, biopython, seaborn, matplotlib, plotly, mhcnames, pyteomics, h5py, logomaker, colour, lxml, nglview, sklearn, scipy

Usually these packages are installed automatically through pip. However, in case this process fails, the dependencies can be installed as follows:

pip install -r requirements.txt 
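
If an installation problem is suspected, it can help to check which of the listed dependencies are importable in the current environment. The sketch below uses only the standard library; the helper `missing_modules` and the list `IPTK_DEPENDENCIES` are hypothetical names, and the list simply mirrors the dependencies named above using their import names.

```python
import importlib.util

# Import names for the dependencies listed above. Note that the import name
# can differ from the package name (biopython is imported as "Bio").
IPTK_DEPENDENCIES = [
    "numpy", "pandas", "Bio", "seaborn", "matplotlib", "plotly", "mhcnames",
    "pyteomics", "h5py", "logomaker", "colour", "lxml", "nglview", "sklearn",
    "scipy",
]


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(IPTK_DEPENDENCIES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

The check only confirms that a module can be located; it does not verify that the installed version is compatible with IPTK.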

Installation with BioConda

1. Install bioconda from the official website here

2. Create a new environment

conda create -n iptk_env

3. Activate the new environment

conda activate iptk_env

4. Install IPTK from the bioconda channel

conda install -c bioconda iptkl 

Notes and common troubleshooting

Conda found conflicts and cannot install the library

1. Install mamba from here

2. In the same environment, install iptkl

mamba install -c bioconda iptkl 

3. If mamba was able to install the library, you are done. Otherwise, go one step further:

3.a. Create a new conda environment

conda create -n iptk_env

3.b. Install the library using pip

pip install iptkl

4. Enjoy analyzing your data.

Visualization

1. In case you are working within a Jupyter Notebook, you can set the magic command %matplotlib notebook to work interactively with the generated plots. If you are working in an IPython shell, use the magic command %matplotlib instead.

2. To save any of the figures generated using matplotlib or seaborn, use the following command:

fig.savefig('my_figure_name.my_extension', dpi=600)

3. To visualize any of the figures generated using plotly library, use either:

fig.show()

which opens the figure in the browser for interactive visualization; the generated figure can then be saved from there. A second option is to save the figure directly in Python, as follows:

fig.write_image('my_figure_name.my_extension')

4. To work interactively using Plotly based figures inside Jupyter Notebooks:

A. Install chart studio as follows:

pip install chart_studio

B. Embed the generated plotly figure using the function chart_studio.plotly.iplot, as shown in tutorials 2 and 4.

Get Started!

The library has four notebooks that provide step-by-step guidance on using the library and its major APIs for interacting with immunopeptidomic (IP) data. These tutorials can be found in the Tutorial directory.

IPTK has been documented using Sphinx. The manual of the library can be found in the docs directory and online at readthedocs.

Contact

Please feel free to write an email to the developer at h.elabd@ikmb.uni-kiel.de or to open an issue here in case of a bug or a feature request.

Running the tutorials

To run the tutorials locally, follow these steps:

1- Download the tutorials, either by cloning the repository or by downloading the tutorials along with the associated datasets only.

2- Start the notebook by running jupyter-notebook from the terminal.

Running the dashboard

To start the dashboard:

1. First, install IPTK, if you have not already, as follows

pip install iptkl --user

2. Install Dash and the other dependencies, as follows

pip install dash dash_bootstrap_components dash-uploader

3. Make the app executable, as follows

chmod +x Apps/ExperimentUI.py 

4. Launch the app, as follows

./Apps/ExperimentUI.py

5. Open the app in the browser by navigating to http://127.0.0.1:8050/

6. Get started!

A simple test case can be found in the test_data directory:

6.a For the identification file, drag the file: 0810202_0.5_all_ids_merged_psm_perc_filtered.idXML

6.b Select idXML from the Format drop-down menu

6.c For the Fasta Database, drag the file: human_proteome.fasta

6.d Click Create Experiment, wait a few seconds, and enjoy analyzing the data

FAQs

How can I cite IPTK?

IPTK has been published in BMC Bioinformatics here

How can I access the raw data used for developing IPTK?

The raw data has been deposited in PRIDE and can be accessed here

How does IPTK map HLA types to the identified peptides?

The Experiment class links the different aspects of an experiment together, for example, the transcriptomic layer with the list of identified peptides. For HLA information, it is assumed that all peptides identified in an experiment come from the same pool of HLA molecules: for example, HLA-DRB1*15:01 and HLA-DRB1*13:01 if an HLA-DR-specific antibody was used for the pulldown of HLA proteins, or up to six HLA-I alleles if pan-specific HLA antibodies were used. The Experiment class is initialized with an HLASet instance that stores information about the HLA types; that is, HLA types are linked with the list of identified peptides through the Experiment class. Finally, the function compute_protein_coverage, defined in the AnalysisFunction module, can be used to compare protein coverage among different experiments.
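
To make the idea of protein coverage concrete, the sketch below counts, for every position of a source protein, how many identified peptides overlap it. This is an illustration only and is not the implementation of compute_protein_coverage; the function name `position_coverage` and the toy sequences are assumptions made for this example.

```python
def position_coverage(protein, peptides):
    """Count how many peptides cover each position of `protein`.

    Every occurrence of every peptide is counted, including overlapping
    occurrences. Peptides that do not occur in the protein are ignored.
    """
    coverage = [0] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # an empty peptide would match everywhere
        start = protein.find(peptide)
        while start != -1:
            for pos in range(start, start + len(peptide)):
                coverage[pos] += 1
            # Continue one residue later to catch overlapping occurrences.
            start = protein.find(peptide, start + 1)
    return coverage


# Toy protein and peptides, invented for illustration.
protein = "MKTAYIAKQR"
peptides = ["KTAY", "TAYIA", "QR"]
print(position_coverage(protein, peptides))
```

Positions with a count of zero are not represented by any identified peptide, so comparing such vectors across experiments shows where the presented regions of a protein differ.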

What is the difference between the source code inside the library directory and lib_exp_acc?

The code inside library contains the stable IPTK code, whereas the code inside lib_exp_acc contains experimental code. On the one hand, the experimental code might contain features not included in the stable version of the library; on the other hand, it might change substantially between pushes. Users of IPTK are highly recommended to look up the source code under the library directory, as it contains the source code shipped in the PyPI and Conda packages. IPTK developers, on the other hand, are highly recommended to work with the code inside lib_exp_acc.

Where can I get IPTK?

IPTK can be downloaded from PyPI and Conda as described below:

  1. Pip-based installation
pip install iptkl --user
  2. Conda-based installation
conda install -c bioconda iptkl

Release 0.6 notice

Version 0.6 brings major upgrades to the library and introduces a wide array of functions and classes for automating and accelerating IPTK

1- IPTK can now parse and work with mzIdentML files using the function parse_mzIdentML_to_identification_table defined in the IO module of the library

2- IPTK can now process and read mzML files directly using PyOpenMS

3- IPTK has improved execution speed thanks to the AcceleratedFunctions module in the Analysis module, which provides acceleration using Numba

4- The current release also introduces the Wrappers module, which provides a simple abstraction for creating Experiment and ExperimentSet

5- Introducing ReplicatedExperiments, which provides a simple API for creating experiments obtained from replicates

6- IPTK currently supports concurrent execution: the Wrappers submodules now utilize multiprocessing for parsing and reading multiple datasets in parallel

7- Introducing chordDiagram for showing overlap among experiments and Proband of experiments

Release 0.6.6 notice

1- Introducing the GOEngine class, which provides an easy-to-use wrapper around goatools for performing GOEA on the identified proteins.

2- The current release supports the Jaccard index as a metric of similarity among experiments

3- Introducing support for visualizing GOEA results

4- Correction of minor bugs and documentation typos in previous releases
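
For readers unfamiliar with the Jaccard index mentioned above, it is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to two lists of peptide sequences; it is a standalone illustration rather than IPTK's own function, and the name `jaccard_index` and the example peptides are assumptions.

```python
def jaccard_index(peptides_a, peptides_b):
    """Return the size of the intersection divided by the size of the union."""
    set_a, set_b = set(peptides_a), set(peptides_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty experiments
    return len(set_a & set_b) / len(union)


# Example peptide lists, invented for illustration.
run_1 = ["SIINFEKL", "GILGFVFTL", "NLVPMVATV"]
run_2 = ["GILGFVFTL", "NLVPMVATV", "ELAGIGILTV"]
print(jaccard_index(run_1, run_2))  # 2 shared / 4 distinct = 0.5
```

A value of 1.0 means the two experiments identified exactly the same peptides, and 0.0 means they share none.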

The road to version 1.0

The major plan is, first, to increase IPTK's scale and execution speed by offloading computationally intensive tasks to Rust. Second, to increase automation by providing custom analysis recipes for performing commonly used routines. Third, to provide an API for integrating other omics layers, namely metabolomics and proteomics. Finally, to add support for PTM-modified HLA peptides and proteins.

Planned features for 0.7.* Release

1. Release 0.7.1 will aim at supporting the integration of proteomic data with the library

2. Release 0.7.2 will aim at supporting the integration of metabolomics data with the library

3. Release 0.7.3 will aim at standardizing all omics APIs and providing a high-level abstraction for working with them

4. Releases 0.7.4-0.7.7 will aim at re-implementing all the classes in Rust and providing a Python wrapper around these classes, thus ensuring fast and concurrent execution

Planned features for 0.8.*

1. Releases 0.8.1-0.8.4 will aim at re-implementing all IPTK parsers in Rust and providing Python bindings to them

2. Releases 0.8.5-0.8.8 will aim at re-implementing all analysis functions in Rust

Planned features for 0.9.*

Different minor releases will introduce different analysis Recipes to automate analysis tasks

Planning for version 1.0.0

IPTK version 1.0 will be released on PyPI and on BioConda

Previous versions Release notice

Release 0.5 notice:

1- Adding a class to query the AFND database for allele frequencies worldwide.

2- Adding a function for plotting a choropleth of allele frequencies.

3- Adding classes for working directly with mzML files using the PyOpenMS framework

4- An experimental class that acts as a database interface and provides methods for storing and querying immunopeptidomic data

Release 0.4.11 notice:

Adding more control to the function plot_MDS_from_ic_coverage to fine-tune its behavior, for example, by controlling the random seed.

Release 0.4.10 notice:

Corrected a bug in the Experiment class to correctly compute the length of peptides containing parentheses. This bug caused the len function to return the number of characters in the sequence instead of the number of amino acids.

Release 0.4.8 notice:

Corrected a bug in the Peptide class to manage peptides containing parentheses in the sequence. This bug caused the len function to return the number of characters in the sequence instead of the number of amino acids.
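
The effect of this kind of bug can be illustrated with a short sketch. It assumes that the parentheses hold modification annotations written after the modified residue, such as M(Oxidation); that notation, and the helper names `strip_modifications` and `peptide_length`, are assumptions for this example and do not reproduce the library's actual fix.

```python
import re

# Assumption: modifications are written in parentheses after the modified
# residue, e.g. "M(Oxidation)". Nested parentheses are not handled.
_MODIFICATION = re.compile(r"\([^()]*\)")


def strip_modifications(sequence):
    """Remove parenthesised annotations, leaving only the residue letters."""
    return _MODIFICATION.sub("", sequence)


def peptide_length(sequence):
    """Return the number of amino acids, ignoring parenthesised annotations."""
    return len(strip_modifications(sequence))


annotated = "PEPM(Oxidation)TIDE"
print(len(annotated))             # character count, inflated by the annotation
print(peptide_length(annotated))  # number of amino acids
```

Counting characters instead of residues would place a modified 8-mer among much longer peptides, which distorts any length distribution computed from the data.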

Release 0.4.7 notice:

Minor corrections in the visualization module

Release 0.4.6 notice:

Minor corrections in the documentation and the default values for some parameters in the visualization functions

Release 0.4.0 notice:

1- Adding a function to compute the immunopeptidomic coverage matrix

2- Introducing MDS plots for comparing the similarities between runs based on immunopeptidomic coverage

Funding

The project was funded by the German Research Foundation (DFG) (Research Training Group 1743, ‘Genes, Environment and Inflammation’).
