---
output:
  github_document:
    number_sections: FALSE
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "70%"
)
```
# promor <img src="man/figures/logo.png" align="right" height="200" style="float:right; height:200px;">
### Proteomics Data Analysis and Modeling Tools
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/promor)](https://CRAN.R-project.org/package=promor)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/promor?color=blue)](https://r-pkg.org/pkg/promor)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/promor?color=blue)](https://r-pkg.org/pkg/promor)
[![R-CMD-check](https://github.com/caranathunge/promor/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/caranathunge/promor/actions/workflows/R-CMD-check.yaml)
[![test-coverage](https://github.com/caranathunge/promor/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/caranathunge/promor/actions/workflows/test-coverage.yaml)
[![License: LGPL v2.1](https://img.shields.io/badge/License-LGPL_v2.1-blue.svg)](https://www.gnu.org/licenses/lgpl-2.1)
<!-- badges: end -->
* `promor` is a user-friendly, comprehensive R package that combines proteomics
data analysis with machine learning-based modeling.
* `promor` streamlines differential expression analysis of
**label-free quantification (LFQ)** proteomics data and building predictive
models with top protein candidates.
* `promor` provides a range of quality control and visualization tools for analyzing label-free proteomics data at the
protein level.
* `promor` takes two input files: (1) either a [proteinGroups.txt](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt)
file produced by [**MaxQuant**](https://maxquant.org) or a [standard input file](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/st.txt) containing a quantitative matrix of protein intensities, and (2) an [expDesign.txt](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt) file containing the experimental design of your proteomics data.
* The standard input file should be a tab-delimited text file with proteins or protein groups in rows and samples in columns. Protein names should be listed in the first column, which may have any column name you choose. The remaining column names should match the sample names given in the `mq_label` column of the expDesign.txt file.
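
As a rough illustration of the standard input layout described above, the following base R sketch writes a small tab-delimited file. The protein IDs, sample names, intensity values, and the file name `st_example.txt` are all made up for illustration; in practice the sample column names would need to match the `mq_label` entries in your own expDesign.txt file.

```r
# Hypothetical protein intensity matrix: proteins in rows, samples in columns.
# All names and values below are invented for illustration only.
std_input <- data.frame(
  protein = c("P00001", "P00002", "P00003"),
  sampleA_1 = c(1520000, 305000, 834000),
  sampleA_2 = c(1498000, 221000, 790500),
  sampleB_1 = c(610000, 245000, 880000),
  sampleB_2 = c(655000, 230500, 912000)
)

# Write a tab-delimited text file with protein names in the first column.
out_file <- file.path(tempdir(), "st_example.txt")
write.table(
  std_input,
  file = out_file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)

# Read the file back to confirm the layout: the first column holds the
# protein names and the remaining columns are the samples.
check <- read.delim(out_file, check.names = FALSE)
colnames(check)
```

The first column is named `protein` here, but any column name can be used for it, as noted above.
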
:rotating_light:**Check out our R Shiny app:** [PROMOR App](https://sgrbnf.shinyapps.io/PROMOR_App/)
___
### Installation
Install the released version from CRAN:
``` r
install.packages("promor")
```
Install the development version from [GitHub](https://github.com/caranathunge/promor):
``` r
# install devtools, if you haven't already:
install.packages("devtools")
# install promor from github
devtools::install_github("caranathunge/promor")
```
---
### Proteomics data analysis with promor
![promor prot analysis flow chart by caranathunge](./man/figures/promor_ProtAnalysisFlowChart_small.png){width=100%}
*Figure 1. A schematic diagram of suggested workflows for proteomics data analysis with promor.*
#### Example
Here is a minimal working example showing how to identify differentially
expressed proteins between two conditions using `promor` in five simple steps.
We use a previously published data set from [Cox et al. (2014)](https://europepmc.org/article/MED/24942700#id609082) (PRIDE ID: PXD000279).
```{r example, results = 'hide', warning=FALSE, eval = FALSE}
# Load promor
library(promor)
# Create a raw_df object with the files provided in this github account.
raw <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)
# Filter out proteins with high levels of missing data in either condition or group
raw_filtered <- filterbygroup_na(raw)
# Impute missing data and create an imp_df object.
imp_df <- impute_na(raw_filtered)
# Normalize data and create a norm_df object
norm_df <- normalize_data(imp_df)
# Perform differential expression analysis and create a fit_df object
fit_df <- find_dep(norm_df)
```
Let's take a look at the results using a volcano plot.
```{r volcanoplot, warning = FALSE, eval=FALSE}
volcano_plot(fit_df, text_size = 5)
```
<center>
![](./man/figures/README-volcanoplot-2.png){width=70%}
</center>
---
### Modeling with promor
![promor flowchart-modeling by caranathunge](./man/figures/promor_ProtModelingFlowChart_small.png){width=100%}
*Figure 2. A schematic diagram of suggested workflows for building predictive models with promor.*
#### Example
The following minimal working example shows you how to use your results from differential expression analysis to build machine learning-based predictive models using `promor`.
We use a previously published data set from [Suvarna et al. (2021)](https://www.frontiersin.org/articles/10.3389/fphys.2021.652799/full#h3) that used differentially expressed proteins between severe and non-severe COVID patients to build models to predict COVID severity.
```{r modeling_example, results = 'hide', warning = FALSE, message = FALSE, eval = FALSE}
# First, let's make a model_df object of top differentially expressed proteins.
# We will be using example fit_df and norm_df objects provided with the package.
covid_model_df <- pre_process(
  fit_df = covid_fit_df,
  norm_df = covid_norm_df
)
# Next, we split the data into training and test data sets
covid_split_df <- split_data(model_df = covid_model_df)
# Let's train our models using the default list of machine learning algorithms
covid_model_list <- train_models(split_df = covid_split_df)
# We can now use our models to predict the test data
covid_prob_list <- test_models(
  model_list = covid_model_list,
  split_df = covid_split_df
)
```
Let's make ROC plots to check how the different models performed.
```{r rocplot, warning = FALSE, eval = FALSE, message = FALSE}
roc_plot(
  probability_list = covid_prob_list,
  split_df = covid_split_df
)
```
<center>
![](./man/figures/README-rocplot-1.png){width=70%}
</center>
---
### Tutorials
You can choose a tutorial from the list below that best fits your experiment
and the structure of your proteomics data.
1. This README file can be accessed from RStudio as follows:
``` r
vignette("intro_to_promor", package = "promor")
```
2. If your data do NOT contain technical replicates:
[promor: No technical replicates](https://caranathunge.github.io/promor/articles/promor_no_techreps.html)
3. If your data contain technical replicates:
[promor: Technical replicates](https://caranathunge.github.io/promor/articles/promor_with_techreps.html)
4. If you would like to use your proteomics data to build predictive models:
[promor: Modeling](https://caranathunge.github.io/promor/articles/promor_for_modeling.html)