Skip to content

Commit

Permalink
Merge pull request #1 from HallemLab/development
Browse files Browse the repository at this point in the history
Updating with S. stercoralis free-living male data
  • Loading branch information
astrasb authored Nov 4, 2021
2 parents d800398 + eff7b05 commit d886c2f
Show file tree
Hide file tree
Showing 46 changed files with 66,908 additions and 127 deletions.
Binary file not shown.
Binary file modified Analysis/Data/SsRNAseq_data_preprocessed
Binary file not shown.
Binary file modified Analysis/Data/Ss_vDGEList
Binary file not shown.
Binary file modified Analysis/Outputs/Ss_Benchmarking.pdf
Binary file not shown.
Binary file added Analysis/Outputs/Ss_Benchmarking_FLM.pdf
Binary file not shown.
Binary file modified Analysis/Outputs/Ss_CountComparisons.pdf
Binary file not shown.
Binary file modified Analysis/Outputs/Ss_Multivariate_Plots_PCA.pdf
Binary file not shown.
Binary file modified Analysis/Outputs/Ss_Multivariate_Plots_Small_Multiples.pdf
Binary file not shown.
Binary file modified Analysis/Outputs/Ss_PCid_GSEA.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion Analysis/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ The `limma` package ( [Ritchie *et al* 2015](https://pubmed.ncbi.nlm.nih.gov/256
## Benchmarking
Next, we benchmark the results of our differential gene expression analysis against a published analsysis:

* *S. stercoralis*: We use Supplementary Table 5 from [Hunt *et al* 2016](https://www.nature.com/articles/ng.3495), which includes the results of edgeR differential gene expression analysis between multiple life stages.
* *S. stercoralis*: We use Supplementary Table 5 from [Hunt *et al* 2016](https://www.nature.com/articles/ng.3495), which includes the results of edgeR differential gene expression analysis between multiple life stages. We also include Supplementary Dataset 3 from [Gonzalez Akimori *et al* 2021](https://www.nature.com/articles/s41598-021-87478-3), which includes the results of edgeR differential gene expression analysis between free-living males and free-living females.
* *S. ratti*: We use Supplementary Table 5 from [Hunt *et al* 2016](https://www.nature.com/articles/ng.3495), which includes the results of edgeR differential gene expression analysis between multiple life stages.
* *S. papillosus*: Tables with data suitable for benchmarking are not currently available.
* *S. venezuelensis*: We use Supplementary Table 1 from [Hunt *et al* 2018](https://www.nature.com/articles/s41598-018-23514-z), which includes the results of edgeR differential gene expression analysis between free living females and parasitic females. Note that we manually adjust the geneID names in Table S1 in order to match current Wormbase ParaSite nomenclature, such that 'SVE_' is used instead of 'SVEN_'.
Expand Down
101 changes: 83 additions & 18 deletions Analysis/Scripts/Ss_RNAseq_Data_Analysis.Rmd
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
---
title: Offline Data Analysis of *Strongyloides stercoralis* bulk RNA-seq data
author: "Astra S. Bryant, PhD"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
html_document:
code_folding: hide
Expand All @@ -15,10 +17,13 @@ output:
A full inforuction to the project and a description of Kallisto alignment and data filtering/normalization steps can be found in `Ss_RNAseq_Data_Preprocessing.rmd`.

## Data Analysis
Principal component analysis applied to filtered and TMM-normalized expression data identified two developmental clusters that account for 42.6% and 28.9% of expression variability between *S. stercoralis* developmental stages.
Principal component analysis applied to filtered and TMM-normalized expression data identified two developmental clusters that account for 34.5% and 26.3% of expression variability between *S. stercoralis* developmental stages.

The limma package ( [Ritchie *et al* 2015](https://pubmed.ncbi.nlm.nih.gov/25605792/), [Phipson *et al* 2016](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373812/)) is used to conduct pairwise differential gene expression analyses between life stages. An example pairwise comparisons between two life stages is displayed as a volcano plot and interactive DataTable.

## Update Notes
This code was updated on approximately 11-3-21 by A.S.B., adding *S. stercoralis* free-living male RNA-seq expression data.

# Code
All code is echoed under descriptive headers; code chunks are hidden from view by default. Users may show hidden R code by clicking the Show buttons. In addition, all code chunks are collated at the end of the document in an Appendix.

Expand Down Expand Up @@ -156,14 +161,12 @@ p3<-ggplot(pca.res.df) +
) +
geom_point(size=4, color = "black") +
scale_colour_manual(values =
c("#68C3A6", "#8DA0CA", "#A69CCC",
"#E38CBB", "#A6D056", "#FED82F",
"#E4C494"),
c("#68C3A6", "#F68D64", "#8DA0CA", "#A69CCC",
"#E38CBB", "#A6D056", "#FED82F", "#E4C494"),
aesthetics = c("colour", "fill")) +
scale_shape_manual(values =
c(21, 21, 22,
23, 21, 22,
23)) +
c(21, 22, 21, 22,
23, 21, 22, 23)) +
xlab(paste0("PC1 (",pc.per[1],"%",")")) +
ylab(paste0("PC2 (",pc.per[2],"%",")")) +
labs(title="Principal Components Analysis of S. stercoralis RNA-seq Samples",
Expand Down Expand Up @@ -222,9 +225,8 @@ p6<-ggplot(pca.pivot) +
aes(x=sample, y=loadings) + # you could iteratively 'paint' different covariates onto this plot using the 'fill' aes
geom_bar(stat="identity") +
scale_fill_manual(values =
c("#68C3A6", "#8DA0CA", "#A69CCC",
"#E38CBB", "#A6D056", "#FED82F",
"#E4C494")) +
c("#68C3A6", "#F68D64", "#8DA0CA", "#A69CCC",
"#E38CBB", "#A6D056", "#FED82F", "#E4C494")) +
facet_wrap(~PC, labeller = labeller(PC = PC.labs)) +
geom_bar(data = PC1, stat = "identity", aes(fill = group)) +
geom_bar(data = PC2, stat = "identity", aes(fill = group)) +
Expand Down Expand Up @@ -375,7 +377,7 @@ fit <- lmFit(v.DEGList.filtered.norm, design)
# iL3s vs FLF
# Note that the target/contrast goups will be divided by the number of life
# stage groups e.g. PF+FLF/2 - iL3+iL3a+pfL1+ppL1+ppL3/5
comparison <- c('(iL3)-(FLF)')
comparison <- c('(iL3)-(FLF)', '(FLM)-(FLF)')
targetStage<- comparison %>%
str_split(pattern="-", simplify = T) %>%
Expand Down Expand Up @@ -417,7 +419,7 @@ ebFit <- eBayes(fits)
# Pull out the DEGs that pass a specific threshold for all pairwise comparisons ----
# Adjust for multiple comparisons using method = global.
results <- decideTests(ebFit, method="global", adjust.method="BH", p.value = adj.P.thresh)
results <- decideTests(ebFit, method="separate", adjust.method="BH", p.value = adj.P.thresh)
recode01<- function(x){
case_when(x == 1 ~ "Up",
Expand Down Expand Up @@ -631,8 +633,8 @@ LS.datatable
```

## Benchmarking
This section compares potential results to a published analsysis. Here, we use Supplementary Table 5 from [Hunt *et al* 2016](https://www.nature.com/articles/ng.3495), which includes the results of edgeR differential gene expression analysis between multiple life stages.
```{r benchmarking}
This section compares potential results to published analsyses. Here, we use Supplementary Table 5 from [Hunt *et al* 2016](https://www.nature.com/articles/ng.3495), which includes the results of edgeR differential gene expression analysis between multiple life stages. We also include Supplementary Dataset 3 from [Gonzalez Akimori *et al* 2021](https://www.nature.com/articles/s41598-021-87478-3), which includes the results of edgeR differential gene expression analysis between free-living males and free-living females.
```{r benchmarkingHunt}
suppressPackageStartupMessages({
library(openxlsx)
Expand Down Expand Up @@ -729,9 +731,8 @@ p.cpm_all <- ggplot(cpm.all) +
fill = "grey") +
geom_violin(trim = FALSE, show.legend = T, alpha = 0.7) +
scale_fill_manual(values =
c("#68C3A6", "#A69CCC","#8DA0CA",
"#A6D056", "#FED82F",
"#E4C494","#E38CBB" )) +
c("#68C3A6", "#F68D64", "#8DA0CA", "#A69CCC",
"#E38CBB", "#A6D056", "#FED82F", "#E4C494")) +
stat_summary(data = cpm.BrowserOnly,
fun = "median",
geom = "point",
Expand Down Expand Up @@ -787,6 +788,70 @@ as_tibble(res.aov3$`condition:life_stage`, rownames = "comparison") %>%
unite(comparison, temp1, temp2, sep = "-")
```

### Comparing LogFC between Browser and Gonzalez Akimori Dataset

```{r benchmarkingGonzalezAkimori}
suppressPackageStartupMessages({
library(openxlsx)
library(tidyverse)
library(ggplot2)
})
# Load Gonzalez Akimori Dataset: FLM vs FLF comparison
temp.dat <- read.xlsx ("../Data/Benchmarking/41598_2021_87478_MOESM3_ESM.xlsx",
sheet = 1, startRow = 1)
GonzalezAkimori.dat <- tibble(geneID = temp.dat$genes, logFC = temp.dat$log2.Fold.Change)%>%
mutate(geneID = str_remove(geneID, "gene:"))
rm(temp.dat)
# Rename Results of FLM vs FLF comparison from Browser
Browser.dat <- list.myTopHits.df$`(FLM)-(FLF)` %>%
dplyr::select(geneID, logFC)
print(paste('Total number of genes in Gonzalez Akimori *et al* 2021 FLM vs FLF comparison tab:',nrow(GonzalezAkimori.dat)))
print(paste('Total number of genes in Str-RNA-seq Browser FLM vs FLF output file:', nrow(Browser.dat)))
plotting.all <- inner_join(Browser.dat, GonzalezAkimori.dat, by = "geneID")
plotting.BrowserOnly <- anti_join(Browser.dat, GonzalezAkimori.dat, by = "geneID")
# The plot below takes the genes with LogFC results in both the Browser and Gonzalez Akimori databases, and plots the two sets against each other.
linearMod <- lm(logFC.y ~ logFC.x, data = plotting.all) %>%
summary()
p.benchmarkFLM <- ggplot(plotting.all, aes(x = logFC.x, y = logFC.y)) +
geom_smooth(method=lm, color = 'red', formula = "y ~ x") +
geom_point(shape=16, size=3, alpha = 0.8) +
labs(title = "S. stercoralis: Str-Browser vs Gonzalez Akimori Data",
subtitle = "FLM vs FLF",
caption = paste("points = genes; red line/shading = linear regression \n",
"w/ 95% confidence regions (formula = y ~ x). \n",
"Adj R-squared =",
round(linearMod$adj.r.squared,3)),
x = "Str-Browser LogFC",
y = "Gonzalez Akimori LogFC") +
coord_equal() +
theme_bw() +
theme(text = element_text(size = 10),
title = element_text(size = 10))
print("Linear regression of Browser vs Gonzalez Akimori LogFC results:")
(linearMod)
suppressMessages(ggsave("Ss_Benchmarking_FLM.pdf",
plot = p.benchmarkFLM,
device = "pdf",
height = 4,
#width = 8,
path = output.path))
```

``` {r}
p.benchmarkFLM
```

<!-- # Cluster DEGs into functional modules --- -->
```{r heatmaps, eval=FALSE, include=FALSE}
Expand Down Expand Up @@ -844,7 +909,7 @@ clustColumns <- hclust(as.dist(1-cor(diffGenes.thresh,
#Cut the resulting tree and create color vector for clusters.
module.assign <- stats::cutree(clustRows, k=14) #The diffGenes info is
based on a pairwise comparison between all 7 life stages.
#based on a pairwise comparison between all life stages.
# assign a color to each module (makes it easy to identify and manipulate)
module.color <- rainbow(length(unique(module.assign)), start=0.1, end=0.9)
Expand Down
285 changes: 209 additions & 76 deletions Analysis/Scripts/Ss_RNAseq_Data_Analysis.html

Large diffs are not rendered by default.

8,275 changes: 8,275 additions & 0 deletions Analysis/Scripts/Ss_RNAseq_Data_Analysis_original.html

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions Preprocessing/Data/S_stercoralis/Reads/SRR13343624.log
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 13,098
[index] number of k-mers: 17,193,469
[index] number of equivalence classes: 17,415
[quant] running in paired-end mode
[quant] will process pair 1: SRR13343624_1.fastq.gz
SRR13343624_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 44,616,003 reads, 42,200,714 reads pseudoaligned
[quant] estimated average fragment length: 170.112
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 488 rounds

Loading

0 comments on commit d886c2f

Please sign in to comment.