- What is this?
- What is alk3n3?
- How can I use this data, and where can I find it?
- Analysis and visualization programs
- Setting up the right environment
- Computing environment
- Contributors
- Lead Contacts
This repository contains the scripts used for the analysis in the publication "Single cell resolution analysis of the human pancreatic ductal progenitor cell niche" (Qadir/Alvarez-Cubela et al., 2020). The code is provided here for transparency and to make the data analysis reproducible. Most of the analysis and visualization steps have been optimised for an average computing environment (as of 2019). Some analyses, however, require a high-performance computing environment (see computing environment). The methodology is described extensively in the manuscript. This analysis relies heavily on the scRNAseq analysis algorithms developed by the Satija lab, namely Seurat (Butler et al., 2018: Nature Biotechnology; Stuart et al., 2019: Cell) (for a complete list of dependencies and code used, see analysis & visualization programs).
These data are derived from scRNAseq of ALK3Bright+ cells isolated from n = 3 independent human donor pancreata. The n = 3 files have been compiled into one dataset, referred to as the alk3n3 dataset.
The data files used in this analysis have been deposited in the Gene Expression Omnibus (GEO), the gene expression data repository at the NIH. They are part of the GSE131886 high-throughput sequencing series and can be found here. The supplementary files contain Cellranger output files, which have been renamed to incorporate sample-origin information. Seurat cannot read the files under these names. After separating the files into donor-specific folders, rename the filtered files in each folder to 'matrix.mtx.gz', 'barcodes.tsv.gz' and 'features.tsv.gz' so that Seurat can read them. Please note that the GSE raw data files are freely available for public download on GEO. If you would like further files, such as Seurat objects, please email the project leader.
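The renaming step above can be scripted. The following is a minimal sketch in Python, not part of this repository's R scripts. The function name `sort_into_donor_folders`, the donor labels and the `must_contain` filter are all hypothetical, because the exact GEO file names are not listed here; adjust them to match the names of the files you downloaded.

```python
import shutil
from pathlib import Path

# File names expected inside each donor folder (as stated above).
CANONICAL_NAMES = ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")


def sort_into_donor_folders(download_dir, output_dir, donor_labels, must_contain=""):
    """Copy downloaded files into one folder per donor under canonical names.

    donor_labels: hypothetical substrings that identify each donor in the
        downloaded file names (an assumption; check your actual file names).
    must_contain: optional substring a file name must include, for example a
        marker that distinguishes filtered from unfiltered files.
    Returns a dict mapping each donor label to the file names written.
    """
    download_dir, output_dir = Path(download_dir), Path(output_dir)
    written = {label: [] for label in donor_labels}
    for path in sorted(download_dir.iterdir()):
        if not path.is_file() or must_contain not in path.name:
            continue
        # Which canonical name does this file end with, if any?
        canonical = next((n for n in CANONICAL_NAMES if path.name.endswith(n)), None)
        # Which donor does this file belong to, if any?
        donor = next((d for d in donor_labels if d in path.name), None)
        if canonical is None or donor is None:
            continue  # unrelated file, leave it alone
        target_dir = output_dir / donor
        target_dir.mkdir(parents=True, exist_ok=True)
        # Copy rather than move, so the original downloads stay intact.
        shutil.copy2(path, target_dir / canonical)
        written[donor].append(canonical)
    return written
```

Each resulting donor folder then holds exactly the three file names listed above and can be passed to Seurat's `Read10X()` as its data directory.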
We provide raw FASTQ files generated from single-cell cDNA libraries sequenced on the Illumina sequencing platform, along with unfiltered post-alignment count files generated by the Cellranger v3.0.1 software. In addition, we provide a gene expression matrix containing filtered gene counts across our dataset. We also use the GSE81076 (Grün et al., 2016: Cell Stem Cell) and GSE85241 (Muraro et al., 2016: Cell Systems) datasets. These two datasets represent the human pancreas atlas for single-cell gene transcription, and we use them for our integrated single-cell analysis. The files used for our analysis are part of the Seurat Integration tutorial for pancreatic datasets, which can be found here.
These are sequencing reads generated by the Illumina sequencing platform. Files contain raw reads and sequencing efficiency information. These are the input files for the Cellranger software.
This contains the data output of Cellranger v3.0.1, which was run using default settings. The code used to analyze the data is part of this repository. These data contain filtered and unfiltered count files for gene expression across barcodes/cells.
This file includes the preliminary data analyses of ALK3+ cells derived from the exocrine pancreata of n = 3 de-identified human donors. This includes data thresholding, normalization, subsetting, linear dimensionality reduction (PCA), non-linear multimodal dimensionality reduction (PCA/UMAP), clustering, and data visualization.
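To illustrate the arithmetic behind the thresholding and normalization steps, here is a minimal sketch in plain Python. It is not the repository's code, which uses Seurat in R. It assumes Seurat's documented log-normalization (each count divided by the cell's total, multiplied by a scale factor of 10,000, then transformed with log(1 + x)). The gene-count bounds are hypothetical parameters and are not the thresholds used in the publication.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw gene counts of a single cell.

    Each count is divided by the cell's total count, multiplied by
    scale_factor, and transformed with the natural log of (1 + x).
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # empty cell: nothing to scale
    return [math.log1p(c / total * scale_factor) for c in counts]


def passes_thresholds(counts, min_genes, max_genes):
    """Keep a cell only if its number of detected genes is within bounds.

    min_genes and max_genes are hypothetical cut-offs chosen by the analyst.
    """
    detected = sum(1 for c in counts if c > 0)
    return min_genes <= detected <= max_genes


# Toy example: one cell with counts for four genes.
cell = [0, 3, 1, 6]
if passes_thresholds(cell, min_genes=2, max_genes=10):
    normalized = log_normalize(cell)
```

Genes with zero counts stay at zero after normalization, and cells with very few or very many detected genes are dropped before any downstream dimensionality reduction.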
In order to understand where our cells map relative to other pancreatic cells, we mapped them against a human pancreas dataset for single-cell gene transcription. This allowed us to place our cells in the context of neighbouring pancreatic cells. We hope to use our data to expand the human pancreas single-cell transcriptional profile. These data were downloaded from a pre-analyzed gene expression matrix created as part of the Seurat tutorial, which can be found and downloaded from here.
- Install R
- Install RStudio
- You are ready to rumble in the jungle.
- Once you have installed R and RStudio, copy the script from the first file in the Pancreas_ductal_scRNAseq/R_analysis_by_experiment/ folder. Then run it to install all dependencies and load the packages needed for the analysis.
- To run a command in RStudio, place the cursor on it and press [Ctrl + Enter]. If you need more help understanding how commands are run in R, please visit here.
- If you run into problems, please open a new issue by going to 'issues' and clicking on the 'new issue' icon. We will help you replicate our analysis! Do not fear single cell analysis!
- Processor: Intel Sandy Bridge E5-2670 (16 cores x 16 threads)
- RAM: 25GB
- OS: CentOS 6.5
- Hardware integrated into the Pegasus Supercomputing array at the University of Miami
- Processor: Intel Core i7-6700 CPU (4 cores x 8 threads)
- RAM: 32GB DDR3
- OS: Windows 10 Enterprise (x64 bit)
- Processor: Intel Xeon E5-1620 CPU (4 cores x 8 threads)
- RAM: 16GB DDR3
- OS: Windows 10 Professional (x64 bit)
- Processor: Intel Core i7 8750H CPU (6 cores x 12 threads)
- RAM: 8GB DDR4
- OS: Windows 10 Professional (x64 bit)
Under construction
Qadir, M.M.F., Alvarez-Cubela, S., Klein, D., Van Dijk, J., Anquela, R.M., Lanzoni, G., Sadiq, S., Moreno-Hernandez, Y.B., Navarro-Rubio, B., Garcia, M.T., Diaz, A., Johnson, K., Sant, D., Ricordi, C., Griswold, T., Pastori, R.L., Dominguez-Bendala, J. (2020). Single cell resolution analysis of the human pancreatic ductal progenitor cell niche. Proceedings of the National Academy of Sciences. Apr 2020, 201918314; DOI: 10.1073/pnas.1918314117
- Mirza Muhammad Fahd Qadir - Github - University of Miami - to contact please Email
- The JDB Lab - Github - Diabetes Research Institute, UM
- Muhammad Saad Sadiq - Github - University of Miami - to contact please Email
- Dr. Tony Griswold PhD. - Github - University of Miami - to contact please Email
- Dr. Dave Sant PhD. - University of Utah - to contact please Email
- Dr. Juan Dominguez-Bendala PhD. - Diabetes Research Institute, UM - to contact please Email
- Dr. Ricardo Pastori PhD. - Diabetes Research Institute, UM - to contact please Email
- Diabetes Research Institute Foundation (DRIF)
- The Inserra family
- The Fred and Mabel R. Parks Foundation
- The Tonkinson Foundation
- ADA Grant #1-19-ICTS-078
- NIH Grant #1R43DK105655-01
- NIH Grant #2R44DK105655-02
- NIH/NIDDK HIRN Grant #U01DK120393 (These studies are part of this grant)
- IIE Fulbright