Agglomerative Clustering Analysis on gene expression dataset

This is a solution notebook for an assignment question from a graduate Data Mining course. Each code block is accompanied by the relevant analysis where required.
Dataset link: https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq
Broadly, the following steps have been performed in this solution notebook:

- Performed minimal preprocessing on the dataset.
- Explained why agglomerative clustering is more widely used than divisive clustering.
- Visualized the given class labels using t-SNE.
- Ran agglomerative clustering with the following linkages: single, complete, group average, and minimum variance (Ward).
  - Compared the clustering performance on the dataset both visually and empirically.
  - Reported the best results on several cluster validity indices.
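The notebook describes its preprocessing only as minimal. As a hypothetical example of what such a step can look like for an expression matrix (samples as rows, genes as columns), the sketch below drops near-constant genes, which contribute nothing to pairwise distances, and z-scores the remaining genes so that no single high-variance gene dominates Euclidean distances. The function name `preprocess` and the `min_std` threshold are assumptions, not taken from the notebook.

```python
import statistics


def preprocess(matrix, min_std=1e-8):
    """Hypothetical minimal preprocessing for a samples x genes matrix.

    Drops columns whose population standard deviation is <= min_std, then
    z-scores each remaining column. Returns (scaled_rows, kept_column_indices).
    """
    if not matrix:
        return [], []
    kept, scaled_cols = [], []
    for c in range(len(matrix[0])):
        col = [row[c] for row in matrix]
        mean = statistics.fmean(col)
        std = statistics.pstdev(col, mean)
        if std <= min_std:
            continue  # a constant gene contributes nothing to pairwise distances
        kept.append(c)
        scaled_cols.append([(v - mean) / std for v in col])
    if not kept:
        return [[] for _ in matrix], kept
    # Transpose the scaled columns back into one row per sample.
    scaled_rows = [list(r) for r in zip(*scaled_cols)]
    return scaled_rows, kept
```

Whether to standardize at all is a judgment call: z-scoring gives every retained gene equal weight, which helps when scales differ widely but also amplifies noisy low-expression genes.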
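One common argument for preferring agglomerative clustering, given here as an illustration and not necessarily the argument the notebook makes, is the size of the search space per step. An agglomerative step compares only pairs of current clusters, whereas an exhaustive divisive step must consider every way to split a cluster of $n$ points into two non-empty parts:

$$
\underbrace{\binom{n}{2} = \frac{n(n-1)}{2}}_{\text{candidate merges (agglomerative)}}
\qquad \text{vs.} \qquad
\underbrace{2^{\,n-1} - 1}_{\text{candidate splits (divisive)}}
$$

For example, with $n = 800$ samples the first agglomerative step considers $319{,}600$ pairs, while an exhaustive first divisive split would have $2^{799} - 1 \approx 3.3 \times 10^{240}$ candidates. Practical divisive algorithms therefore have to fall back on heuristic splitting rules.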
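The four linkages differ only in how the distance from a newly merged cluster to every other cluster is updated. The following is a dependency-free sketch of that idea using the Lance-Williams recurrence. It is not the notebook's code; libraries such as scikit-learn's `AgglomerativeClustering` expose the same four options as `linkage='single'`, `'complete'`, `'average'` and `'ward'`. The function name `agglomerative` is hypothetical, and its naive O(n³) loop is meant for small toy inputs only.

```python
import itertools
import math


def _sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerative(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering via the Lance-Williams update.

    Returns one integer label per point; clusters are numbered in order of
    their smallest point index.
    """
    if linkage not in ("single", "complete", "average", "ward"):
        raise ValueError(f"unknown linkage: {linkage}")
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(points)")

    def base(a, b):
        d2 = _sq_euclidean(a, b)
        # Ward's recurrence works on squared Euclidean distances;
        # the other three linkages start from plain Euclidean distances.
        return d2 if linkage == "ward" else math.sqrt(d2)

    members = {i: [i] for i in range(n)}  # active cluster id -> point indices
    dist = {(i, j): base(points[i], points[j])
            for i, j in itertools.combinations(range(n), 2)}

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    next_id = n
    while len(members) > n_clusters:
        i, j = min(dist, key=dist.get)  # closest pair of active clusters
        ni, nj, dij = len(members[i]), len(members[j]), dist[(i, j)]
        for k in members:
            if k in (i, j):
                continue
            nk, dki, dkj = len(members[k]), d(k, i), d(k, j)
            if linkage == "single":      # alpha=1/2, gamma=-1/2 -> minimum
                new = min(dki, dkj)
            elif linkage == "complete":  # alpha=1/2, gamma=+1/2 -> maximum
                new = max(dki, dkj)
            elif linkage == "average":   # size-weighted mean (UPGMA)
                new = (ni * dki + nj * dkj) / (ni + nj)
            else:                        # Ward's minimum-variance update
                new = ((ni + nk) * dki + (nj + nk) * dkj - nk * dij) / (ni + nj + nk)
            dist[(k, next_id)] = new
        # Drop every distance entry that involves the two merged clusters.
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1

    labels = [0] * n
    for label, idx in enumerate(sorted(members.values(), key=min)):
        for p in idx:
            labels[p] = label
    return labels
```

On two well-separated groups all four linkages agree; they diverge on elongated or noisy clusters, where single linkage tends to chain clusters together and Ward favors compact groups of similar size.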
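Cluster validity indices fall into two families: internal indices use only the data and the predicted labels, while external indices compare the predictions against the given class labels. The sketch below implements one of each in plain Python, the silhouette coefficient and the adjusted Rand index. The function names are hypothetical, and the notebook may report different or additional indices (scikit-learn provides `silhouette_score` and `adjusted_rand_score` in `sklearn.metrics`).

```python
import math
from collections import Counter


def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need 2 <= number of clusters <= n_samples - 1")
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count of distances from p to every other point, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)

    def pairs(k):
        return k * (k - 1) / 2

    # Pair counts from the contingency table, its row sums and its column sums.
    sum_cells = sum(pairs(v) for v in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(pairs(v) for v in Counter(labels_true).values())
    sum_cols = sum(pairs(v) for v in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / pairs(n) if n > 1 else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings trivial
    return (sum_cells - expected) / (max_index - expected)
```

A silhouette near 1 means points sit much closer to their own cluster than to the nearest other one. An ARI of 1 means the clustering matches the class labels exactly up to renaming, while random labelings score close to 0.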

Files:
- README.md
- RNA_Seq_PANCAN_dataset.ipynb

Repository: havelhakimi/gene-expression