TCGA-clasterization

This repo contains a Jupyter notebook that shows part of the work done for my bachelor thesis. The dataset and a more detailed description can be found at:

https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq

Short description

The data is part of the RNA-Seq (HiSeq) PANCAN data set. It is a random extraction of gene expression values from patients with different tumor types: BRCA, KIRC, COAD, LUAD and PRAD. Each of the 801 rows describes the gene expression profile of a particular patient. The aim of the analysis was to determine how well unsupervised learning algorithms can separate the different cancer types within the dataset, and whether any other clusters exist within or between cancer types.
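One simple way to put a number on "how well clusters separate cancer types" is cluster purity: the fraction of samples whose tumor type matches the most common tumor type in their cluster. The sketch below is only an illustration using the standard library. The function name `cluster_purity` and the toy assignments are hypothetical, and the notebook may evaluate its clusters differently.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_ids):
    """Return overall purity and the dominant label of each cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    # Count how often each tumor type occurs inside each cluster.
    counts = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        counts[cluster][label] += 1

    # The dominant label of a cluster is its most frequent tumor type.
    dominant = {c: counter.most_common(1)[0][0] for c, counter in counts.items()}

    # Purity = fraction of samples that carry their cluster's dominant label.
    matched = sum(counter[dominant[c]] for c, counter in counts.items())
    return matched / len(true_labels), dominant


# Toy example (made-up assignments, not results from the notebook).
labels = ["BRCA", "BRCA", "KIRC", "KIRC", "LUAD", "LUAD"]
clusters = [0, 0, 1, 1, 1, 2]
purity, dominant = cluster_purity(labels, clusters)
```

A purity of 1.0 means every cluster contains a single tumor type. Lower values indicate mixed clusters, which may also hint at subgroups shared between cancer types.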

How to run this

Simply click on this link!

Tools used

Python 3.7
scikit-learn
Pandas
Matplotlib
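
As a rough sketch of how these tools can fit together, the example below clusters a small synthetic matrix and compares the clusters with known labels. Everything in it is an assumption made for illustration: the synthetic data stands in for the real expression matrix, and the choice of PCA, k-means and the adjusted Rand index is not necessarily what the notebook uses. Plotting with Matplotlib is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the expression matrix:
# 5 groups of 20 samples each, 50 features (not the real TCGA data).
rng = np.random.default_rng(0)
tumor_types = ["BRCA", "KIRC", "COAD", "LUAD", "PRAD"]
centers = rng.normal(scale=10.0, size=(5, 50))
X = pd.DataFrame(np.vstack([c + rng.normal(size=(20, 50)) for c in centers]))
y = pd.Series(np.repeat(tumor_types, 20), name="tumor_type")

# Standardize features, reduce dimensionality, then cluster.
scaled = StandardScaler().fit_transform(X)
reduced = PCA(n_components=4, random_state=0).fit_transform(scaled)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(reduced)

# Compare the unsupervised clusters with the known tumor types.
ari = adjusted_rand_score(y, clusters)
table = pd.crosstab(y, clusters, rownames=["tumor_type"], colnames=["cluster"])
```

Printing `table` shows how the samples of each tumor type are distributed across clusters, and `ari` summarizes the agreement in a single number (1.0 for a perfect match, values near 0 for random assignments).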

