This repo contains a Jupyter notebook that presents part of the work done for my bachelor thesis. The dataset and a more detailed description can be found at:
The data is part of the RNA-Seq (HiSeq) PANCAN data set. It is a random extraction of gene expression profiles of patients with different types of tumor: BRCA, KIRC, COAD, LUAD and PRAD. Each of the 801 rows describes the gene expression profile of a single patient. The aim of the analysis was to answer two questions: how well can unsupervised learning algorithms separate the different types of cancer within the dataset, and are there any other clusters within or between the different kinds of cancer?
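A minimal sketch of the kind of pipeline the question above calls for: standardize the expression values, reduce dimensionality with PCA, cluster with k-means into 5 groups (one per tumor type), and compare the clusters to the known tumor labels with the adjusted Rand index (ARI). This is an illustrative assumption, not necessarily the method used in the notebook. Because the real data is not bundled here, `make_blobs` generates a synthetic stand-in with 801 samples and 5 groups; the feature count (200), the number of PCA components (10) and the helper name `cluster_and_score` are hypothetical choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 801 "patients", 5 groups mirroring the
# 5 tumor types, 200 features instead of the real gene expression columns.
X, y_true = make_blobs(n_samples=801, n_features=200, centers=5, random_state=0)


def cluster_and_score(X, y_true, n_clusters=5, n_components=10, seed=0):
    """Hypothetical helper: scale -> PCA -> k-means, then score against labels."""
    # Put every feature on the same scale before PCA.
    X_scaled = StandardScaler().fit_transform(X)
    # Project onto the leading principal components to reduce noise and dimensionality.
    X_reduced = PCA(n_components=n_components, random_state=seed).fit_transform(X_scaled)
    # Cluster without using the labels; n_init restarts guard against poor local optima.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_reduced)
    # ARI is 1.0 for a perfect match with the true classes and about 0.0 for random assignment.
    return labels, adjusted_rand_score(y_true, labels)


labels, ari = cluster_and_score(X, y_true)
print(f"Adjusted Rand index: {ari:.3f}")
```

On the real dataset, `X` would be the expression matrix and `y_true` the tumor type of each patient. The ARI then quantifies how well the unsupervised clusters recover the cancer types, although any extra structure within a tumor type has to be inspected separately, for example by plotting the PCA projection.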
Simply click on this link!
Requirements:
- Python 3.7
- scikit-learn
- pandas
- Matplotlib