Skip to content

Classying music genre based on audio features and lyrics (using the Million Song Dataset).

Notifications You must be signed in to change notification settings

katkordas/stidh20

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Classifying music genre with classical machine learning - Special Topics in Digital Humanities 2019/2020 project

This repository contains files submitted to the Special Topics in Digital Humanities 2019/2020 (ST) course at Leiden University.

Author: Kat Kołodziejczyk.

Data

The data used in this project come from the Million Song Dataset. See documentation on the website for a detailed description of the audio features used. Lyrics (in bag-of-words format) come from the musiXmatch Dataset.

Contents

The repository contains the following files:

  • comb_dataset.csv: csv file with audio features and lyrics per song (50000 most frequent lyrics (rows from "i" to "kad"): frequency of word in given song, 0 if not present)
  • Audio feature names: txt file with the name of the 49 audio features used in the classification
  • Classifier: The main Python code
  • STIDH report: Report I wrote for class

The folder "Results" contains the following output from classifier.py:

  • distribution_genre.png: Bar plot showing the genre distribution of comb_dataset.csv
  • all_pred_results.txt: Accuracy scores (in %) for Gaussian Naive Bayes (GNB), Linear Support Vector Classification (Linear SVC) and k-Nearest Neighbours (KNN) for audio features and lyrics
  • audio_best_predictors.txt: Accuracy scores for audio features for GNB, Linear SVC, and KNN for 26, 25, 20, 15 and 10 best predictors
  • audio_results.png: Bar plot showing the accuracy scores for GNB, Linear SVC, and KNN for 26 best audio predictors
  • lyrics_best_predictors.txt: Accuracy scores for lyrics features for 284, 250, 200 and 100 best predictors
  • lyrics_results.png: Bar plot showing the accuracy scores for GNB, Linear SVC, and KNN for 250 best lyrics predictors

About

Classying music genre based on audio features and lyrics (using the Million Song Dataset).

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages