Comedic Dynamics - Analyzing Segmentation, Offense and Thematic Patterns in Stand-Up Comedy Transcripts 🎭
This project aims to make stand-up comedy specials more transparent and safer to watch by analyzing their thematic patterns and sentiment. Using NLP techniques such as topic modeling and sentiment analysis, we identify dominant themes and polarity scores that can help creators frame their content disclaimers.
The project examines stand-up comedy specials with a focus on controversial topics such as race, ethnicity, and politics, which often lead to public debate. Understanding how these topics are treated lets us propose improved content disclaimers and foster a safer viewing experience.
Our approach combines topic modeling, to uncover hidden thematic structure, with offense analysis, to understand the potentially harmful effects of humor. Topics are identified with Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF); sentiment and subjectivity are scored separately with dedicated sentiment tools.
- Corpus Extraction: Over 450 English-language transcripts scraped from "Scraps from the Loft" using BeautifulSoup.
- Text Preprocessing: Text cleaning, language identification, tokenization, and part-of-speech tagging using Python libraries.
- Offense and Subjectivity Analysis: Sentiment polarity and subjectivity scoring with the VADER and TextBlob packages.
- Named Entity Recognition: Entity identification and analysis within the transcripts using spaCy's NLP model.
- Topic Modeling: LDA and NMF models to identify prevalent topics, with a focus on optimizing topic coherence.
The analysis identifies relationships between sentiment polarity and subjectivity, with trends that differ among comedians. Topic modeling yields distinct thematic categories such as observational, cultural, and political comedy. NMF achieved higher topic coherence than LDA.
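For readers unfamiliar with topic coherence, the sketch below computes a UMass-style score in plain Python. The coherence measure the project used is not stated, so UMass is an assumption, as are the `umass_coherence` helper, the averaging over word pairs, and the toy documents.

```python
import math


def umass_coherence(topic_words, documents):
    """Average UMass-style coherence of one topic.

    topic_words: words ordered from highest to lowest topic weight.
    documents:   whitespace-tokenized strings used as the reference corpus.
    Each ordered pair contributes log((D(w_m, w_l) + 1) / D(w_l)), where D
    counts documents containing all the given words and w_l is the
    higher-ranked word of the pair.
    """
    doc_sets = [set(doc.split()) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in doc for w in words) for doc in doc_sets)

    scores = []
    for m in range(1, len(topic_words)):
        for l in range(m):
            denominator = doc_freq(topic_words[l])
            if denominator == 0:
                raise ValueError(f"{topic_words[l]!r} does not occur in the corpus")
            numerator = doc_freq(topic_words[m], topic_words[l]) + 1
            scores.append(math.log(numerator / denominator))
    return sum(scores) / len(scores)


# Hypothetical toy corpus, not taken from the transcripts.
docs = [
    "airport security line shoes belt delay",
    "airline food airport luggage delay security",
    "president election vote congress senate campaign",
    "congress vote election president debate campaign",
    "airport luggage security shoes line",
    "senate debate election vote president",
]

coherent = umass_coherence(["airport", "security", "luggage"], docs)
mixed = umass_coherence(["airport", "vote", "luggage"], docs)
print(coherent, mixed)
```

A topic whose top words tend to appear in the same documents scores higher than one that mixes unrelated words. Comparing LDA and NMF would mean averaging such scores over each model's topics.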
The project effectively maps out the thematic and emotional landscape of stand-up comedy, offering insights that can help tailor content disclaimers.