Week 1 - Introduction To Text As Data
- What is a 'Document Frequency Matrix' and what is it used for?
- Exploring best practices when pre-processing text data: tf-idf, tokenization, stop-words, stemming, etc.
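A minimal sketch of the Week 1 ideas, assuming the matrix in question is the documents-by-terms count matrix (often called a document-term or document-feature matrix). It uses only the standard library. The stop-word list, the function names, and the sample sentences are illustrative assumptions, not course material, and the tf-idf weighting shown (raw count times log(N / document frequency)) is one common variant among several.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "is", "of", "and", "on"}

def tokenize(text):
    """Lowercase, keep runs of letters, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def document_term_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def tf_idf(matrix):
    """Weight each raw count by log(N / document frequency) of its term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
            for row in matrix]

docs = ["The cat sat on the mat", "The dog sat"]
vocab, matrix = document_term_matrix(docs)
weights = tf_idf(matrix)
```

A term that appears in every document (here "sat") gets a weight of zero, which is the point of the idf factor: it discounts terms that do not distinguish documents.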
Week 3 - Getting Data and APIs
- Exploring the use of four APIs: NYT, Steam, Facebook, and Twitter. Used the NYT API to perform a sentiment analysis on six months' worth of headlines via PyTorch.
- Exploring regex and various models.
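As a rough illustration of where regex fits when working with text pulled from an API, the sketch below extracts hashtags and mentions from a post and strips URLs. The patterns, the function name, and the sample string are assumptions made for illustration; real API payloads would likely need more careful handling.

```python
import re

# Simple illustrative patterns (assumptions; not exhaustive).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")

def extract_features(text):
    """Pull hashtags and mentions, and return the text with URLs removed."""
    return {
        "hashtags": HASHTAG.findall(text),
        "mentions": MENTION.findall(text),
        # Remove URLs, then collapse any leftover runs of whitespace.
        "clean": " ".join(URL.sub("", text).split()),
    }

sample = "Loving the new patch @steam #gaming https://example.com/post"
features = extract_features(sample)
```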
Week 5 - Dictionary-Based Methods
- Sentiment Analysis, Hyper-Parameter Tuning, Training Dataset Selection
- Exploring topic models
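A minimal sketch of a dictionary-based sentiment score, in the spirit of this week's heading. The lexicon, its weights, and the sample headlines are hypothetical; a real analysis would use an established sentiment dictionary. The score here is the sum of matched word weights divided by the number of tokens, which is one simple normalization among many.

```python
import re

# Hypothetical toy lexicon: word -> sentiment weight.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the weights of matched words, normalized by token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    matched = [lexicon[t] for t in tokens if t in lexicon]
    return sum(matched) / len(tokens) if tokens else 0.0

headlines = ["Great results after a bad start", "Terrible storm hits coast"]
scores = [sentiment_score(h) for h in headlines]
```

Normalizing by length keeps long and short headlines comparable, at the cost of diluting strong words in longer texts.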
Week 7 - Supervised Learning Models
Week 8 - Unsupervised Learning/Clustering
Week 9 - Supervised Classification Models