Exploring Data Augmentation Methods on Social Media Corpora

Authors: Isabel Garcia Pietri, Kineret Stanley

Summary

In this project we explore data augmentation techniques for text classification on two social media datasets, RtGender and TRAC-2. We start with popular techniques: oversampling, Easy Data Augmentation (Wei and Zou, 2019), and Back-Translation (Sennrich et al., 2015). We also consider Greyscaling, a relatively unexplored data augmentation technique that seeks to mitigate the intensity of adjectives in examples. Finally, we consider a few-shot learning approach, Pattern-Exploiting Training (PET) (Schick and Schütze, 2020). All experiments use a BERT transformer architecture.
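
To make two of the techniques named above concrete, here is a minimal sketch of random oversampling and of two Easy Data Augmentation operations (random swap and random deletion). It is an illustration only, not the code used in the notebooks: the function names, the `seed` parameter, and the toy sentences are assumptions, and the synonym-based EDA operations are left out because they need an external thesaurus.

```python
import random


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until all classes are the same size."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap up to the largest class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def random_swap(words, n_swaps, rng):
    """EDA-style random swap: exchange two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """EDA-style random deletion: drop each word with probability p, keeping at least one."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


# Toy usage (hypothetical sentences, not taken from RtGender or TRAC-2).
rng = random.Random(42)
sentence = "this comment is really quite rude".split()
print(" ".join(random_swap(sentence, 1, rng)))
print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Oversampling only repeats existing examples, while the EDA operations create perturbed copies that keep the original label. In practice the augmented copies would be added to the training split only, so the evaluation data stays untouched.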

Files Description

notebooks/RtGender-Notebooks/ directory: contains the experiments with the RtGender dataset
notebooks/TRAC-2-notebooks/ directory: contains the experiments with the TRAC-2 dataset
papers/ directory: contains related papers

Top-level repository contents:
TRAC-2-for-Grayscaling
notebooks
papers
pet-methodology-config-files
.gitignore
Exploring Data Augmentation Methods on Social Media Corpora.pdf
README.md

Repository: ipietri/w266_Final_Project
