SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Park, Daniel S. and Chan, William and Zhang, Yu and Chiu, Chung-Cheng and Zoph, Barret and Cubuk, Ekin D. and Le, Quoc V.

Interspeech 2019 [Paper]
This repository contains an implementation of the augmentation methodology proposed in the above paper.
- python3
- librosa
- libsndfile
- audioread
- ffmpeg
- numpy
- tensorflow
- tensorflow_addons
main.py [--dir] [--policy]
--dir | path/to/dataset | default='./LibriSpeech/'
--policy | augmentation policy to use from {'LB', 'LD', 'SS', 'SM'} | default='LD'
Alternatively, refer to demo/demo.ipynb for a Jupyter notebook demo.
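The `--policy` names above match the augmentation policies in the paper (LB: LibriSpeech basic, LD: LibriSpeech double, SM: Switchboard mild, SS: Switchboard strong). The following is a minimal NumPy sketch of the frequency- and time-masking steps only, not this repository's code. The parameter values are those of Table 1 in the paper as best recalled and should be checked against it. The names `POLICIES` and `mask_spectrogram` are hypothetical. Time warping (parameter `W`) is omitted, and masking with zero assumes a mean-normalized log mel spectrogram.

```python
import numpy as np

# Policy parameters as given in Table 1 of the SpecAugment paper (verify against the paper).
# W: time-warp parameter (not used in this sketch)
# F / m_F: maximum width and number of frequency masks
# T / m_T: maximum width and number of time masks
# p: upper bound on a time mask's width, as a fraction of the number of time steps
POLICIES = {
    "LB": dict(W=80, F=27, m_F=1, T=100, p=1.0, m_T=1),
    "LD": dict(W=80, F=27, m_F=2, T=100, p=1.0, m_T=2),
    "SM": dict(W=40, F=15, m_F=2, T=70, p=0.2, m_T=2),
    "SS": dict(W=40, F=27, m_F=2, T=70, p=0.2, m_T=2),
}


def mask_spectrogram(spec, policy="LD", rng=None):
    """Return a copy of `spec` (shape: n_freq x n_time) with frequency and time masks applied."""
    rng = np.random.default_rng() if rng is None else rng
    cfg = POLICIES[policy]
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency masking: zero out f consecutive channels [f0, f0 + f),
    # with f drawn from [0, F] and f0 from [0, n_freq - f).
    for _ in range(cfg["m_F"]):
        f = int(rng.integers(0, min(cfg["F"], n_freq) + 1))
        f0 = int(rng.integers(0, max(n_freq - f, 1)))
        out[f0:f0 + f, :] = 0.0

    # Time masking: zero out t consecutive frames [t0, t0 + t),
    # with t drawn from [0, T] but never wider than p * n_time.
    max_t = min(cfg["T"], int(cfg["p"] * n_time))
    for _ in range(cfg["m_T"]):
        t = int(rng.integers(0, max_t + 1))
        t0 = int(rng.integers(0, max(n_time - t, 1)))
        out[:, t0:t0 + t] = 0.0

    return out
```

For example, `mask_spectrogram(log_mel, policy="SS")` would return a masked copy of a `(n_mels, n_frames)` array `log_mel` and leave the input unchanged. Passing a seeded `np.random.default_rng(seed)` as `rng` makes the masks reproducible.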
- @article{Park_2019, title={SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition}, url={http://dx.doi.org/10.21437/Interspeech.2019-2680}, DOI={10.21437/interspeech.2019-2680}, journal={Interspeech 2019}, publisher={ISCA}, author={Park, Daniel S. and Chan, William and Zhang, Yu and Chiu, Chung-Cheng and Zoph, Barret and Cubuk, Ekin D. and Le, Quoc V.}, year={2019}, month={Sep} }