Generate raw audio waveforms using WaveNet, a deep neural network.
The dataset consists of three MP3 files totaling roughly 80 minutes of poetry read by Sylvia Plath.
For this project, each main MP3 file was considered a chapter. Create a folder for each chapter and add it to the sounds directory:
- sounds/
  - CHAPTERNAME/
    - CHAPTERNAME-CLIPNUMBER.mp3
  - CHAPTERNAME/
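The layout above can be created with a short script. A minimal sketch, assuming some hypothetical chapter names (substitute the names of your own MP3 files):

```python
import os

# Hypothetical chapter names -- replace with your own MP3 file names.
chapters = ["ariel", "thecolossus", "crossingthewater"]

# Create one folder per chapter under sounds/.
for chapter in chapters:
    os.makedirs(os.path.join("sounds", chapter), exist_ok=True)
```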
Training uses short sequences, so I used Audacity to break each ~30-minute MP3 into smaller clips with the following steps:
- Select Analyze > Sound Finder with the following settings to create labels:
  - Treat audio below this level as silence [-dB] = 26.0
  - Minimum duration of silence between sounds [seconds] = 0.250
  - Label starting point = 0.1
  - Label ending point = 0.1
- Select Edit > Preferences > Import/Export and turn off "Show Metadata Editor prior to export step"
- Select File > Export Multiple with the following settings:
  - Format: MP3
  - Numbering after File name prefix: CHAPTERNAME
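The Sound Finder pass above can be approximated in code. Below is a minimal sketch of the same idea — an amplitude threshold plus a minimum-silence gap — applied to a list of samples. The function name and parameters are illustrative, not Audacity's:

```python
def find_sounds(samples, silence_thresh=0.05, min_silence_len=3):
    """Return (start, end) index pairs for runs of non-silent samples,
    merging sounds separated by fewer than min_silence_len silent samples."""
    regions = []
    start = None        # start index of the current sound region, if any
    silent_run = 0      # length of the current run of silent samples
    for i, s in enumerate(samples):
        if abs(s) > silence_thresh:          # non-silent sample
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_len:  # gap long enough: close region
                regions.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:                    # close a region at end of input
        regions.append((start, len(samples) - silent_run))
    return regions
```

Sounds separated by gaps shorter than `min_silence_len` are merged into one region, which is what the "Minimum duration of silence between sounds" setting controls.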
MP4 video on using Audacity to split into clips
Place the MP3 clips under sounds/CHAPTERNAME. Listen to any clips smaller than 30 KB and delete those that are silent.
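Finding the small clips to review can be scripted. A minimal sketch, using the 30 KB threshold from the text (the helper name is illustrative):

```python
import os

def small_clips(chapter_dir, max_bytes=30_000):
    """Return paths of MP3 clips under max_bytes, for manual review."""
    return sorted(
        os.path.join(chapter_dir, name)
        for name in os.listdir(chapter_dir)
        if name.endswith(".mp3")
        and os.path.getsize(os.path.join(chapter_dir, name)) < max_bytes
    )
```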
The batch size is set to 2 for a 2GB GPU. It should be increased if you have more GPU memory.
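As a rough rule of thumb — an assumption, not a measured scaling law — the batch size can be scaled linearly with GPU memory from the 2-for-2 GB baseline:

```python
def suggested_batch_size(gpu_mem_gb, base_batch=2, base_mem_gb=2):
    """Scale the baseline batch size (2 on a 2 GB GPU) linearly with memory."""
    return max(1, int(base_batch * gpu_mem_gb / base_mem_gb))
```

In practice, increase the batch size until you hit an out-of-memory error, then back off.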
Execute:

```
python wavenet-plath.py
```

Then run:

```
tensorboard --logdir=.
```

to view the loss chart and synthesized audio.
Generate a 10-second WAV file using the trained model:

```python
import wavenet_plath
wavenet_plath.synthesize()
```
Listen to an MP3 or watch an MP4 video generated after 529,000 steps of training. Because WaveNet was trained without the text sequences, the generated speech is gibberish!
- Author: @pkmital
- Contributor: @hollygrimm