Generate raw audio waveforms using WaveNet, a deep neural network.
The dataset consists of three MP3 files totaling roughly 80 minutes of poetry read by Sylvia Plath.
For this project, each main MP3 file was considered a chapter. Create a folder for each chapter and add it to the sounds directory:
- sounds/
  - CHAPTERNAME/
    - CHAPTERNAME-CLIPNUMBER.mp3
  - CHAPTERNAME/
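The layout above can be created with a short script. A minimal sketch, assuming some hypothetical chapter names (substitute the names of your own MP3 files):

```python
import os

# Hypothetical chapter names -- replace with your own MP3 file names.
chapters = ["ariel", "thecolossus", "crossingthewater"]

# Create one folder per chapter under sounds/.
for chapter in chapters:
    os.makedirs(os.path.join("sounds", chapter), exist_ok=True)
```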
Training uses short sequences, so I used Audacity to break each ~30-minute MP3 into smaller clips with the following steps:
- Select Analyze > Sound Finder with the following settings to create labels:
  - Treat audio below this level as silence [-dB] = 26.0
  - Minimum duration of silence between sounds [seconds] = 0.250
  - Label starting point = 0.1
  - Label ending point = 0.1
- Select Edit > Preferences > Import/Export and turn off "Show Metadata Editor prior to export step"
- Select File > Export Multiple with the following settings:
  - Format: MP3
  - Numbering after File name prefix: CHAPTERNAME
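The Sound Finder pass above can be approximated in code. Below is a minimal sketch of the same idea — an amplitude threshold plus a minimum-silence gap — applied to a list of samples. The function name and parameters are illustrative, not Audacity's:

```python
def find_sounds(samples, silence_thresh=0.05, min_silence_len=3):
    """Return (start, end) index pairs for runs of non-silent samples,
    merging sounds separated by fewer than min_silence_len silent samples."""
    regions = []
    start = None        # start index of the current sound region, if any
    silent_run = 0      # length of the current run of silent samples
    for i, s in enumerate(samples):
        if abs(s) > silence_thresh:          # non-silent sample
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_len:  # gap long enough: close region
                regions.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:                    # close a region at end of input
        regions.append((start, len(samples) - silent_run))
    return regions
```

Sounds separated by gaps shorter than `min_silence_len` are merged into one region, which is what the "Minimum duration of silence between sounds" setting controls.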
MP4 video on using Audacity to split into clips
Place the MP3 clips under sounds/CHAPTERNAME. Listen to any clips smaller than 30 KB and delete those that are silent.
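Finding the small clips to review can be scripted. A minimal sketch, using the 30 KB threshold from the text (the helper name is illustrative):

```python
import os

def small_clips(chapter_dir, max_bytes=30_000):
    """Return paths of MP3 clips under max_bytes, for manual review."""
    return sorted(
        os.path.join(chapter_dir, name)
        for name in os.listdir(chapter_dir)
        if name.endswith(".mp3")
        and os.path.getsize(os.path.join(chapter_dir, name)) < max_bytes
    )
```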
The batch size is set to 2 for a 2GB GPU. It should be increased if you have more GPU memory.
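As a rough rule of thumb — an assumption, not a measured scaling law — the batch size can be scaled linearly with GPU memory from the 2-for-2 GB baseline:

```python
def suggested_batch_size(gpu_mem_gb, base_batch=2, base_mem_gb=2):
    """Scale the baseline batch size (2 on a 2 GB GPU) linearly with memory."""
    return max(1, int(base_batch * gpu_mem_gb / base_mem_gb))
```

In practice, increase the batch size until you hit an out-of-memory error, then back off.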
Execute:

```
python wavenet-plath.py
```

Then run:

```
tensorboard --logdir=.
```

to view the loss chart and synthesized audio.
Generate a 10-second WAV file using the trained model:

```python
import wavenet_plath
wavenet_plath.synthesize()
```
Listen to an MP3 or watch an MP4 video generated after 529,000 steps of training. Because WaveNet was trained without the text sequences, the generated speech is gibberish!
- Author: @pkmital
- Contributor: @hollygrimm