TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language
TatarTTS is an open-source text-to-speech dataset for the Tatar language. The dataset comprises ~70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female).
We employed the Piper text-to-speech system to train TTS models on our dataset. To set it up:
sudo apt-get install python3-dev
git clone https://github.com/rhasspy/piper.git
cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -e .
Please check Piper's installation guide for more information.
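As a quick sanity check (assuming the editable install above completed without errors), the training package should now be importable from the activated virtual environment; printing the preprocessing help is enough to confirm:

python3 -m piper_train.preprocess --help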
LINK TO DOWNLOAD WILL BE AVAILABLE SOON HERE. After downloading the dataset, unzip it inside the piper/src/python/ directory.
The dataset follows the LJSpeech format and is organized as follows:
TatarTTS
|-- male
|   |-- wav
|   |   |-- 0.wav
|   |   |-- 1.wav
|   |   |-- 2.wav
|   |   |-- ...
|   |-- metadata.csv
|-- female
|   |-- wav
|   |   |-- 0.wav
|   |   |-- 1.wav
|   |   |-- 2.wav
|   |   |-- ...
|   |-- metadata.csv
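In the LJSpeech layout, each row of metadata.csv pairs a wav file name (without the .wav extension) with its transcription, separated by a pipe character. A hypothetical example row (the sentence is borrowed from the inference example further below, not taken from the dataset itself):

0|Аның чыраенда тәвәккәллек чагыла иде.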
cd piper/src/python
mkdir -p TatarTTS_piper/male TatarTTS_piper/female
python3 -m piper_train.preprocess \
    --language tt \
    --input-dir TatarTTS/male \
    --output-dir TatarTTS_piper/male \
    --dataset-format ljspeech \
    --single-speaker \
    --sample-rate 22050

python3 -m piper_train.preprocess \
    --language tt \
    --input-dir TatarTTS/female \
    --output-dir TatarTTS_piper/female \
    --dataset-format ljspeech \
    --single-speaker \
    --sample-rate 22050
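If preprocessing succeeds, each output directory should contain a generated config.json (used again when exporting the model) and, typically, a dataset.jsonl with the phonemized utterances; a quick check, assuming the relative paths above:

ls TatarTTS_piper/male
ls TatarTTS_piper/female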
cd piper/src/python
python3 -m piper_train \
    --dataset-dir TatarTTS_piper/male \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 1000 \
    --checkpoint-epochs 1 \
    --precision 32

python3 -m piper_train \
    --dataset-dir TatarTTS_piper/female \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 1000 \
    --checkpoint-epochs 1 \
    --precision 32
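Piper trains with PyTorch Lightning, so checkpoints and logs are typically written under lightning_logs/ inside each dataset directory (an assumption based on the default Lightning logger). Training progress can then be followed with TensorBoard, for example:

tensorboard --logdir TatarTTS_piper/male/lightning_logs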
python3 -m piper_train.export_onnx \
    /path/to/model.ckpt \
    /path/to/model.onnx

cp /path/to/training_dir/config.json \
    /path/to/model.onnx.json
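For example, assuming the checkpoints were written under lightning_logs/ as described above and a models/ directory is used for the exported files (both paths are assumptions; adjust them to your setup, and pass the exact checkpoint file if the glob matches more than one):

mkdir -p models/male
python3 -m piper_train.export_onnx \
    TatarTTS_piper/male/lightning_logs/version_0/checkpoints/*.ckpt \
    models/male/male.onnx
cp TatarTTS_piper/male/config.json models/male/config.json

The male/male.onnx and male/config.json names mirror the layout of the downloaded pre-trained models used in the inference examples below.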
Download and unzip the pre-trained models (.onnx, .ckpt) for both speakers from Google Drive.
cd models
echo 'Аның чыраенда тәвәккәллек чагыла иде.' | ./piper --model male/male.onnx --config male/config.json --output_file welcome.wav
echo 'Аның чыраенда тәвәккәллек чагыла иде.' | ./piper --model female/female.onnx --config female/config.json --output_file welcome.wav
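To synthesize several sentences in one go, the same binary can be looped over a plain-text file with one sentence per line (sentences.txt and the output naming below are illustrative assumptions):

n=0
while IFS= read -r line; do
    echo "$line" | ./piper --model male/male.onnx --config male/config.json --output_file "out_${n}.wav"
    n=$((n+1))
done < sentences.txt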
cd piper/src/python_run
python3 piper --model /path/to/model.onnx --config /path/to/config.json --output-file welcome.wav
The project has been developed in academic collaboration between ISSAI and the Institute of Applied Semiotics of the Tatarstan Academy of Sciences.
@INPROCEEDINGS{10463261,
author={Orel, Daniil and Kuzdeuov, Askat and Gilmullin, Rinat and Khakimov, Bulat and Varol, Huseyin Atakan},
booktitle={2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)},
title={TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language},
year={2024},
volume={},
number={},
pages={717-721},
doi={10.1109/ICAIIC60209.2024.10463261}}
- Piper: https://github.com/rhasspy/piper
- Pre-processing, training, and exporting: https://github.com/rhasspy/piper/blob/master/TRAINING.md