(Q) Multi/Single Speaker different language finetune #282

Open
mantrakp04 opened this issue Sep 5, 2024 · 7 comments

Comments

@mantrakp04

Has anyone successfully fine-tuned a StyleTTS2 base model on a different language (e.g. French or Spanish) and achieved good results? If so, could you share some sample outputs, your training approach, and dataset details (total duration, number of speakers, language)?

@Respaired

Yes, a few people have done it, myself included.

@mantrakp04
Author

Can you share some details on the model, language, dataset duration, etc.?

@Respaired

Japanese, on a 21-hour single-speaker dataset.

@Karesto

Karesto commented Sep 12, 2024

I made several attempts:

Around 30 hours of data.
French (single speaker for now).

It is, however, not a fine-tune. If you're training on a new language, I highly recommend restarting training from scratch, even though it is very slow.
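
For anyone wanting to report figures like the ones in this thread (hours of audio, number of speakers), here is a minimal sketch. It assumes a pipe-delimited file list in the `wav_name|transcription|speaker_id` layout used by the StyleTTS2 training lists, and PCM WAV files readable by Python's standard `wave` module. The function name `summarize_dataset` and the paths are hypothetical, not part of the repository.

```python
import wave
from pathlib import Path


def summarize_dataset(list_path, wav_root):
    """Return (total_hours, speaker_count) for a pipe-delimited file list.

    Assumes each non-empty line looks like: wav_name|transcription|speaker_id
    """
    total_seconds = 0.0
    speakers = set()
    for line in Path(list_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("|")
        # First field is the audio file, last field is the speaker ID;
        # this still works if the transcription itself contains "|".
        wav_name, speaker = parts[0], parts[-1]
        speakers.add(speaker)
        with wave.open(str(Path(wav_root) / wav_name), "rb") as wav:
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0, len(speakers)
```

Usage would be something like `hours, n_speakers = summarize_dataset("Data/train_list.txt", "Data/wavs")`, with both paths adjusted to your own layout.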

@mantrakp04
Author

mantrakp04 commented Sep 12, 2024 via email

@Karesto

Karesto commented Sep 12, 2024

I used 2x 4090s. It took about two weeks (with a few stops for bugs here and there).
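
As a rough back-of-the-envelope figure for comparing against other hardware, the numbers above can be turned into GPU-hours. This is only a sketch: "about two weeks" is approximate and the run was stopped a few times, so the result is best read as an upper bound. The helper name `gpu_hours` is made up for illustration.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Convert wall-clock days on num_gpus devices into GPU-hours."""
    return num_gpus * days * 24


# 2 GPUs for roughly 14 days
print(gpu_hours(2, 14))  # 672
```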

@martinambrus

I would also recommend looking at #257 for some pointers on creating a multilingual StyleTTS2 model.
It raises some interesting points, especially about training the BERT and AuxiliaryASR models for languages other than English.
