VITS Voice Conversion
This repo guides you through adding your own voice to an existing VITS TTS model, turning it into a high-quality voice converter that maps your voice to any character voice already in the model.
Feel free to play around with the base model, a Trilingual Anime VITS!
Currently Supported Tasks:
- Convert user's voice to characters listed here
- Chinese, English, Japanese TTS with user's voice
- Chinese, English, Japanese TTS with custom characters!
Currently Supported Characters for TTS & VC:
- Umamusume Pretty Derby (used for base model pretraining)
- Sanoba Witch (used for base model pretraining)
- Genshin Impact (used for base model pretraining)
- Any character you wish as long as you have their voices!
Fine-tuning
It's recommended to perform fine-tuning on Google Colab because the original VITS has some dependencies that are difficult to configure.
How long does it take?
1. Install dependencies (2 min)
2. Record at least 20 recordings of your own voice (5~10 min)
3. Upload your character voices as a .zip file. Its file structure should look like this:
Your-zip-file.zip
├───Character_name_1
├   ├───xxx.wav
├   ├───...
├   ├───yyy.mp3
├   └───zzz.wav
├───Character_name_2
├   ├───xxx.wav
├   ├───...
├   ├───yyy.mp3
├   └───zzz.wav
├───...
├
└───Character_name_n
    ├───xxx.wav
    ├───...
    ├───yyy.mp3
    └───zzz.wav
Note that the format & name of the audio files do not matter as long as they are audio files.
Audio quality requirements: >=2s, <=20s per audio, with as little background noise as possible.
Audio quantity requirements: at least 10 per character, ideally 20+ per character.
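The requirements above can be checked locally before uploading. The following is a minimal sketch, not part of this repo: the function name check_voice_zip and the list of audio extensions are assumptions made for illustration, and durations are only measured for .wav files because the Python standard library cannot decode other formats.

```python
import io
import wave
import zipfile
from pathlib import PurePosixPath

# Assumption: the README only says "audio files", so this extension list is a guess.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
MIN_CLIPS = 10                     # "at least 10 per character"
MIN_SECONDS, MAX_SECONDS = 2, 20   # ">=2s, <=20s per audio"


def check_voice_zip(zip_path):
    """Return (clip counts per character, characters with too few clips,
    .wav entries whose duration falls outside the allowed range)."""
    counts = {}
    bad_durations = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            # Only files directly inside a top-level character folder are counted.
            if info.is_dir() or len(path.parts) != 2:
                continue
            suffix = path.suffix.lower()
            if suffix not in AUDIO_EXTENSIONS:
                continue
            character = path.parts[0]
            counts[character] = counts.get(character, 0) + 1
            if suffix != ".wav":
                continue  # duration is not checked for non-WAV formats
            try:
                with wave.open(io.BytesIO(archive.read(info)), "rb") as wav:
                    seconds = wav.getnframes() / wav.getframerate()
            except (wave.Error, EOFError, ZeroDivisionError):
                continue  # not a readable PCM WAV; left unchecked
            if not MIN_SECONDS <= seconds <= MAX_SECONDS:
                bad_durations.append(info.filename)
    too_few = sorted(name for name, n in counts.items() if n < MIN_CLIPS)
    return counts, too_few, bad_durations
```

Calling check_voice_zip("Your-zip-file.zip") returns the per-character counts, the characters that fall below 10 clips, and the .wav entries shorter than 2 s or longer than 20 s. Background noise is not something this sketch can judge.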
You can perform step 2, step 3, or both, depending on your needs.
4. Fine-tune (30 min)
After everything is done, download the fine-tuned model & model config.
Inference or Usage (currently supports Windows only)
- Remember to download your fine-tuned model!
- Download the latest release
- Put your model & config file into the folder inference, and make sure to rename the model to G_latest.pth and the config file to finetune_speaker.json
- The file structure should be as follows:
inference
├───inference.exe
├───...
├───finetune_speaker.json
└───G_latest.pth
- Run inference.exe; the browser should open automatically.
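The copy-and-rename step can also be scripted. This is a small sketch under stated assumptions: install_model is a hypothetical helper, not something shipped with the release, and the source file names in the usage note are placeholders for whatever you downloaded after fine-tuning.

```python
import shutil
from pathlib import Path

# File names the inference folder expects, as stated in the steps above.
MODEL_NAME = "G_latest.pth"
CONFIG_NAME = "finetune_speaker.json"


def install_model(model_src, config_src, inference_dir="inference"):
    """Copy the fine-tuned model and config into the inference folder
    under the expected names, and return the two destination paths."""
    target = Path(inference_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"inference folder not found: {target}")
    model_dst = target / MODEL_NAME
    config_dst = target / CONFIG_NAME
    # copyfile overwrites an existing destination file.
    shutil.copyfile(model_src, model_dst)
    shutil.copyfile(config_src, config_dst)
    return model_dst, config_dst
```

For example, install_model("my_model.pth", "my_config.json", "inference") leaves the originals in place and writes inference/G_latest.pth and inference/finetune_speaker.json.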