
Click here for the Chinese documentation

VITS Voice Conversion

This repo guides you through adding your own voice to an existing VITS TTS model, turning it into a high-quality voice converter for all of the character voices already in the model.

Feel free to try out the base model, a trilingual anime VITS, on Hugging Face Spaces!

Currently Supported Tasks:

  • Convert the user's voice to any of the characters listed here
  • Chinese, English, and Japanese TTS with the user's voice
  • Chinese, English, and Japanese TTS with custom characters!

Currently Supported Characters for TTS & VC:

  • Umamusume Pretty Derby (used for base-model pretraining)
  • Sanoba Witch (used for base-model pretraining)
  • Genshin Impact (used for base-model pretraining)
  • Any character you wish, as long as you have recordings of their voice!

Fine-tuning

Fine-tuning is best done on Google Colab, because the original VITS has some dependencies that are difficult to configure locally.

How long does it take?

  1. Install dependencies (2 min)
  2. Record at least 20 clips of your own voice (5~10 min)
  3. Upload your character voices as a .zip file; its file structure should look like this:
Your-zip-file.zip
├───Character_name_1
├   ├───xxx.wav
├   ├───...
├   ├───yyy.mp3
├   └───zzz.wav
├───Character_name_2
├   ├───xxx.wav
├   ├───...
├   ├───yyy.mp3
├   └───zzz.wav
├───...
├
└───Character_name_n
    ├───xxx.wav
    ├───...
    ├───yyy.mp3
    └───zzz.wav
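A directory laid out like this can be packed into the required zip with a short standard-library script. The sketch below assumes your per-character folders live under a local directory; the folder and zip names are placeholders:

```python
import zipfile
from pathlib import Path

def pack_voices(src_dir: str, zip_path: str) -> None:
    """Zip every character folder under src_dir, preserving the
    Character_name/xxx.wav layout expected by the notebook."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for audio in sorted(src.rglob("*")):
            if audio.is_file():
                # arcname keeps paths relative to src_dir,
                # e.g. "Character_name_1/xxx.wav"
                zf.write(audio, arcname=audio.relative_to(src))

# Usage (hypothetical paths):
# pack_voices("my_voices", "Your-zip-file.zip")
```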

Note that the format & names of the audio files do not matter, as long as they are audio files.
Audio quality requirements: each clip should be between 2 s and 20 s long, with as little background noise as possible. Audio quantity requirements: at least 10 clips per character, ideally 20 or more.
You can perform step 2, step 3, or both, depending on your needs.
  4. Fine-tune (30 min)
When everything is done, download the fine-tuned model & model config.
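Before uploading your zip in step 3, the requirements above can be sanity-checked with a short script. This is a minimal sketch using only the standard library, so it verifies durations for .wav files only (reading .mp3 lengths would need a third-party library such as mutagen):

```python
import wave
from pathlib import Path

def check_dataset(root: str, min_clips: int = 10,
                  min_s: float = 2.0, max_s: float = 20.0) -> None:
    """Warn about characters with too few clips or out-of-range .wav durations."""
    for char_dir in sorted(Path(root).iterdir()):
        if not char_dir.is_dir():
            continue
        clips = [f for f in char_dir.iterdir() if f.is_file()]
        if len(clips) < min_clips:
            print(f"{char_dir.name}: only {len(clips)} clips (need >= {min_clips})")
        for clip in clips:
            if clip.suffix.lower() != ".wav":
                continue  # duration check is .wav-only in this sketch
            with wave.open(str(clip)) as w:
                dur = w.getnframes() / w.getframerate()
            if not (min_s <= dur <= max_s):
                print(f"{clip.name}: {dur:.1f}s is outside {min_s}-{max_s}s")
```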

Inference or Usage (currently supports Windows only)

  1. Remember to download your fine-tuned model!
  2. Download the latest release
  3. Put your model & config file into the inference folder; make sure to rename the model to G_latest.pth and the config file to finetune_speaker.json
  4. The file structure should be as follows:
inference
├───inference.exe
├───...
├───finetune_speaker.json
└───G_latest.pth
  5. Run inference.exe; the browser should open automatically.
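Before launching, you can confirm the folder is set up correctly. A minimal sketch using only the standard library (the folder name inference and the two file names come from the steps above):

```python
import json
from pathlib import Path

def check_inference_folder(folder: str = "inference") -> list:
    """Return a list of problems with the inference folder layout."""
    problems = []
    root = Path(folder)
    if not (root / "G_latest.pth").is_file():
        problems.append("missing inference/G_latest.pth (rename your model)")
    cfg = root / "finetune_speaker.json"
    if not cfg.is_file():
        problems.append("missing inference/finetune_speaker.json (rename your config)")
    else:
        try:
            json.loads(cfg.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("finetune_speaker.json is not valid JSON")
    return problems
```

An empty return value means both files are in place and the config parses as JSON.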