Uses ASR to obtain syllables, matches them against the lyrics text, and generates JSON for Minlabel preloading.
Install
- ASR model from: https://github.com/RapidAI/RapidASR
- Install rapid_paraformer (Chinese):
    pip install -r requirements.txt
    pip install rapid_paraformer
- Download **resources.zip** (Google Drive | Baidu Netdisk). Its contents:
    resources
    ├── [ 700]  config.yaml
    └── [4.0K]  models
        ├── [ 11K]  am.mvn
        ├── [824M]  asr_paraformerv2.onnx
        └── [ 50K]  token_list.pkl
- Install the Japanese requirements (optional):
    pip install -r requirements_jp.txt
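Before running the scripts, it can help to confirm that the extracted resources folder matches the layout listed in the download step. The following is a minimal sketch, not part of this repository: `missing_resources` and `EXPECTED_FILES` are hypothetical names, and the file list is simply copied from the tree shown for resources.zip.

```python
from pathlib import Path

# Files from the resources tree in the download step, relative to the folder root.
EXPECTED_FILES = [
    "config.yaml",
    "models/am.mvn",
    "models/asr_paraformerv2.onnx",
    "models/token_list.pkl",
]


def missing_resources(root):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_resources("resources")` from the repository root returns an empty list when every listed file is in place.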
Collect lyrics
- Collect the original lyrics text and place it in the lyric folder. Each file contains only the lyrics, and its name must match the audio file name before AudioSlicer slicing (that is, the part of the sliced file name before the '_').
    lyric
    ├── chuanqi.txt
    ├── caocao.txt
    └── ...
- Place the sliced audio fragments in the wav folder, named to match the lyrics files: [lyricName]_xxx.wav. If a file name contains more than one '_', the rightmost one is the dividing line. The part to its left must be identical to the lyrics file name from the previous step.
    wav
    ├── caocao_001.wav
    ├── caocao_002.wav
    └── ...
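The naming rule above (split at the rightmost '_', keep the left part) can be checked before running ASR. This is a minimal sketch under stated assumptions, not code from this repository: `lyric_name` and `wavs_without_lyrics` are hypothetical helpers, and the fallback for a name with no '_' is an assumption, since the rule above does not cover that case.

```python
from pathlib import Path


def lyric_name(wav_path):
    """Return the lyrics prefix of a sliced wav: everything left of the rightmost '_'."""
    stem = Path(wav_path).stem
    prefix, sep, _ = stem.rpartition("_")
    # Assumption: a stem without any '_' is used whole.
    return prefix if sep else stem


def wavs_without_lyrics(wav_folder, lyric_folder):
    """List wav file names whose prefix has no matching .txt in the lyric folder."""
    known = {p.stem for p in Path(lyric_folder).glob("*.txt")}
    return sorted(
        p.name for p in Path(wav_folder).glob("*.wav") if lyric_name(p) not in known
    )
```

For example, `lyric_name("caocao_001.wav")` gives `"caocao"` and `lyric_name("my_song_001.wav")` gives `"my_song"`, so the second file needs a lyrics file named my_song.txt.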
- Run rapid_asr.py to obtain the ASR results as lab files.
    python rapid_asr.py --model_config resources/config.yaml --wav_folder wav_folder --lab_folder lab_folder

    Options:
        --model_config  str  Model config, e.g. resources/config.yaml. Download from: https://github.com/RapidAI/RapidASR/blob/main/python/README.md
        --wav_folder    str  Folder of sliced wav files (*.wav).
        --lab_folder    str  Folder for outputting lab files.
- Run match_lyric.py to obtain the JSON files, then put them in the annotation folder of Minlabel.
    python match_lyric.py --lyric_folder lyric --lab_folder lab_folder --json_folder json_folder --asr_rectify True

    Options:
        --lyric_folder       str   Lyrics files (*.txt). Each file name corresponds to the lab prefix (the part before '_'); only pure lyrics are allowed.
        --lab_folder         str   Lab files (*.lab) from ASR: Chinese characters or pinyin separated by spaces.
        --json_folder        str   Folder for outputting JSON files.
        --diff_threshold     int   Only display differences of n words or more.
        --asr_rectify        bool  Trust the ASR result: if it matches another candidate pronunciation of a polyphonic character, the mismatch is treated as a g2p error.
        --syllable_neglect   bool  Ignore syllable errors with similar pronunciations; refers to the Near_systolic.yaml file.
        --consonant_neglect  bool  Ignore consonant errors with similar pronunciations; refers to the Near_consonant.yaml file.
        --vowel_neglect      bool  Ignore vowel errors with similar pronunciations; refers to the Near_vowel.yaml file.
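The idea behind --asr_rectify can be illustrated with a small example. This is only a sketch of the behaviour described for that option, not the code in match_lyric.py: the function `rectify` and the convention that the first candidate is the g2p default are assumptions made for illustration.

```python
def rectify(candidates, asr_syllable):
    """Choose a reading for one character.

    candidates: possible readings of the character, g2p default first
                (hypothetical layout, chosen for this sketch).
    asr_syllable: the syllable ASR produced at this position.
    """
    if asr_syllable in candidates:
        # ASR hit one of the known readings: trust it over the g2p default.
        return asr_syllable
    # ASR produced something that is not a reading of this character: keep the default.
    return candidates[0]


# The character 行 can be read "xing" or "hang".
print(rectify(["xing", "hang"], "hang"))  # hang
print(rectify(["xing", "hang"], "xin"))   # xing
```

In the first call the ASR syllable matches the alternative reading, so the mismatch is attributed to g2p. In the second call it matches no reading, so the default is kept and the difference remains a genuine mismatch.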