Sortformer Diarizer 4spk v1 model PR Part 1: models, modules and dataloaders #11282
base: main
Signed-off-by: taejinp <tango4j@gmail.com>
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Signed-off-by: tango4j <tango4j@users.noreply.github.com>
num_spks: ${model.max_num_of_spks}
session_len_sec: ${model.session_len_sec}
soft_label_thres: 0.5
soft_targets: False
Maybe add a short explanation for soft_label_thres and soft_targets?
Added some comments
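To illustrate the two keys asked about above, here is a minimal sketch under an assumption: soft_label_thres binarizes per-frame speaker-activity values, and soft_targets (when True) keeps the continuous values as loss targets instead. The function frame_targets and its input format are hypothetical, and whether the threshold comparison is >= or > is a guess, not taken from the PR:

```python
from typing import List


def frame_targets(
    speech_ratios: List[List[float]],
    soft_label_thres: float = 0.5,
    soft_targets: bool = False,
) -> List[List[float]]:
    """Turn per-frame speaker activity ratios into training targets (hypothetical helper).

    speech_ratios[t][s] is the fraction of frame t during which speaker s talks (0.0 to 1.0).
    """
    if soft_targets:
        # Keep the continuous ratios as targets for the loss.
        return [list(row) for row in speech_ratios]
    # Binarize: a speaker counts as active only if the ratio reaches the threshold.
    return [[1.0 if r >= soft_label_thres else 0.0 for r in row] for row in speech_ratios]


ratios = [[0.9, 0.0], [0.4, 0.6], [0.5, 0.1]]
print(frame_targets(ratios))                     # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(frame_targets(ratios, soft_targets=True))  # [[0.9, 0.0], [0.4, 0.6], [0.5, 0.1]]
```

Raising the threshold in this sketch marks fewer frames as active, so the targets become more conservative about speaker activity.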
frame_splicing: 1
dither: 0.00001

sortformer_modules:
Does sortformer_modules mean that we can have several different components under this section? If not, maybe just use sortformer_module to align with other fields (e.g., encoder). Also, the SortformerModules name could get rid of the "s" in my opinion.
Ivan and I decided to put all Sortformer modules (trainable weights or functions for streaming) under this section, which is why the name is plural with an "s".
To be more precise it would be "SortformerAuxiliaryModules", but for brevity it is "sortformer_modules".
):
    """
    Convert subsegment timestamps to scale timestamps by multiplying with the feature rate and rounding.
    All `ts` related tensors are dimensioned as (N, 2), where N is the number of subsegments.
Maybe add a simple example to showcase what this function does? I'm a bit confused about the relation between segment and subsegment.
Yes, this could be very confusing, because for end-to-end models subsegments are equivalent to frames.
Modular models (VAD + clustering, etc.): subsegments (usually 0.5 to 1.0 s) within speech segments (usually 2 to 15 s).
End-to-end models: frames within the session audio (the whole session is one segment).
Added the definitions and examples to clarify this.
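Along the lines of the example requested above, a minimal sketch of the conversion the docstring describes: (start, end) times in seconds are multiplied by a feature rate and rounded to frame indices, keeping the (N, 2) shape. The helper name, the plain-list input, and the 100 frames-per-second rate (a 10 ms hop) are assumptions, not taken from the PR:

```python
from typing import List, Sequence


def subsegment_ts_to_frames(ts: Sequence[Sequence[float]], feat_per_sec: int) -> List[List[int]]:
    """Map (start, end) timestamps in seconds to (start, end) frame indices (hypothetical helper).

    Each row of `ts` is one subsegment, so input and output are both shaped (N, 2).
    """
    frames = []
    for start, end in ts:
        # Multiply by the feature rate and round to the nearest frame index.
        frames.append([round(start * feat_per_sec), round(end * feat_per_sec)])
    return frames


# Modular pipeline: overlapping 0.5 s subsegments inside one speech segment,
# assuming a 10 ms feature hop (100 frames per second).
subsegments = [[0.0, 0.5], [0.25, 0.75], [0.5, 1.0]]
print(subsegment_ts_to_frames(subsegments, feat_per_sec=100))
# [[0, 50], [25, 75], [50, 100]]
```

For an end-to-end model the same call would apply, with each row covering a single frame of the session instead of a 0.5 to 1.0 s subsegment.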
examples/speaker_tasks/diarization/neural_diarizer/e2e_diarize_speech.py
Fixed
@@ -0,0 +1,213 @@
sortformer_diarizer_hybrid_loss_4spk-v1.yaml# Sortformer Diarizer is an end-to-end speaker diarization model that is solely based on Transformer-encoder type of architecture.
There is some sneezed text pasted here.
Fixed; removed.
fixed
from nemo.utils import logging
from nemo.utils.exp_manager import exp_manager

"""
This example is NOT fixed. Fix this.
updated this.
@@ -1245,6 +1243,187 @@ def __parse_item_rttm(self, line: str, manifest_file: str) -> Dict[str, Any]:
        return item


class EndtoEndDiarizationLabel(_Collection):
    """List of diarization audio-label correspondence with preprocessing."""
These one-liner docstrings were not updated when copied from the original source. Update them.
Fixed and updated

class EndtoEndDiarizationSpeechLabel(EndtoEndDiarizationLabel):
    """`DiarizationLabel` diarization data sample collector from structured json files."""
Not updated, and out of context. Fix it.
Fixed
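The two docstrings above describe collections that pair each audio session with its diarization labels. As a hedged sketch of that idea only (not the NeMo classes: DiarEntry, DiarLabelCollection, the manifest field names, and the duration filter are all hypothetical), such a collection can be a plain list of namedtuple entries built from manifest dictionaries:

```python
from collections import namedtuple
from typing import Dict, List, Optional

# Hypothetical entry type: one session's audio paired with its RTTM label file.
DiarEntry = namedtuple("DiarEntry", ["audio_file", "rttm_file", "offset", "duration"])


class DiarLabelCollection(list):
    """Hypothetical list of audio-label pairs built from manifest dicts."""

    def __init__(self, manifest: List[Dict], max_duration: Optional[float] = None):
        entries = []
        for item in manifest:
            duration = item.get("duration")
            # Preprocessing step: skip sessions longer than the optional limit.
            if max_duration is not None and duration is not None and duration > max_duration:
                continue
            entries.append(
                DiarEntry(
                    audio_file=item["audio_filepath"],
                    rttm_file=item.get("rttm_filepath"),
                    offset=item.get("offset", 0.0),
                    duration=duration,
                )
            )
        super().__init__(entries)


manifest = [
    {"audio_filepath": "a.wav", "rttm_filepath": "a.rttm", "duration": 90.0},
    {"audio_filepath": "b.wav", "rttm_filepath": "b.rttm", "duration": 300.0},
]
collection = DiarLabelCollection(manifest, max_duration=120.0)
print(len(collection), collection[0].audio_file)  # 1 a.wav
```

Subclassing list keeps indexing and len() for free, which is all a dataset class typically needs from its sample collection.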
beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base. Your code was analyzed with PyLint. The following annotations have been identified:
Thank you for improving NeMo's documentation!
beep boop 🤖: 🚨 The following files must be fixed before merge! Your code was analyzed with PyLint. The following annotations have been identified:
What does this PR do?
Sortformer Diarizer Model, 4 speaker limit, v1
Sortformer Paper Link
In this PR, we are adding model files, module files, and the corresponding dataloaders and evaluations.
Collection: ASR/speaker_tasks
Changelog
model files:
  nemo/collections/asr/models/sortformer_diar_models.py
module files:
  nemo/collections/asr/modules/sortformer_modules.py
evaluation files:
  nemo/collections/asr/metrics/der.py
  nemo/collections/asr/metrics/multi_binary_acc.py
dataloader files:
  NeMo/nemo/collections/asr/data/audio_to_diar_label.py
  NeMo/nemo/collections/asr/data/audio_to_diar_label_lhotse.py
training yaml:
  examples/speaker_tasks/diarization/conf/neural_diarizer/sortformer_diarizer_hybrid_loss_4spk-v1.yaml
post-processing yaml files:
  NeMo/examples/speaker_tasks/diarization/conf/post_processing/sortformer_diar_4spk-v1_callhome-part1.yaml
  NeMo/examples/speaker_tasks/diarization/conf/post_processing/sortformer_diar_4spk-v1_dihard-dev.yaml
util files:
  NeMo/nemo/collections/asr/parts/utils/speaker_utils.py
  NeMo/nemo/collections/asr/parts/utils/vad_utils.py
*Changed the file names of these yaml files
examples/speaker_tasks/diarization/neural_diarizer/sortformer_diar_train.py
nemo/collections/asr/data/audio_to_diar_label.py
nemo/collections/asr/models/__init__.py
nemo/collections/asr/modules/sortformer_modules.py
nemo/collections/asr/parts/utils/asr_multispeaker_utils.py
nemo/collections/asr/parts/utils/speaker_utils.py
nemo/collections/asr/parts/utils/vad_utils.py
nemo/collections/common/parts/preprocessing/collections.py
Usage
python ${NEMO_ROOT}/examples/speaker_tasks/diarization/neural_diarizer/e2e_diarize_speech.py \
    model_path=/path/to/diar_sortformer_4spk-v1.nemo \
    dataset_manifest=/path/to/eval_dataset.json
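The dataset_manifest argument points to a manifest file. A minimal sketch of writing one, assuming the JSON-lines layout used by NeMo's other speaker diarization examples (one object per line with audio_filepath, offset, duration, label, text, num_speakers, rttm_filepath, uem_filepath); the paths are placeholders, and whether e2e_diarize_speech.py reads every one of these fields is an assumption:

```python
import json

# Hypothetical sessions: (audio path, reference RTTM path or None).
sessions = [
    ("/path/to/session1.wav", "/path/to/session1.rttm"),
    ("/path/to/session2.wav", None),
]

lines = []
for audio_path, rttm_path in sessions:
    entry = {
        "audio_filepath": audio_path,
        "offset": 0,
        "duration": None,  # assumed to mean "use the whole file"
        "label": "infer",
        "text": "-",
        "num_speakers": None,
        "rttm_filepath": rttm_path,
        "uem_filepath": None,
    }
    # JSON Lines: one JSON object per line.
    lines.append(json.dumps(entry))

with open("eval_dataset.json", "w") as fout:
    fout.write("\n".join(lines) + "\n")
```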
GitHub Actions CI
CI tests will be added in the second PR.
The third PR will include documentation and tutorials.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the ASR and speaker_tasks