Sortformer Diarizer 4spk v1 model PR Part 1: models, modules and dataloaders #11282

tango4j · 2024-11-14T08:51:55Z

What does this PR do ?

Sortformer Diarizer Model, 4 speaker limit, v1

Sortformer Paper Link

In this PR, we are adding model files, module files, and the corresponding dataloader and evaluation files.

Collection: ASR/speaker_tasks

Changelog

model files
nemo/collections/asr/models/sortformer_diar_models.py
module files
nemo/collections/asr/modules/sortformer_modules.py
evaluation files
nemo/collections/asr/metrics/der.py
nemo/collections/asr/metrics/multi_binary_acc.py
dataloader files
NeMo/nemo/collections/asr/data/audio_to_diar_label.py
NeMo/nemo/collections/asr/data/audio_to_diar_label_lhotse.py
training yaml
examples/speaker_tasks/diarization/conf/neural_diarizer/sortformer_diarizer_hybrid_loss_4spk-v1.yaml
post-processing yaml files
NeMo/examples/speaker_tasks/diarization/conf/post_processing/sortformer_diar_4spk-v1_callhome-part1.yaml
NeMo/examples/speaker_tasks/diarization/conf/post_processing/sortformer_diar_4spk-v1_dihard-dev.yaml
util files
NeMo/nemo/collections/asr/parts/utils/speaker_utils.py
NeMo/nemo/collections/asr/parts/utils/vad_utils.py

*Changed the file names of these yaml files

examples/speaker_tasks/diarization/neural_diarizer/sortformer_diar_train.py
nemo/collections/asr/data/audio_to_diar_label.py

nemo/collections/asr/models/__init__.py

nemo/collections/asr/modules/sortformer_modules.py
nemo/collections/asr/parts/utils/asr_multispeaker_utils.py
nemo/collections/asr/parts/utils/speaker_utils.py
nemo/collections/asr/parts/utils/vad_utils.py
nemo/collections/common/parts/preprocessing/collections.py

Usage

You can potentially add a usage example below

python ${NEMO_ROOT}/examples/speaker_tasks/diarization/neural_diarizer/e2e_diarize_speech.py \
     model_path=/path/to/diar_sortformer_4spk-v1.nemo \
     dataset_manifest=/path/to/eval_dataset.json
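
The `dataset_manifest` argument points to a JSON-lines file with one entry per session. The following is a minimal sketch of how such a file could be written, assuming the manifest keys used by NeMo's existing diarization recipes (`audio_filepath`, `offset`, `duration`, `label`, `text`, `num_speakers`, `rttm_filepath`, `uem_filepath`). The helper name `write_diar_manifest` and all paths are hypothetical, and the exact fields this script requires should be checked against its config.

```python
import json
from pathlib import Path


def write_diar_manifest(entries, manifest_path):
    """Write one JSON object per line (hypothetical helper, not part of NeMo).

    entries: iterable of (audio_path, rttm_path_or_None) pairs.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as f:
        for audio_path, rttm_path in entries:
            item = {
                "audio_filepath": str(audio_path),
                "offset": 0,
                "duration": None,  # written as null; assumed to mean "use the whole file"
                "label": "infer",
                "text": "-",
                "num_speakers": None,
                "rttm_filepath": str(rttm_path) if rttm_path else None,
                "uem_filepath": None,
            }
            f.write(json.dumps(item) + "\n")
    return manifest_path
```

The resulting file path can then be passed as `dataset_manifest=...` in the command above.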

GitHub Actions CI

CI tests will be added in the second PR.
The third PR will include documentation and tutorials.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the ASR and speaker_tasks

Signed-off-by: taejinp <tango4j@gmail.com>

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

stevehuang52 · 2024-11-14T16:19:37Z

.../speaker_tasks/diarization/conf/neural_diarizer/sortformer_diarizer_hybrid_loss_4spk-v1.yaml

+    num_spks: ${model.max_num_of_spks}
+    session_len_sec: ${model.session_len_sec}
+    soft_label_thres: 0.5
+    soft_targets: False


Maybe add some short explanation for soft_label_thres and soft_targets?

Added some comments
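
To illustrate the two options, here is a minimal sketch. It assumes that the dataloader first computes per-frame soft labels (the fraction of each frame in which a speaker is active), that `soft_targets: False` binarizes them at `soft_label_thres`, and that `soft_targets: True` passes them through unchanged. The function name `make_targets` and the `>=` comparison are assumptions for illustration, not the code in this PR.

```python
def make_targets(soft_labels, soft_label_thres=0.5, soft_targets=False):
    """Turn per-frame, per-speaker soft labels into training targets.

    soft_labels: list of frames, each a list of per-speaker activity
    fractions in [0, 1]. Hypothetical helper for illustration only.
    """
    if soft_targets:
        # Keep the fractional activity values as they are.
        return [list(frame) for frame in soft_labels]
    # Binarize: a speaker counts as active when the fraction reaches the threshold.
    return [[1.0 if v >= soft_label_thres else 0.0 for v in frame] for frame in soft_labels]
```

With the defaults from the YAML above, a frame with activity fractions `[0.2, 0.5]` would become `[0.0, 1.0]` under this sketch.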

stevehuang52 · 2024-11-14T16:21:32Z

.../speaker_tasks/diarization/conf/neural_diarizer/sortformer_diarizer_hybrid_loss_4spk-v1.yaml

+    frame_splicing: 1
+    dither: 0.00001
+
+  sortformer_modules:


Does sortformer_modules mean that we can have several different components under this section? If not, maybe just use sortformer_module to align with other fields (e.g., encoder). Also the SortformerModules name could get rid of the s in my opinion.

Ivan and I decided to put all of the Sortformer modules (trainable weights or functions for streaming) in this section, which is why it is plural with an "s".
To be more precise it should actually be "SortformerAuxilaryModules", but for brevity it is "sortformer_modules".

stevehuang52 · 2024-11-14T16:31:17Z

nemo/collections/asr/data/audio_to_diar_label.py

+):
+    """
+    Convert subsegment timestamps to scale timestamps by multiplying with the feature rate and rounding.
+    All `ts` related tensors are dimensioned as (N, 2), where N is the number of subsegments.


Maybe add a simple example to showcase what this function does? I'm a bit confused about the relation between segment and subsegment

Yes, this could be very confusing because, for end-to-end models, subsegments are equivalent to frames.

Modular models (VAD + clustering, etc.): subsegments (usually 0.5 to 1.0 s) within speech segments (usually 2 to 15 s)
End-to-end models: frames within a session's audio (the whole session is one segment)

Added the definitions and examples to clarify this.
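
As a rough illustration of the conversion described in the docstring (multiply by the feature rate, then round), here is a minimal sketch in plain Python. The function name `ts_to_frames` and the default of 100 features per second (a 10 ms hop) are assumptions for this example; the actual function in `audio_to_diar_label.py` operates on (N, 2) tensors and may differ in detail.

```python
def ts_to_frames(subsegment_ts, feat_per_sec=100):
    """Scale (start, end) timestamps in seconds to integer frame indices.

    subsegment_ts: list of N (start_sec, end_sec) pairs.
    feat_per_sec: assumed feature rate; 100 corresponds to a 10 ms hop.
    Hypothetical helper for illustration only.
    """
    return [
        (round(start * feat_per_sec), round(end * feat_per_sec))
        for start, end in subsegment_ts
    ]


# Two subsegments given in seconds, converted to frame indices.
frames = ts_to_frames([(0.0, 0.5), (0.5, 1.25)])
```

For an end-to-end model, each row then indexes frames within the whole session rather than subsegments within a shorter speech segment.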

tango4j · 2024-11-15T02:12:22Z

.../speaker_tasks/diarization/conf/neural_diarizer/sortformer_diarizer_hybrid_loss_4spk-v1.yaml

@@ -0,0 +1,213 @@
+sortformer_diarizer_hybrid_loss_4spk-v1.yaml# Sortformer Diarizer is an end-to-end speaker diarization model that is solely based on Transformer-encoder type of architecture.


There is some stray text pasted at the start of this line.

Fixed; removed.

tango4j · 2024-11-15T02:16:35Z

examples/speaker_tasks/diarization/neural_diarizer/sortformer_diar_train.py

+from nemo.utils import logging
+from nemo.utils.exp_manager import exp_manager
+
+"""


This example is NOT fixed. Fix this.

updated this.

tango4j · 2024-11-15T02:17:52Z

nemo/collections/common/parts/preprocessing/collections.py

@@ -1245,6 +1243,187 @@ def __parse_item_rttm(self, line: str, manifest_file: str) -> Dict[str, Any]:
        return item


+class EndtoEndDiarizationLabel(_Collection):
+    """List of diarization audio-label correspondence with preprocessing."""


These one-line docstrings were not updated when they were copied from the original source. Update them.

Fixed and updated.

tango4j · 2024-11-15T02:18:11Z

nemo/collections/common/parts/preprocessing/collections.py

+
+
+class EndtoEndDiarizationSpeechLabel(EndtoEndDiarizationLabel):
+    """`DiarizationLabel` diarization data sample collector from structured json files."""


Not updated, and out of context. Fix it.

github-actions · 2024-11-16T02:19:10Z

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.

Your code was analyzed with PyLint. The following annotations have been identified:


------------------------------------
Your code has been rated at 10.00/10

Thank you for improving NeMo's documentation!

github-actions · 2024-11-16T02:19:17Z

beep boop 🤖: 🚨 The following files must be fixed before merge!

Your code was analyzed with PyLint. The following annotations have been identified:


------------------------------------
Your code has been rated at 10.00/10

Thank you for improving NeMo's documentation!

tango4j added 3 commits November 13, 2024 18:46

Adding the first pr files models and dataset

e69ec8e

Signed-off-by: taejinp <tango4j@gmail.com>

Tested all unit-test files

2914325

Signed-off-by: taejinp <tango4j@gmail.com>

Name changes on yaml files and train example

9a468ac

Signed-off-by: taejinp <tango4j@gmail.com>

tango4j requested review from nithinraok, stevehuang52 and weiqingw4ng November 14, 2024 08:51

github-actions bot added ASR Speaker Tasks common labels Nov 14, 2024

github-advanced-security bot found potential problems Nov 14, 2024

Merge branch 'main' into sortformer/pr_01

a910d30

tango4j marked this pull request as ready for review November 14, 2024 09:07

Apply isort and black reformatting

2f44fe1

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

stevehuang52 reviewed Nov 14, 2024

tango4j and others added 3 commits November 14, 2024 16:56

Reflecting comments and removing unnecessary parts for this PR

4ddc59b

Signed-off-by: taejinp <tango4j@gmail.com>

Resolved conflicts

43d95f0

Signed-off-by: taejinp <tango4j@gmail.com>

Apply isort and black reformatting

40e9f95

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

github-advanced-security bot found potential problems Nov 15, 2024

tango4j and others added 7 commits November 14, 2024 17:53

Adding docstrings to reflect the PR comments

f7f84bb

Signed-off-by: taejinp <tango4j@gmail.com>

Resolved the new conflict

95acd79

Signed-off-by: taejinp <tango4j@gmail.com>

Merge branch 'main' into sortformer/pr_01

919f4da

removed the unused find_first_nonzero

4134e25

Signed-off-by: taejinp <tango4j@gmail.com>

Merge branch 'sortformer/pr_01' of https://github.com/tango4j/NeMo in…

d3432e5

…to sortformer/pr_01

Apply isort and black reformatting

5dd4d4c

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

Merge branch 'sortformer/pr_01' of https://github.com/tango4j/NeMo in…

ca5eea3

…to sortformer/pr_01

tango4j commented Nov 15, 2024

tango4j and others added 4 commits November 15, 2024 12:30

Merge branch 'main' into sortformer/pr_01

9d493c0

Fixed all pylint issues

037f61e

Signed-off-by: taejinp <tango4j@gmail.com>

Merge branch 'sortformer/pr_01' of https://github.com/tango4j/NeMo in…

a8bc048

…to sortformer/pr_01

Apply isort and black reformatting

cb23268

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

github-advanced-security bot found potential problems Nov 15, 2024

tango4j and others added 11 commits November 15, 2024 14:49

Resolving pylint issues

4a266b9

Signed-off-by: taejinp <tango4j@gmail.com>

Merge branch 'sortformer/pr_01' of https://github.com/tango4j/NeMo in…

5e4e9c8

…to sortformer/pr_01

Merge branch 'main' into sortformer/pr_01

c31c60c

Apply isort and black reformatting

6e2225e

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

Removing unused variable in audio_to_diar_label.py

ab93b17

Signed-off-by: taejinp <tango4j@gmail.com>

Merge branch 'sortformer/pr_01' of https://github.com/tango4j/NeMo in…

4f3ee66

…to sortformer/pr_01

Merge branch 'main' into sortformer/pr_01

3f24b82

Merge branch 'main' into sortformer/pr_01

f49e107

Fixed docstrings in training script

7dea01b

Signed-off-by: taejinp <tango4j@gmail.com>

Merge branch 'sortformer/pr_01' of https://github.com/tango4j/NeMo in…

2a99d53

…to sortformer/pr_01

Line-too-long issue from Pylint fixed

71d515f

Signed-off-by: taejinp <tango4j@gmail.com>
