non-latin text is not aligned #1046

Closed
beviah opened this issue Apr 27, 2023 · 4 comments

Comments

@beviah

beviah commented Apr 27, 2023

For example, Arabic, Russian, etc.

The text field in the response contains a valid non-Latin transcript, yet there are no alignments.

@pzelasko
Collaborator

Can you provide more context?

@beviah
Author

beviah commented Apr 27, 2023

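from lhotse import CutSet, RecordingSet, align_with_torchaudio, annotate_with_whisper

# fld is the directory containing the .wav files
recordings = RecordingSet.from_dir(fld, pattern="*.wav")
# Transcribe with Whisper, forcing Arabic as the language
cuts = annotate_with_whisper(recordings, device='cuda', language='ar')
# Force-align the transcripts using the default torchaudio model
cuts_aligned = align_with_torchaudio(cuts)
for cut in cuts_aligned:
    alignments = cut.supervisions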

Word alignments are always empty for text in non-Latin scripts, as if no text was detected.

@pzelasko
Collaborator

You’d need an ASR model that supports your target language. You can check whether one is available in torchaudio: https://pytorch.org/audio/stable/pipelines.html

Otherwise, you’d need to train or fine-tune your own. You can also try the word alignments from faster-whisper (not yet merged, see #1017).
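
To illustrate why the target language matters, here is a minimal sketch in plain Python. It assumes the aligner can only work with characters from the ASR model's output vocabulary, and that the default model is English-only with a Latin label set. The label set, `keep_supported`, and `coverage` below are hypothetical names made up for this illustration; they are not part of lhotse or torchaudio.

```python
# Illustrative label set (an assumption): uppercase Latin letters, an
# apostrophe, and "|" as the word separator, as in English-only CTC models.
LATIN_LABELS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ'|")


def keep_supported(text: str, labels: set) -> str:
    """Keep only characters the model can emit; join surviving words with '|'."""
    words = text.upper().split()
    kept = ["".join(ch for ch in word if ch in labels) for word in words]
    # Words that lose every character are dropped entirely.
    return "|".join(word for word in kept if word)


def coverage(text: str, labels: set) -> float:
    """Fraction of non-space characters covered by the label set."""
    chars = [ch for ch in text.upper() if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ch in labels for ch in chars) / len(chars)


samples = {
    "english": "hello world",
    "russian": "привет мир",
    "arabic": "مرحبا بالعالم",
}

for name, text in samples.items():
    remaining = keep_supported(text, LATIN_LABELS)
    print(name, repr(remaining), coverage(text, LATIN_LABELS))
```

If the transcript has little or no coverage, there is nothing left to align, which would match the empty alignments reported above. With a real torchaudio bundle, the label set could be read from the bundle's `get_labels()` method and checked the same way before picking a model for a given language.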

@beviah
Author

beviah commented Apr 28, 2023

Awesome! It works, thanks!

@beviah beviah closed this as completed Apr 28, 2023