tbd_audio_stack

COPYRIGHT(C) 2020 - Transportation, Bots, and Disability Lab - CMU
Code released under MIT.
Contact - Zhi - zhi.tan@ri.cmu.edu

A collection of ROS packages that handle audio processing from capture to recognition (utterance). The collection consists of the following packages:

tbd_audio_msgs

This package contains the ROS messages used throughout the collection.

tbd_audio_capture

Currently, this package republishes the audio signal from audio_capture using our own message type (tbd_audio_msgs/AudioStamped), which carries the same data but adds the originating time in the header.

tbd_audio_vad

This package is a wrapper for WebRTCVADPy that runs voice activity detection (VAD) on the received stamped audio.

tbd_audio_recognition_deepspeech

This package is a wrapper for Mozilla's open source implementation of DeepSpeech. It takes in both the VAD output and the stamped audio and publishes the detected utterances.

tbd_amazon_transcribe

This package is a wrapper for Amazon's AWS Transcribe service. It takes in both the VAD output and the stamped audio and publishes the detected utterances.

Quick 10-Step Setup Instructions

Install ROS Melodic.

Install these ROS dependencies:

sudo apt install ros-melodic-audio-common*
sudo apt install ros-melodic-audio-capture*

Install Python 3 dependencies:
```
sudo apt install python3-venv
```

Create a new ROS workspace and a Python 3 virtual environment inside it.

mkdir catkin_ws && cd catkin_ws
python3 -m venv venv
source venv/bin/activate
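
As a quick sanity check after activation, the sketch below tests whether a virtual environment is active in the current shell. It relies on the `VIRTUAL_ENV` variable that the venv `activate` script exports; the function name `venv_active` is a hypothetical helper, not part of this repository.

```shell
# venv_active: succeed if a Python virtual environment is active in this shell.
# Assumption: the venv "activate" script has exported VIRTUAL_ENV.
venv_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}
```

For example, `venv_active || echo "activate the venv first"` prints a reminder when no environment is active. The same check can be repeated before the build step.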

Install the following python3 dependencies into the virtual environment:
```
pip install webrtcvad deepspeech==0.7.4 rospkg empy alloylib
```
Create and navigate to the src directory.
```
mkdir src && cd src
```

Clone the tbd_audio_stack repo into src.

git clone https://github.com/CMU-TBD/tbd_audio_stack.git

Download the correct deepspeech model files.

cd tbd_audio_stack/tbd_audio_recognition_deepspeech && mkdir models && cd models
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.scorer
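
An interrupted download can leave a missing or empty model file behind. The sketch below checks that both files named above exist and are non-empty in a given directory. The function name `check_models` is a hypothetical helper, not part of this repository, and it does not verify checksums.

```shell
# check_models DIR: succeed only if both DeepSpeech 0.7.4 model files
# exist and are non-empty in DIR (defaults to the current directory).
check_models() {
    local dir="${1:-.}"
    local status=0
    local f
    for f in deepspeech-0.7.4-models.pbmm deepspeech-0.7.4-models.scorer; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from the models directory, `check_models .` reports each file and returns a non-zero status if either one is absent.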

Go back to the workspace's root directory, then build and run your project. Make sure the Python 3 virtual environment is active.

cd ~/<path_to_your_workspace>/catkin_ws
catkin build -DPYTHON_VERSION=3
source devel/setup.bash
roslaunch tbd_audio_recognition_deepspeech run_recognition.launch

Everything should run correctly, and you should be able to see the text output by running rostopic echo /utterance and speaking into your computer's microphone.
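
If nothing is printed, it may help to confirm first that the topic is being advertised. The sketch below wraps `rostopic list` in a small check; the function name `topic_available` is a hypothetical helper, and it assumes the launch file is already running in another terminal.

```shell
# topic_available TOPIC: succeed if TOPIC appears exactly in `rostopic list`.
# Assumption: a ROS master is running and the workspace setup file is sourced.
topic_available() {
    rostopic list 2>/dev/null | grep -qxF -- "$1"
}
```

For example, `topic_available /utterance && rostopic echo /utterance` only starts echoing once the topic exists.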
