Custom voice commands with Teachable Machine #99
Replies: 2 comments 1 reply
Let me be quite clear here @SentryCoderDev, as this isn't the first time we've had this discussion: high-powered lasers and flame-throwing devices are not something I am going to support as part of this project. The latter is illegal without a license, at least in my country, and the former should be too. I don't want them used in connection with the project, and if you're going to do it anyway please don't discuss it in this project's community. Ignoring that reference, here is my feedback on your code:
The 'old code' is deliberately designed to be a module that can be consumed by custom code. It is generic by design so that the output of the module is accessible to other modules and custom behaviours. It shouldn't be adapted to include more specific, less generic functionality as that is an anti-pattern. See the Single Responsibility Principle and other SOLID programming principles for more information.
This looks like a good use case for an extension that consumes the original module. Since it requires the audio rather than the converted text, there is an opportunity to modify the SpeechInput module to return the audio from a method call that can be consumed by your new module (and others in future). The new 'Teachable Machine' module could then be consumed by your own custom behaviour's code.
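A minimal sketch of the architecture described above. All names here (`SpeechInput`, `get_last_audio`, `TeachableMachineInput`) are illustrative assumptions, not the project's actual API; the point is only the shape: the generic module exposes raw audio via a method, and the new module consumes it without the original knowing about it.

```python
# Hypothetical sketch: class and method names are illustrative,
# not taken from the project's real codebase.

class SpeechInput:
    """Existing generic module, extended to expose the raw audio."""

    def __init__(self):
        self._last_audio = None

    def capture(self, audio_bytes):
        # ... the existing recognition pipeline would run here ...
        self._last_audio = audio_bytes

    def get_last_audio(self):
        """New accessor so other modules can consume the raw audio."""
        return self._last_audio


class TeachableMachineInput:
    """New extension module: classifies raw audio with a trained model."""

    def __init__(self, speech_input, classify_fn):
        self._speech_input = speech_input
        self._classify = classify_fn  # injected model keeps this module generic

    def recognise(self):
        """Return the model's label for the latest audio, or None."""
        audio = self._speech_input.get_last_audio()
        return self._classify(audio) if audio is not None else None
```

Injecting the classifier as a callable keeps the new module reusable: it doesn't care whether the model behind it is a Teachable Machine export or something else.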
The 'Command Mode' is custom behaviour, as are the specific words, and wouldn't be located in the SpeechInput class or the proposed extension module (see above). This is the kind of thing you'd add to your own custom logic.
I hope that helps clarify the architecture a little. Any questions please let me know 👍
Hello friends, I was bored again and started watching HOTD (House of the Dragon), and this came to my mind: what if I adapt the commands from the show to the robot? Technically my version has a fire spitter XD (what could go wrong). Here are some updates I'm thinking of making (there's sample code in the attachment):
- The old code processes only the text recognized by Google's voice recognition API and passes the user's speech along directly as text. This is suitable for simple command recognition and content management, but its flexibility may be limited in advanced command-and-control scenarios.
- The new code offers a more advanced structure using Google's Teachable Machine model generator. It performs recognition with a deep learning model trained specifically to recognize certain commands from the user's voice with higher accuracy, which can give better results in customized command recognition. Also, this model is most likely triggered only by voices similar to those used in training: if I train it with my own voice, it will not trigger even when the same command is spoken by someone else. This accuracy improves in direct proportion to the number of training samples, so keep that in mind.
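As a rough sketch of the recognition step: a Teachable Machine audio model outputs one probability per trained class, and the robot code only needs to pick the most likely label and reject uncertain matches. The class names and threshold below are invented for illustration, not taken from the attached code.

```python
# Hypothetical post-processing sketch for a Teachable Machine audio model.
# Labels and the confidence threshold are assumptions for illustration.

LABELS = ["background_noise", "taanari", "dracarys"]
CONFIDENCE_THRESHOLD = 0.8  # assumed value; tune against real recordings

def pick_command(probabilities):
    """Return the most likely command label, or None if uncertain."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] < CONFIDENCE_THRESHOLD:
        return None  # too uncertain: treat as no command
    label = LABELS[best_index]
    # The background class is never a command, so it is filtered out here.
    return None if label == "background_noise" else label
```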
- Command Mode has been added in the new version; it activates when specific commands are detected. For example, command mode is activated by detecting the command Ta'anari (a word that means something to me), and then the corresponding behavior is triggered when another command is detected. This structure allows the user to use a more complex set of commands. In the old code there is no special command mode and each voice command is processed directly, so the old code may fall short when more complex operations, or commands tied to a specific order, are required.
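The two-step Command Mode described above can be sketched as a tiny state machine: idle until the wake word arrives, then treat the next recognized word as a command. The wake word and command are the examples from this post; the class itself is hypothetical, not the project's API.

```python
# Minimal sketch of the two-step 'Command Mode'. The class is
# hypothetical; only the wake word / command idea comes from the post.

class CommandMode:
    WAKE_WORD = "taanari"

    def __init__(self, known_commands):
        self._known = set(known_commands)
        self.active = False

    def handle(self, word):
        """Feed one recognized word; return a command to execute, or None."""
        if not self.active:
            # Stay idle until the wake word activates command mode.
            self.active = (word == self.WAKE_WORD)
            return None
        self.active = False  # one command per activation
        return word if word in self._known else None
```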
- The new code defines a behavior corresponding to each command. These behaviors are triggered with the pub.sendMessage command: for example, when the command dracarys is received, the corresponding behaviour is triggered. This structure gives the user more voice interaction with the robot and allows customizable control, as in the old code. The new code comes with a model and a list of commands trained according to the user's needs, which provides customizability and adaptation; flexibility is also increased since the model can be retrained with tools such as Teachable Machine.
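A sketch of that per-command dispatch. The topic names are invented for illustration; in the real code the publish function would be PyPubSub's `pub.sendMessage`, injected here as a plain callable so the mapping stays easy to test.

```python
# Hypothetical command-to-topic mapping; topic names are assumptions.
COMMAND_TOPICS = {
    "dracarys": "behaviour/dracarys",
    "taanari": "behaviour/attention",
}

def dispatch(command, send_message):
    """Publish the topic mapped to a recognized command.

    Returns True if a topic was published, False for unknown commands.
    """
    topic = COMMAND_TOPICS.get(command)
    if topic is None:
        return False  # unknown command: nothing published
    send_message(topic)
    return True
```

With PyPubSub this would be called as `dispatch(label, pub.sendMessage)`, keeping the dictionary as the single place new commands are registered.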
Also, with this service from Google you can train models not only with audio but also with images and poses. It only took me 1 hour.
Google Teachable Machine