Custom voice commands with Teachable Machine #99
Replies: 2 comments 1 reply
Let me be quite clear here @SentryCoderDev, as this isn't the first time we've had this discussion: high-powered lasers and flame-throwing devices are not something I am going to support as part of this project. The latter is illegal without a license, at least in my country, and the former should be too. I don't want them used in connection with the project, and if you're going to do it anyway please don't discuss it in this project's community. Ignoring that reference, here is my feedback on your code:
The 'old code' is deliberately designed to be a module that can be consumed by custom code. It is generic by design so that the output of the module is accessible to other modules and custom behaviours. It shouldn't be adapted to include more specific, less generic functionality as that is an anti-pattern. See the Single Responsibility Principle and other SOLID programming principles for more information.
This looks like a good use case for an extension that consumes the original module. Since it requires the audio rather than the converted text, there is an opportunity to modify the SpeechInput module to return the audio from a method call that can be consumed by your new module (and others in future). The new 'Teachable Machine' module could then be consumed by your own custom behaviour's code.
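A minimal sketch of the architecture described above. All names here (`SpeechInput`, `get_last_audio`, `TeachableMachineInput`) are illustrative assumptions, not the project's actual API; the point is only the shape: the generic module exposes raw audio via a method, and the new module consumes it without the original knowing about it.

```python
# Hypothetical sketch: class and method names are illustrative,
# not taken from the project's real codebase.

class SpeechInput:
    """Existing generic module, extended to expose the raw audio."""

    def __init__(self):
        self._last_audio = None

    def capture(self, audio_bytes):
        # ... the existing recognition pipeline would run here ...
        self._last_audio = audio_bytes

    def get_last_audio(self):
        """New accessor so other modules can consume the raw audio."""
        return self._last_audio


class TeachableMachineInput:
    """New extension module: classifies raw audio with a trained model."""

    def __init__(self, speech_input, classify_fn):
        self._speech_input = speech_input
        self._classify = classify_fn  # injected model keeps this module generic

    def recognise(self):
        """Return the model's label for the latest audio, or None."""
        audio = self._speech_input.get_last_audio()
        return self._classify(audio) if audio is not None else None
```

Injecting the classifier as a callable keeps the new module reusable: it doesn't care whether the model behind it is a Teachable Machine export or something else.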
The 'Command Mode' is custom behaviour, as are the specific words, and wouldn't be located in the SpeechInput class or the proposed extension module (see above). This is the kind of thing you'd add to your own custom logic.
I hope that helps clarify the architecture a little. Any questions please let me know 👍
Hello friends, I was bored again and started watching HOTD (House of the Dragon), and this came to my mind: what if I adapt the commands from the show to the robot? Technically my version has a fire spitter XD (what could go wrong). Here are some updates I'm thinking of making (there's sample code in the attachment):
- The old code processes only the text recognized by Google's voice recognition API and passes the user's speech along directly as text. This is suitable for simple command recognition and content management, but its flexibility may be limited in advanced command-and-control scenarios.
- The new code offers a more advanced structure using Google's Teachable Machine model generator. It performs recognition with a deep learning model trained specifically to recognize certain commands from the user's voice with higher accuracy, which can give better results in customized command recognition. Also, this model is most likely triggered only by voices similar to those used in training: if I train it with my own voice, it will not trigger even when the same command is spoken by someone else. This accuracy improves in direct proportion to the number of training samples, so keep that in mind.
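As a rough sketch of the recognition step: a Teachable Machine audio model outputs one probability per trained class, and the robot code only needs to pick the most likely label and reject uncertain matches. The class names and threshold below are invented for illustration, not taken from the attached code.

```python
# Hypothetical post-processing sketch for a Teachable Machine audio model.
# Labels and the confidence threshold are assumptions for illustration.

LABELS = ["background_noise", "taanari", "dracarys"]
CONFIDENCE_THRESHOLD = 0.8  # assumed value; tune against real recordings

def pick_command(probabilities):
    """Return the most likely command label, or None if uncertain."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] < CONFIDENCE_THRESHOLD:
        return None  # too uncertain: treat as no command
    label = LABELS[best_index]
    # The background class is never a command, so it is filtered out here.
    return None if label == "background_noise" else label
```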
- Command Mode has been added in the new version; it activates when specific commands are detected. For example, command mode is activated by detecting the command Ta'anari (a word that means something to me), and then the corresponding behavior is triggered when another command is detected. This structure allows the user to use a more complex set of commands. In the old code there is no special command mode and each voice command is processed directly, so the old code may fall short when more complex operations, or commands tied to a specific order, are required.
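The two-step Command Mode described above can be sketched as a tiny state machine: idle until the wake word arrives, then treat the next recognized word as a command. The wake word and command are the examples from this post; the class itself is hypothetical, not the project's API.

```python
# Minimal sketch of the two-step 'Command Mode'. The class is
# hypothetical; only the wake word / command idea comes from the post.

class CommandMode:
    WAKE_WORD = "taanari"

    def __init__(self, known_commands):
        self._known = set(known_commands)
        self.active = False

    def handle(self, word):
        """Feed one recognized word; return a command to execute, or None."""
        if not self.active:
            # Stay idle until the wake word activates command mode.
            self.active = (word == self.WAKE_WORD)
            return None
        self.active = False  # one command per activation
        return word if word in self._known else None
```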
- The new code defines a behavior corresponding to each command. These behaviors are triggered with the pub.sendMessage command: for example, when the command dracarys is received, the corresponding behaviour is triggered. This structure gives the user more voice interaction with the robot and allows customizable control, as in the old code. The new code comes with a model and a list of commands trained according to the user's needs, which provides customizability and adaptation; flexibility is also increased since the model can be retrained with tools such as Teachable Machine.
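A sketch of that per-command dispatch. The topic names are invented for illustration; in the real code the publish function would be PyPubSub's `pub.sendMessage`, injected here as a plain callable so the mapping stays easy to test.

```python
# Hypothetical command-to-topic mapping; topic names are assumptions.
COMMAND_TOPICS = {
    "dracarys": "behaviour/dracarys",
    "taanari": "behaviour/attention",
}

def dispatch(command, send_message):
    """Publish the topic mapped to a recognized command.

    Returns True if a topic was published, False for unknown commands.
    """
    topic = COMMAND_TOPICS.get(command)
    if topic is None:
        return False  # unknown command: nothing published
    send_message(topic)
    return True
```

With PyPubSub this would be called as `dispatch(label, pub.sendMessage)`, keeping the dictionary as the single place new commands are registered.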
Also, with this service from Google you can train models not only with audio but also with images and poses. It only took me 1 hour.
Google Teachable Machine