But implementing something like this from scratch would take a good while.
Also, here are some Automatic Speech Recognition (ASR) toolkits out there (these won't run offline on a microcontroller). They are useful for piping transcribed audio into a program that handles intents (something like [RASA](https://rasa.com)).
(Require Internet)
* [Deepgram](https://deepgram.com) - I believe they build upon OpenAI's Whisper model and also offer their own custom models
* Google Cloud / Microsoft Azure / AWS / IBM Watson
Picovoice solves both of these problems: hotword/wake-word detection and intent extraction. It looks like something you could build yourself on top of ARM's [keyword spotting program](https://github.com/ARM-software/ML-KWS-for-MCU) combined with one of the wake-word services listed in [Rhasspy's docs](https://rhasspy.readthedocs.io/en/latest/wake-word/#raven).
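To make the intent-extraction half of that pipeline concrete, here is a toy sketch in Python. This is *not* RASA's or Picovoice's actual API; the rule table and function names are my own invention, purely to illustrate the shape of the problem: an ASR transcript comes in, an intent label comes out. A real engine would use trained models rather than keyword matching.

```python
# Toy intent extraction downstream of ASR -- illustrative only.
# A real system (RASA, Picovoice Rhino, etc.) uses trained models,
# not hand-written keyword rules like these.

INTENT_RULES = {
    "turn_on_light": ["turn on", "light on", "switch on"],
    "turn_off_light": ["turn off", "light off", "switch off"],
    "get_weather": ["weather", "temperature", "forecast"],
}

def extract_intent(utterance: str) -> str:
    """Return the first intent whose keyword appears in the transcript."""
    text = utterance.lower()
    for intent, keywords in INTENT_RULES.items():
        if any(kw in text for kw in keywords):
            return intent
    return "unknown"

print(extract_intent("Hey, could you turn on the kitchen light?"))  # turn_on_light
print(extract_intent("What's the weather like today?"))             # get_weather
```

Even this crude version shows why the hard part is upstream: once you have reliable text, mapping it to intents is comparatively easy.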
(Can be run Offline)
* [OpenAI's Whisper](https://github.com/openai/whisper)
* [nVidia's NEMO](https://github.com/NVIDIA/NeMo)
* [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
When you see how complicated the space is and how many ways you can actually shoot yourself in the foot, this post starts to look a wee bit better.