But implementing something like this from scratch would take a good while.
Also, here are some Automatic Speech Recognition (ASR) toolkits out there (these won't run offline on a microcontroller). They are useful for piping transcribed audio into a program that handles intents (something like [RASA](https://rasa.com)).
(Require Internet)
* [Deepgram](https://deepgram.com) - I believe they build upon OpenAI's Whisper model and also offer their own custom models
* Google Cloud / Microsoft Azure / AWS / IBM Watson
Picovoice solves both of these problems: hotword/wake-word detection and intent extraction. It looks like something you could build yourself on top of ARM's [keyword spotting program](https://github.com/ARM-software/ML-KWS-for-MCU) combined with one of the wake-word services listed in [Rhasspy's docs](https://rhasspy.readthedocs.io/en/latest/wake-word/#raven).
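To make the intent-extraction half of that pipeline concrete, here is a toy sketch in Python. This is *not* RASA's or Picovoice's actual API; the rule table and function names are my own invention, purely to illustrate the shape of the problem: an ASR transcript comes in, an intent label comes out. A real engine would use trained models rather than keyword matching.

```python
# Toy intent extraction downstream of ASR -- illustrative only.
# A real system (RASA, Picovoice Rhino, etc.) uses trained models,
# not hand-written keyword rules like these.

INTENT_RULES = {
    "turn_on_light": ["turn on", "light on", "switch on"],
    "turn_off_light": ["turn off", "light off", "switch off"],
    "get_weather": ["weather", "temperature", "forecast"],
}

def extract_intent(utterance: str) -> str:
    """Return the first intent whose keyword appears in the transcript."""
    text = utterance.lower()
    for intent, keywords in INTENT_RULES.items():
        if any(kw in text for kw in keywords):
            return intent
    return "unknown"

print(extract_intent("Hey, could you turn on the kitchen light?"))  # turn_on_light
print(extract_intent("What's the weather like today?"))             # get_weather
```

Even this crude version shows why the hard part is upstream: once you have reliable text, mapping it to intents is comparatively easy.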
(Can be run Offline)
* [OpenAI's Whisper](https://github.com/openai/whisper)
* [nVidia's NEMO](https://github.com/NVIDIA/NeMo)
* [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
When you see how complicated the space is and how many ways you can actually shoot yourself in the foot, this post starts to look a wee bit better.