I wanted to build my own speech-to-text transcription program [1] for Discord, similar to how Zoom or Google Hangouts works. I built it so that I can record my group's D&D sessions and build applications / tools for VTTs (virtual tabletop gaming).
It can process a set of 3-hour audio files in ~20 mins.
I recorded a demo video of how it works here: https://www.youtube.com/watch?v=v0KZGyJARts&t=300s
[1] https://github.com/naveedn/audio-transcriber
I alluded to building this tool on a previous HN thread: https://news.ycombinator.com/item?id=45338694