Hacker News | alexarena's comments

This is very cool. It seems like the prompt is asking the LLM to one shot an answer. Have you tried asking it to make a group, confirm whether it's correct, and repeat with the remaining words? (like a human would)
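The suggested loop could look something like this. A minimal sketch in Python, with `ask_llm` as a hypothetical stub standing in for a real LLM call (no actual model API is used here):

```python
# Sketch of the iterative approach: ask for one group at a time,
# verify it, and repeat with the remaining words.
def ask_llm(words):
    """Stub: in practice this would prompt an LLM for one 4-word group."""
    return sorted(words)[:4]  # stand-in answer for demonstration only

def solve_iteratively(words, is_correct, max_attempts=10):
    """Build up confirmed groups instead of one-shotting the whole puzzle."""
    remaining = list(words)
    groups = []
    for _ in range(max_attempts):
        if not remaining:
            break
        guess = ask_llm(remaining)
        if is_correct(guess):
            groups.append(guess)
            remaining = [w for w in remaining if w not in guess]
        # else: re-prompt, including feedback that the guess was wrong
    return groups
```

Each confirmed group shrinks the search space for the next prompt, which is roughly what a human solver does.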


Curious how you’re implementing the background queue for publishing? Something custom in Rust?


It uses Google Cloud Tasks
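For illustration only, here is a minimal in-process stand-in for the pattern a hosted queue like Cloud Tasks provides (enqueue now, execute in the background); this is not their actual implementation:

```python
import queue
import threading

class PublishQueue:
    """Toy background queue: producers enqueue jobs, a worker drains them."""

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, job):
        self._q.put(job)

    def _run(self):
        while True:
            job = self._q.get()
            try:
                job()  # e.g. push a published post out to subscribers
            finally:
                self._q.task_done()

    def drain(self):
        self._q.join()  # block until all queued jobs have run
```

A hosted queue adds persistence, retries, and rate limiting on top of this basic shape, which is why you'd reach for Cloud Tasks rather than an in-memory queue in production.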


interval.com founder here, this is correct! Everything w/ us is defined in our SDK.

So instead of writing React code or using a drag-and-drop builder, you define everything ranging from simple forms to more complex views in Node.js or Python code using our SDK.
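A rough sketch of the code-defined-UI idea, using hypothetical names (this is NOT the actual Interval SDK API): backend functions register themselves along with the input fields their form needs, and a host can render the UI from those declarations.

```python
# Hypothetical action registry illustrating the concept.
ACTIONS = {}

def action(name, inputs):
    """Register a backend function plus the form fields its UI needs."""
    def wrap(fn):
        ACTIONS[name] = {"inputs": inputs, "run": fn}
        return fn
    return wrap

@action("refund_order", inputs={"order_id": str, "amount": float})
def refund_order(order_id, amount):
    # ...real business logic would live here...
    return f"refunded {amount} on {order_id}"

# A host could render a form from ACTIONS["refund_order"]["inputs"]
# and invoke the function with the submitted values.
```

The point is that the form definition and the business logic live in the same backend code, so there's no separate frontend to build or drag-and-drop tool to learn.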


Thanks! On the differences between Retool: the output (customer support tools, admin dashboards, etc.) is pretty similar between both products, but _how_ those tools are built is really different.

Something like Retool gives you a drag-and-drop UI builder, Interval is made for backend devs and lets you create UIs directly in your backend code. So you don’t need to learn another drag-and-drop tool or frontend framework.

Re: where the code actually runs… this is another really cool component of Interval. We host the UI for you on interval.com but the actual backend code (including everything sensitive like your environment variables, business logic, etc.) runs on your infra and Interval can’t see it by design.


Founder of https://interval.com here. We're somewhere in-between Retool and Windmill which was mentioned on this thread.

Like Windmill, Interval is heavily code-focused. Our model lets you define tools in your existing TypeScript/JavaScript codebase.

Like Retool, you can use Interval to build complete internal dashboards that handle the "view stuff" side of things, not just the script/workflow "do stuff" pieces.


This is cool! Curious if it would be possible to run the model on device?


Probably not realistic. On an M1 Pro MBP, Whisper runs far slower than real time. Think on the order of days for a 2-hour recording.

I’ve been doing transcription work for public meetings. Whisper is truly incredible in terms of error rate even in extremely challenging circumstances (obscure acronyms, unusual terms, unusual names, poor recording quality). I was seeing only a few errors per hour; most things that look like errors are in fact accurate representation of humans saying weird things. But I have to run it on my desktop with CUDA enabled. With the medium model it is iirc barely faster than real time. I only have a 1070 so maybe it is better with more modern hardware.

Whisper does also have some slightly strange behavior with silence and very long recordings. I might do a blog post once I’ve got more experience.


On M1 Pro, with the greedy decoder and medium model, I can transcribe 1 hour of audio in just 10 minutes (~6x real-time) [0].

[0] https://github.com/ggerganov/whisper.cpp
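The speed claims being traded in this thread all come down to one ratio. A trivial helper, just to pin down the arithmetic:

```python
def real_time_factor(audio_seconds, processing_seconds):
    """Seconds of audio transcribed per second of compute.
    > 1 means faster than real time; < 1 means slower."""
    return audio_seconds / processing_seconds

# 1 hour of audio in 10 minutes, as reported above:
# real_time_factor(3600, 600) -> 6.0
```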


I just transcribed a 32 minute audio recording of someone doing a speech that someone recorded using their phone mic.

I used default settings of "import audio file" with the Buzz application, and it was transcribed in less than 10 minutes. 24KB text file or so.

I'm on a Windows PC with an AMD Ryzen 3.


There were at least two errors in the video demo, and that was just 15 seconds of audio. “I can take some notes from a meeting” was transcribed as “I can take some notes from meeting”, and “I click stop [recording]” ended up as “And click the stop”.
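Errors like these are conventionally quantified as word error rate: the word-level edit distance between the reference and the transcript, divided by the reference length. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over word sequences / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

On the first error quoted above ("I can take some notes from a meeting" vs. "I can take some notes from meeting"), one dropped word out of eight gives a WER of 0.125 for that phrase alone.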


Me too - I recently ported the model to plain C/C++ and I am now planning to run it on device and see if the performance is any good. Will post an update when/if it works out


For PCs there is the buzz application https://github.com/chidiwilliams/buzz/tree/main.


I suppose the built-in iOS voice recognition would be better for that.

I haven't really compared those two properly. Wonder how much better Whisper is.


Apple will keep up with anything that's SOTA, just with a bit of a lag, so expect they'll be better soon if not already.

Word of warning from someone who built an SDK that filled in a processing gap that Apple had (6DOF Monocular SLAM)[1] Apple will eventually make your technology obsolete and their version will be way better. See: ARKit

We open sourced it once ARKit came out because there was no way to monetize it further

[1] https://github.com/Pair3D/PairSDK


Whisper is a game changer in terms of accuracy. It makes Zoom, YouTube, Office/Azure, Descript, and Otter.ai transcription look like jokes in comparison.

The step change in transcription accuracy here is significant enough to cross an important threshold for usefulness.


Honestly kind of impressed that the HegartyMaths guy independently found this and then handled it without (explicitly) threatening to sue you.


The jump-straight-to-suing approach is, to be honest, a bit specific to the US. In the UK (as here), it's more usual to handle these things with a kind word, combined with hints of potential problems later.


They were champs! We even connected with Colin on LinkedIn afterwards, and he actually offered us summer work, but that fell through unfortunately.


He didn't pay you for the consulting time you gave him?


Money is vain. They got much more out of it and the post very clearly states that.


Congrats! Cool to see a new class of JS runtimes springing up. Lots to be excited about here, but cold start time seems like a game changer for building at the edge.


Thanks! We just switched to Docusaurus, would absolutely recommend. We started w/ a separate marketing site and docs site, but since our team are basically all engineers, we just decided to use Docusaurus for everything.


Their availability sharing is top notch, too.

