I built something similar for Linux (yapyap — push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, parakeet is near-instant for short utterances.
The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call — grab the focused field's content, nearby labels, window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.
The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today but doesn't have a rug-pull failure mode.
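A rough sketch of what that "tiny text prompt" could look like. Everything here is made up for illustration (the FocusContext shape, the field values, the buildCorrectionPrompt helper); in practice the context would come from AT-SPI on Linux or the macOS accessibility API:

```ts
// Hypothetical sketch: build a small correction prompt from accessibility context.
// None of these names come from a real library; they only illustrate the idea.
interface FocusContext {
  windowTitle: string; // title of the focused window
  fieldLabel: string;  // label of the focused text field
  fieldText: string;   // text already typed into the field
}

function buildCorrectionPrompt(ctx: FocusContext, transcript: string): string {
  // A few hundred tokens at most; well within what a small local model
  // can process in milliseconds.
  return [
    `Window: ${ctx.windowTitle}`,
    `Field: ${ctx.fieldLabel}`,
    `Existing text: ${ctx.fieldText}`,
    ``,
    `Fix the spelling of names and terms in this dictated text, using the context above:`,
    transcript,
  ].join("\n");
}

// Example with made-up context values:
console.log(
  buildCorrectionPrompt(
    { windowTitle: "Re: Q3 sync - Mail", fieldLabel: "Message body", fieldText: "Hi Siobhan," },
    "hi shevon thanks for the update",
  ),
);
```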
Yeah, local works really well. I tried this other tool: https://github.com/KoljaB/RealtimeVoiceChat which lets you have a live voice chat with a (local) LLM. With local Whisper and a local LLM (an 8B Llama in my case) it works phenomenally, and it responds so quickly that it feels like it's interrupting me.
Too bad that tool no longer seems to be actively developed. I'm looking for something similar. But it's really nice to see what's possible with local models.
My M1 16GB Mini and M2 16GB Air both deliver insane local transcription performance without eating up much memory. I think the M line + Parakeet is a great combination, and you get privacy for free.
Yeah, that model is amazing. It even runs reasonably well on my mid-range Android phone with this quite simple but very useful app, as long as you don't speak for too long, or pause every once in a while so it can transcribe. I do have handy.computer on my Mac too.
I find the model works surprisingly well and, in my opinion, surpasses all the other models I've tried. Finally a model that can mostly understand my not-so-perfect English and handle language switching mid-sentence (compare that to Gemini's voice input, which is literally THE WORST: it always tries to transcribe in the wrong language, and even when the language is correct it produces the most utter crap imaginable).
Agreed for dictation, but Gemini voice is fun for interactive voice experiments -> https://hud.arach.dev/. I'm honestly blown away by how much Gemini could assist with, with basically no dev effort.
FWIW, whisper.cpp with the default model transcribes at 6x realtime on my four-core ~2.4GHz laptop, and doesn't really stress the CPU or memory. This is for batch-transcribing podcasts.
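For anyone curious what that kind of batch setup looks like, here's a rough sketch of a loop over whisper.cpp's example CLI. The binary name and flags vary by build (older builds ship `main`, newer ones `whisper-cli`), and all paths here are made up, so check `--help` against your own version:

```ts
// Batch-transcribe a directory of WAV files by shelling out to whisper.cpp.
// Binary name, model path, and directory are assumptions for this sketch.
import { execFileSync } from "node:child_process";
import { readdirSync } from "node:fs";
import { join } from "node:path";

const WHISPER_BIN = "./whisper-cli";         // or ./main on older whisper.cpp builds
const MODEL = "models/ggml-base.en.bin";     // the default model
const PODCAST_DIR = "podcasts";              // directory of 16 kHz WAV files

for (const file of readdirSync(PODCAST_DIR).filter((f) => f.endsWith(".wav"))) {
  const input = join(PODCAST_DIR, file);
  // -otxt writes a plain .txt transcript; -of sets the output base name.
  execFileSync(WHISPER_BIN, ["-m", MODEL, "-f", input, "-otxt", "-of", input], {
    stdio: "inherit",
  });
}
```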
The downside is that I couldn't get it to segment by speaker. The consensus seemed to be to use a separate tool for that.
Thanks for surfacing this. If you click the "tools" button to the left of "compile", you'll see a list of comments, and you can resolve them from there. We'll keep improving and fixing things that might be rough around the edges.
Eh. This is yet another "I tried AI to do a thing, it didn't do it the way I wanted, therefore I'm convinced that's just how it is... here's a blog post about it" article.
"Claude tries to write React, and fails"... how many times? what's the rate of failure? What have you tried to guide it to perform better.
These articles are similar to HN 15 years ago, when people wrote "Node.js is slow and bad" posts.
It's crazy that the etymology of "Kentucky" cannot be traced with certainty. It goes to show how much Native American culture and language is now untraceable, and how fragile our record-keeping is, even in "modern times".
The etymology I’ve heard isn’t even listed in the article.
One theory traces “Kentucky” to early forms like Cantucky or Cane-tucky, referring to the region’s vast brakes of river cane (Kentucky river cane, North America’s only native bamboo), which early inhabitants associated with fertile, game-rich land.
I always wondered how something like the AWS or GCP Cloud Console admin UIs gets shipped. How could someone deliver a product like these and be satisfied, rewarded, promoted, etc.? How can Google leadership look at this stuff and go "yup, people love this"?
In defense of the AWS console, it is derivative of the AWS APIs; as such, it's really just a convenience layer that only occasionally strings two or more AWS APIs together into something that could be considered a distinct feature of the console.
That is wholly unlike the problem here, where the console and the API somehow behave completely differently.
Along with the public APIs, an AWS service can also have Console APIs that exist specifically for the console. These APIs do not have the same constraints as the public APIs.
Object.defineProperty on every request to set params / query / body is probably slower than regular property assignment.
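If anyone wants to check that claim, here is a throwaway micro-benchmark comparing the two approaches. Numbers vary a lot by engine, and the dead objects may get optimized away, so treat the result as a rough signal rather than a verdict:

```ts
// Compare Object.defineProperty against plain property assignment for
// attaching params/query to a fresh object, N times each.
const N = 1_000_000;

let t = performance.now();
for (let i = 0; i < N; i++) {
  const req: Record<string, unknown> = {};
  Object.defineProperty(req, "params", { value: { id: i }, writable: true });
  Object.defineProperty(req, "query", { value: {}, writable: true });
}
console.log("defineProperty:  ", (performance.now() - t).toFixed(1), "ms");

t = performance.now();
for (let i = 0; i < N; i++) {
  const req: Record<string, unknown> = {};
  req.params = { id: i };
  req.query = {};
}
console.log("plain assignment:", (performance.now() - t).toFixed(1), "ms");
```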
Also, parsing the body on every request, with no way to opt out, could hurt performance (if performance is a primary goal, that is).
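For what it's worth, a common way around that is to parse lazily. Here's a minimal sketch of the idea; `withLazyBody` is a made-up helper for illustration, not part of this framework or of Bun:

```ts
// Parse the request body only if a handler actually reads it, and cache the
// resulting promise so repeated reads don't parse twice.
function withLazyBody(req: Request) {
  let cached: Promise<unknown> | undefined;
  return {
    raw: req,
    get body(): Promise<unknown> {
      // First access triggers the parse; later accesses reuse the promise.
      cached ??= req.json();
      return cached;
    },
  };
}

// A handler that never touches `body` pays nothing for parsing.
async function handler(req: Request): Promise<Response> {
  const ctx = withLazyBody(req);
  if (req.method === "POST") {
    const data = await ctx.body; // parsed here, on demand
    return new Response(JSON.stringify({ received: data }), {
      headers: { "content-type": "application/json" },
    });
  }
  return new Response("ok"); // body never parsed
}
```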
I wonder if the trie-based routing is actually faster than Elysia with precompile mode enabled?
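For readers who haven't seen one, here's a toy illustration of what trie-based routing means: one node per path segment, so lookup cost is proportional to the number of segments rather than to the number of registered patterns. This is not the library's actual code, and whether it beats Elysia's precompiled handlers is an empirical question:

```ts
// Minimal path trie: static segments in a map, one optional :param child.
type Handler = (params: Record<string, string>) => Response;

interface TrieNode {
  children: Map<string, TrieNode>;
  paramChild?: { name: string; node: TrieNode };
  handler?: Handler;
}

const newNode = (): TrieNode => ({ children: new Map() });
const root = newNode();

function add(path: string, handler: Handler): void {
  let node = root;
  for (const seg of path.split("/").filter(Boolean)) {
    if (seg.startsWith(":")) {
      node.paramChild ??= { name: seg.slice(1), node: newNode() };
      node = node.paramChild.node;
    } else {
      if (!node.children.has(seg)) node.children.set(seg, newNode());
      node = node.children.get(seg)!;
    }
  }
  node.handler = handler;
}

function lookup(path: string): Response | undefined {
  let node = root;
  const params: Record<string, string> = {};
  for (const seg of path.split("/").filter(Boolean)) {
    const next = node.children.get(seg);
    if (next) {
      node = next;
    } else if (node.paramChild) {
      params[node.paramChild.name] = seg; // capture dynamic segment
      node = node.paramChild.node;
    } else {
      return undefined; // no route matches
    }
  }
  return node.handler?.(params);
}

// Usage:
add("/users/:id", (p) => new Response(`user ${p.id}`));
console.log(lookup("/users/42"));
```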
Overall, this is a nice wrapper on top of bun.serve, structured really well. Code is easy to read and understand. All the necessary little things taken care of.
The dev experience of maintaining this is probably a better selling point than performance.
In my opinion, attempting to perform live dictation is a solution that is looking for a problem. For example, the way I'm writing this comment is: I hold down a keyboard shortcut on my keyboard, and then I just say stuff. And I can say a really long thing. I don't need to see what it's typing out. I don't need to stream the speech-to-text transcription. When the full thing is ingested, I can then release my keys, and within a second it's going to just paste the entire thing into this comment box. And also, technical terms are going to be just fine with Whisper. For example, Here's a JSON file.
(this was transcribed using whisper.cpp with no edits. took less than a second on a 5090)
Yeah, Whisper has more features and is awesome if you have the hardware to run the big models that are accurate enough. The constraint here is finding the best CPU-only implementation. By no means am I wedded to or affiliated with Parakeet; it's just the best/fastest within the CPU-only space.
I've done something similar for Linux and Mac. I originally used Whisper and then switched to Parakeet. I much prefer Whisper after playing with both. Maybe I'm not configuring Parakeet correctly, but the transcription that comes out of Whisper is usually pretty much spot on. It automatically removes all the "umms" and all the "ahs" and it's just way more natural, in my opinion. I'm using whisper.cpp with CUDA acceleration. This whole comment was written just by dictating to Whisper, and it's probably going to automatically add quotes correctly, there's going to be no ums, there's going to be no ahs, and everything's just going to be great.
If you don't mind a closed-source paid app, I can recommend MacWhisper. You can select different Whisper and Parakeet models for dictation and transcription. My favorite feature is that it lets you send the transcription output to an LLM for clean-up, or basically anything you want, e.g. professional polish, translation, writing poems, etc.
I have enough RAM on my Mac that I can run smaller LLMs locally, so for me the whole thing stays local.