Hacker News | d4rkp4ttern's comments

Interesting that it doesn’t specify what type of oatmeal, e.g. steel-cut vs quick oats. I thought steel-cut was more beneficial.

Rolled oats according to the paper

Thanks, I found it after clicking through to the actual Nature paper, where it’s a detail buried deep in the paper. They really should have mentioned it up front.

An interesting shift I’ve seen over the past few weeks is that we’re starting to refer to bare LLMs themselves as “agents”.

Used to be that agent = LLM + scaffold/harness/loop/whatever.


I think some of the distinction here is that the more recent "bare LLMs" have been more purpose-built, augmented with agent-specific RL, and in general fine-tuned for the requirements of "agents": things such as specific reasoning capabilities, tool calling, etc.

These all make the "bare LLMs" better suited to be used within the "agent" harness.

I think the more accurate term would be "agentic LLMs" instead of calling them "agents" outright. As to why it's the case now: probably just human laziness and colloquialisms.


Yes, the post training is the special sauce.

GPT 5.2 in a simple while loop runs circles around most things right now. It was released barely a month ago and many developers have been on vacation/hibernating/etc. during this time.

I give it 3-4 more weeks before we start to hear about the death of agentic frameworks. Pointing GPT5+ at a PowerShell or C#/Python REPL is looking way more capable than wiring up a bunch of domain-specific tools. A code-based REPL is the ultimate tool: you only need one, and you can force the model to always call it (100% chance of picking the right tool). The amount of integration work around Process.Start is approximately 10-15 minutes, even if you don't use AI assistance.
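
For the curious: the whole "agent" can be about this small (a rough Python sketch using the OpenAI SDK; the model name, prompt, and tool schema are just illustrative, not anyone's actual setup):

    import json, subprocess
    from openai import OpenAI

    client = OpenAI()

    # The one and only tool: a shell. Everything else is reachable through it.
    tools = [{
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Run a shell command and return its output",
            "parameters": {
                "type": "object",
                "properties": {"cmd": {"type": "string"}},
                "required": ["cmd"],
            },
        },
    }]

    messages = [{"role": "user", "content": "Find the 5 largest files under /tmp"}]
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:  # no tool call: the model decided it's done
            print(msg.content)
            break
        for call in msg.tool_calls:
            cmd = json.loads(call.function.arguments)["cmd"]
            out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": (out.stdout + out.stderr)[-4000:]})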


Yes, this “REPL/CLI is all you need” realization is exactly what’s behind the wild success of Claude Code and derivative CLI coding agents.

My definition of agent has always been an LLM with "effectful" tools, run in a loop where the LLM gets to decide when the task is complete. In other words, an LLM with "agency".

This is exactly how I think of it. An agent has three elements: intelligence (LLM), autonomy (loop) and tools to do anything interesting/useful.

I almost thought it was MalBot, which would have been more apt.


This is not strictly speech-to-speech, but I quite like it when working with Claude Code or other CLI Agents:

STT: Handy [1] (open-source), with Parakeet V3 - stunningly fast, near-instant transcription. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate back to me what it understood, and it gives back a nicely structured version -- this helps confirm understanding as well as likely helps the CLI agent stay on track.

TTS: Pocket-TTS [2], just 100M params, with amazing speech quality (English only). I made a voice plugin [3] based on it for Claude Code, so it can speak short updates whenever CC stops. It uses a non-blocking stop hook that calls a headless agent to create a 1-2 sentence summary (rough sketch of the hook config below). Turns out to be surprisingly useful. It's also fun, as you can customize the speaking style, mirror your vibe, etc.

The voice plugin provides commands to control it:

    /voice:speak stop
    /voice:speak azelma (change the voice)
    /voice:speak <your arbitrary prompt to control the style or other aspects>
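
For reference, the non-blocking stop hook underneath is just a Claude Code settings entry, roughly of this shape (the command name here is hypothetical; the plugin installs the real one, which hands off to a background process so CC isn't blocked):

    {
      "hooks": {
        "Stop": [
          {
            "hooks": [
              { "type": "command", "command": "voice-summary-hook" }
            ]
          }
        ]
      }
    }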
[1] Handy https://github.com/cjpais/Handy

[2] Pocket-TTS https://github.com/kyutai-labs/pocket-tts

[3] Voice plugin for Claude Code: https://github.com/pchalasani/claude-code-tools?tab=readme-o...



Nice, I’ll have to try it out. They should really make a uv-installable CLI tool like Pocket-TTS did. People underestimate just how much more immediately usable something becomes when you can get it by simply doing “uv tool install …”.

True that. People, especially developers, underestimate the importance of packaging, or, in general, of making it easier for others to use your product.

Wow Handy works impressively well! Excellent UX too (on Windows at least).

Hi, I'm looking for an STT that can run on a server/cron, using a small local model (I have a 4-vCPU Threadripper, CPU only, with 20 GB RAM on the server), and that can transcribe from remote audio URLs (preferably; I know local models probably don't have this feature, so I'll have to do something like curl the audio down to memory or /tmp, transcribe, then remove the file, etc.).

Have any thoughts?


I’ve no thoughts on that unfortunately.


posts like this are why i visit HN daily!!!

thanks for sharing your knowledge; can’t wait to try out your voice plugin


Same!

Feel free to file a gh issue if you have problems with the voice plugin


As others said, this has been possible for months already with llama.cpp's support for the Anthropic Messages API. You just need to set ANTHROPIC_BASE_URL. The specific llama-server settings/flags were a pain to figure out and required some hunting, so I collected them in this guide to using CC with local models:

https://github.com/pchalasani/claude-code-tools/blob/main/do...

One tricky thing that took me a whole day to figure out is that using Claude Code in this setup was causing total network failures due to telemetry pings, so I had to set this env var to 1: CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC
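
The rough shape of the setup, for reference (model file, context size, and port are illustrative; the guide has the model-specific flags):

    # serve a local model; --jinja enables the chat template handling
    # that tool calling needs (model file and context size are examples)
    llama-server -m qwen3-30b-a3b-q4.gguf -c 32768 --jinja --port 8080

    # point Claude Code at it instead of Anthropic's API
    export ANTHROPIC_BASE_URL=http://localhost:8080
    export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
    claude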


Curious what llama-server flags you used. On my M1 Max 64GB MacBook I tried it in Claude Code (which has a ~25K-token system message) and I get 3 tps.

But with Qwen3-30B-A3B I get 20 tps in CC.


Curious how it compares to last week’s release of Kyutai’s Pocket-TTS [1], which is just 100M params and excellent in both speed and quality (English only). I use it in my voice plugin [2] for quick voice updates in Claude Code.

[1] https://github.com/kyutai-labs/pocket-tts

[2] https://github.com/pchalasani/claude-code-tools?tab=readme-o...


First time I’m hearing about a shortcut for this. I always use 2 hyphens. Is that not considered an em-dash?

You are absolutely right — most internet users don't know the specific keyboard combination to make an em dash and substitute it with two hyphens. On some websites it is automatically converted into an em dash. If you would like to know more about this important punctuation symbol and its significance in identifying AI writing, please let me know.

Wow thanks for the enlightenment. I dug into this a bit and found out:

Hyphen (-) — the one on your keyboard. For compound words like “well-known.”

En dash (–) — medium length, for ranges like 2020–2024. Mac: Option + hyphen. Windows: Alt + 0150.

Em dash (—) — the long one, for breaks in thought. Mac: Option + Shift + hyphen. Windows: Alt + 0151.

And now I also understand why having plenty of actual em-dashes (not double hyphens) is an “AI tell”.


If you have the compose key enabled, it's trivial to write all sorts of things. Em dash is compose (right Alt for me) followed by ---

En dash is compose --.

You can type other fun things like the section symbol (compose S o), fractions like ⅐ (compose 1 7), the degree symbol (compose o o), etc.

https://itsfoss.com/compose-key-gnome-linux/

On phones you merely long-press the hyphen to get the longer dash options.


Thanks for that. I had no idea either. I'm genuinely surprised Windows buries a crucial thing like this, and I wonder why they even bothered adding it in the first place when it's so complicated.

The Windows version is an escape hatch for keying in any arbitrary character code, which is why it's so convoluted. You need to know which code you're after.

To be fair, the alt-input is a generalized system for inputting Unicode characters outside the set keyboard layout. So it's not like they added this input specifically. Still, the em dash really should have an easier input method given how crucial a symbol it is.

It's a generalized system for entering code page glyphs that was extended to support Unicode. 0150 and 0151 only work if you are on CP1252 as those aren't the Unicode code points.

And the em dash is trivially easy on iOS — you simply long-press the regular dash button. I’ve been using it for years, and I’m not stopping just because people might suddenly accuse me of being an AI.

Thanks for delving into this key insight!

No, it's not the same. Note there are medium and long dashes as well.

That said, I always use -- myself. I don't think about pressing some keyboard combo to emphasise a point.


The long --- if you're that way minded --- is just 3 hyphens :)

Yep I realize this now, as I said in my other comment.

Context filling up is sort of the Achilles' heel of CLI agents. The main remedy is to have the agent output some type of handoff document and then run /compact, which leaves you with a summary of the latest task. It sort of works, but by definition it loses information, and you often find yourself having to re-explain or re-generate details to continue the work.

I made a tool [1] that lets you just start a new session and injects the original session's file path, so you can extract arbitrary details of prior work from it using sub-agents.

[1] aichat tool https://github.com/pchalasani/claude-code-tools?tab=readme-o...

