
One thing that radicalized me was building an agent that tested network connectivity for our fleet. Early on, in like 2021, I deployed a little mini-fleet of off-network DNS probes on, like, Vultr to check on our DNS routing, and actually devising metrics for them and making the data that stuff generated legible/operationalizable was annoying and error-prone. But you can give basic Unix network tools --- ping, dig, traceroute --- to an agent and ask it for a clean, usable signal, and it'll do a reasonable job! These models know all the flags and are generally better at interpreting tool output than I am.

I'm not saying that the agent would do a better job than a good "hardcoded" human telemetry system, and we don't use agents for this stuff right now. But I do know that getting an agent across the 90% threshold of utility for a problem like this is much, much easier than building the good telemetry system is.
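To make that concrete: the whole trick is handing the model a schema'd wrapper around the binary. A minimal sketch in Python (illustrative, not our actual code; the tool-definition shape follows the common JSON Schema convention the major tool-calling APIs share):

  # Sketch: expose `dig` to a tool-calling LLM. The agent loop passes the
  # model's chosen arguments to run_tool(); the binary and argv shape stay
  # pinned on our side.
  import subprocess

  DIG_TOOL = {
      "name": "dig",
      "description": "Resolve a hostname and return the raw dig output.",
      "input_schema": {
          "type": "object",
          "properties": {
              "host": {"type": "string", "description": "hostname to resolve"},
              "record_type": {"type": "string", "description": "A, AAAA, NS, ..."},
          },
          "required": ["host"],
      },
  }

  def run_tool(args: dict) -> str:
      # argv list, no shell: the model picks parameters, never a shell string
      cmd = ["dig", "+short", args["host"], args.get("record_type", "A")]
      return subprocess.run(cmd, capture_output=True, text=True,
                            timeout=10).stdout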



> I'm not saying that the agent would do a better job than a good "hardcoded" human telemetry system, and we don't use agents for this stuff right now.

And that's why I won't touch 'em. All the agents will be abandoned when people realize their inherent flaws (security, reliability, truthfulness, etc.) are not worth the constant low-grade uncertainty.

In a way it fits our times. Our leaders don't find truth to be a very useful notion. So we build systems that hallucinate and act unpredictably, and then invest all our money and infrastructure in them. Humans are weird.


Some of us have been happily using agentic coding tools (Claude Code etc) since February and we're still not abandoning them for their inherent flaws.


The problem with statements like these is that I work with people who make the same claims, but are slowly building useless, buggy monstrosities that for various reasons nobody can/will call out.

Obviously I’m reasonably willing to believe that you are an exception. However every person I’ve interacted with who makes this same claim has presented me with a dumpster fire and expected me to marvel at it.


But isn't this true of all technologies? I know plenty of people who are amazing Python developers. I've also seen people make a huge mess, turning a three-week project into a half-year slog because of their incredible lack of understanding of the tools they were using (Django, fittingly enough for this conversation).

That there's a learning curve, especially with a new technology, and that only the people at the forefront of using that technology are getting results with it - that's just a very common pattern. As the technology improves and the material about it improves, it becomes more useful to everyone.


I'm not going to dispute your own experience with people who aren't using this stuff effectively, but the great thing about the internet is that you can use it to track the people who are making the very best use of any piece of technology.


This line of reasoning is smelling pretty "no true Scotsman" to me. I'm sure there were amazing ColdFusion devs, but that hardly justifies the use of the technology. Likewise "this tool works great on the condition that you hire a Simon Willison-level dev" is almost a fault. I'm pretty confident you could squeeze some juice out of a Markov chain (ignoring, of course, that decoder-only LLMs are basically fancy MCs).

In a weird way it sort of reminds me of Common Lisp. When I was younger I thought it was the most beautiful language and a shame that it wasn't more widely adopted. After a few decades in the field I've realized it's probably for the best since the average dev would only use it to create elaborate foot guns.


"elaborate foot guns" -- HN is a high signal environment, but I could read for a week and not find a gem like this. Props.

Destiny visits me on my 18th birthday and says, "Gart, your mediocrity will result in a long series of elaborate foot guns. Be humble. You are warned."


> I've realized it's probably for the best since the average dev would only use it to create elaborate foot guns

see also: React hooks


Meh, smart high-agency people can write good software, and they can go on to leverage powerful tools in productive ways.

All I see in your post amounts to something like: you're surrounded by boot-camp coders who write the worst garbage you've ever seen, so now you doubt anyone who claims they've written some good shit. Psh, yeah right, you mean a mudball like everyone else?

In that scenario there isn't much a skilled software engineer with different experiences can interject because you've already made your decision, and your decision is based on experiences more visceral than anything they can add.

I do sympathize that you've grown impatient with the tools and with the output of those around you, instead of cracking that nut.


We have GPT-5 and Gemini 2.5 Pro at work, and both of them produce huge amounts of basically shit code that doesn't work.

Every time I reach for them recently, I end up spending more time refactoring the bad code out, or in deep hostage negotiations with the chatbot of the day, than it would have taken to write it myself.

That and for some reason they occasionally make me really angry.

Oh, a bunch of prompts in, and then it hallucinates some library a dependency isn't even using and spews a 200-line diff at me. Again. Great.

Although at least I can swear at them and get them to write me little apology poems...


On the sometimes getting angry part, I feel you. I don't even understand why it happens, but it's always a weird moment when I notice it. I know I'm talking to a machine and it can't learn from its mistakes, but it's still very frustrating to get back yet another "here's the actual no-bullshit fix, for real this time, pinky promise."


Are you using them via a coding agent harness such as Codex CLI or Gemini CLI?


Via the JetBrains plugin; it has an 'agent' mode and can edit files, call tools, and so on. Yes, I set up MCP integrations as well. Still kinda sucks. shrug.

I keep flipping between "this is the end of our careers" and "I'm totally safe". So far this is the longest 'totally safe' period I've had since GPT-2 or so came along...


I abandoned Claude Code pretty quickly; I find generic tools give generic answers. But since I do Elixir I'm "blessed" with Tidewave, which gives a much better experience. I hope more people get to experience framework-built tooling instead of just generic stuff.

It still wants to build an airplane to take out the trash sometimes, and will happily tell you wrong is right. However, I much prefer it trying to figure things out by reading logs and schemas and doing browser analysis automatically, rather than me feeding in logs manually.


Cursor can read logs and schemas and use curl to test API responses. It can also look into the database.


But then you have to use Cursor. Tidewave runs as a dependency in the framework and you just navigate to a URL; it's quite refreshing actually.


Honestly the top AI use case for me right now is personal throwaway dev tools. Where I used to write shell one-liners with a dozen pipes including greps and seds and jq and other stuff, now I get an AI to write me a node script and throw in a nice Web UI to boot.

Edit: reflecting on what the lesson is here, in either case I suppose we're avoiding the pain of dealing with Unix CLI tools :-D


Interesting. You have to wonder if all the tools this is built on would have been written in the first place if that kind of thing had been possible all along. Who needs 'grep' when you can write a prompt?


My long-running joke is that the actually good `jq` is just the LLM interface that generates `jq` queries; 'simonw actually went and built that.


https://github.com/simonw/llm-jq for those following along at home

https://github.com/simonw/llm-cmd is what I use as the "actually good ffmpeg etc. front end"

and just to toot my own horn, I hand Simon's `llm` command-line tool access to its own todo list and read/write access to the cwd with my own tools, https://github.com/dannyob/llm-tools-todo and https://github.com/dannyob/llm-tools-patch

Even with just these and no shell access it can get a lot done, because these tools encode the fundamental tricks of Claude Code. (I have `llmw` aliased to `llm --tool Patch --tool Todo --cl 0` so it has access to these tools and can act in a loop, as Simon defines an agent.)


Tried gron (https://github.com/tomnomnom/gron) a bit? If you know your UNIX, I think it can replace jq in a lot of cases. And when it can't, well, you can reach for Python, I guess.


It's highly plausible that everything we assumed was good design/engineering will disappear if LLMs/agents can produce more without having to be modular. (Sadly.)


There is some kind of parallel between 'AI' and 'fuzzy logic'. Fuzzy logic always looked to me like a large number of patches applied to get enough coverage for a system to work even if you didn't understand it. AI just increases the number of patches to billions.


True, there's often a point where your system becomes a blurry miracle.


Could you give some examples? I'm having the AI write the shell scripts, wondering if I'm missing out on some comfy UIs...


I was debugging a service that was spitting out a particular log line. I gave Copilot an example line and told it to write a script that tails the log and serves a UI on port 8080 with a table of those log lines, parsed and printed nicely. Then I iterated: filter buttons, aggregation stats, simple things like that. I asked it to add a "clear" button to reset the UI. I probably would not even have done this without an AI, because the CLI equivalent would be parsing and aggregating via some form of `uniq -c | sort -n` with a bunch of other tuning, and it would have been too much trouble.
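The result had roughly this shape (a from-memory sketch, not the actual script; the log path and regex stand in for the real service's format):

  # Sketch: tail a log file, parse each line, serve the rows as a table on :8080.
  import re, time, threading, http.server

  LOG_PATH = "/var/log/myservice.log"  # placeholder
  PATTERN = re.compile(r"(?P<ts>\S+) (?P<level>\S+) (?P<msg>.*)")
  rows = []

  def tail():
      with open(LOG_PATH) as f:
          f.seek(0, 2)  # start at end of file, like tail -f
          while True:
              line = f.readline()
              if not line:
                  time.sleep(0.1)
                  continue
              m = PATTERN.match(line)
              if m:
                  rows.append(m.groupdict())

  class Handler(http.server.BaseHTTPRequestHandler):
      def do_GET(self):
          cells = "".join(
              f"<tr><td>{r['ts']}</td><td>{r['level']}</td><td>{r['msg']}</td></tr>"
              for r in rows[-200:])
          self.send_response(200)
          self.send_header("Content-Type", "text/html")
          self.end_headers()
          self.wfile.write(f"<table>{cells}</table>".encode())

  threading.Thread(target=tail, daemon=True).start()
  http.server.HTTPServer(("", 8080), Handler).serve_forever()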


It can be anything. It depends on what you want to do with the output.

You can have a simple dashboard site which collects the data from your shell scripts and shows you a summary, or red/green signals, so that you can focus on the things you're interested in.


I hadn't given much thought to building agents, but the article and this comment are inspiring, thx. It's interesting to consider agents as a new kind of interface/function/broker within a system.


> They know all the flags and are generally better at interpreting tool output than I am.

In the toy example, you explicitly restrict the agent to supply just a `host`, and hard-code the rest of the command. Is the idea that you'd instead give a `description` something like "invoke the UNIX `ping` command", and a parameter described as constituting all the arguments to `ping`?


Honestly, I didn't think very hard about how to make `ping` do something interesting here, and in serious code I'd give it all the `ping` options (and also run it in a Fly Machine or Sprite where I don't have to bother checking to make sure none of those options gives code exec). It's possible the post would have been better had I done that; it might have come up with an even better test.

I was telling a friend online that they should bang out an agent today, and the example I gave her was `ps`; like, I think if you gave a local agent every `ps` flag, it could tell you super interesting things about usage on your machine pretty quickly.
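Concretely, all that takes is a schema with a free-form argv array, with the safety work pushed down to the sandbox. A rough sketch of what I mean (illustrative):

  # Sketch: hand the model the entire `ps` flag surface as one args array.
  # Run it somewhere sandboxed (a Fly Machine, etc.) rather than trusting
  # the model with your real machine.
  import subprocess

  PS_TOOL = {
      "name": "ps",
      "description": "Run ps with arbitrary flags, e.g. [\"aux\", \"--sort=-%mem\"].",
      "input_schema": {
          "type": "object",
          "properties": {
              "args": {"type": "array", "items": {"type": "string"}},
          },
          "required": ["args"],
      },
  }

  def run_ps(args: list[str]) -> str:
      # argv list, no shell=True: the binary stays pinned even though the
      # flags are wide open
      return subprocess.run(["ps", *args], capture_output=True, text=True,
                            timeout=10).stdout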


Or have the agent strace a process and describe what's going on as if you're a 5-year-old (because I actually need that to understand strace output)


Iterated strace runs are also interesting because they generate large amounts of data, which means you actually have to do context programming.


What is Sprite in this context?


I'm guessing the Fly Machine they're referring to is a container running on fly.io; perhaps the sprite is what the Spritely Institute calls a goblin.


Also to be clear: are the schemas for the JSON data sent and parsed here specific to the model used? Or is there a standard? (Is that the P in MCP?)


It's JSON Schema, well standardized, and it predates LLMs: https://json-schema.org/
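For example, a tool's parameter schema is just an ordinary JSON Schema object, and any standard validator can check arguments against it. A tiny sketch using the Python jsonschema package:

  # Nothing model-specific here: the same document works with any
  # JSON Schema validator.
  from jsonschema import validate

  schema = {
      "type": "object",
      "properties": {"host": {"type": "string"}},
      "required": ["host"],
  }

  validate({"host": "example.com"}, schema)  # passes
  # validate({"host": 42}, schema)  would raise ValidationError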


Ah, so I can specify how I want it to describe the tool request? And it's been trained to just accommodate that?


Most LLMs have tool patterns trained into them now, which are then managed for you by the API that the developers run on top of the models.

But... you don't have to use that at all. You can use pure prompting with ANY good LLM to get your own custom version of tool calling:

  Any time you want to run a calculation, reply with:
  {{CALCULATOR: 3 + 5 + 6}}
  Then STOP. I will reply with the result.
Before LLMs had tool calling, we called this the ReAct pattern - I wrote up an example of implementing that in March 2023 here: https://til.simonwillison.net/llms/python-react-pattern
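A stripped-down loop for that convention looks something like this (sketch; call_llm() is a stand-in for whatever chat API you're using):

  # Hand-rolled tool calling: scan the model's reply for the
  # {{CALCULATOR: ...}} marker, execute it, and feed the result back in.
  import re

  TOOL_RE = re.compile(r"\{\{CALCULATOR: (.+?)\}\}")

  def agent(prompt, call_llm):
      messages = [{"role": "user", "content": prompt}]
      while True:
          reply = call_llm(messages)
          messages.append({"role": "assistant", "content": reply})
          m = TOOL_RE.search(reply)
          if not m:
              return reply  # no tool call, we're done
          # demo only: never eval() untrusted model output in real code
          result = eval(m.group(1), {"__builtins__": {}}, {})
          messages.append({"role": "user", "content": f"Result: {result}"})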



