
One thing that radicalized me was building an agent that tested network connectivity for our fleet. Early on, in like 2021, I deployed a little mini-fleet of off-network DNS probes on, like, Vultr to check on our DNS routing, and actually devising metrics for them and making the data that stuff generated legible/operationalizable was annoying and error-prone. But you can give basic Unix network tools --- ping, dig, traceroute --- to an agent and ask it for a clean, usable signal, and it'll do a reasonable job! These models know all the flags and are generally better at interpreting tool output than I am.

I'm not saying that the agent would do a better job than a good "hardcoded" human telemetry system, and we don't use agents for this stuff right now. But I do know that getting an agent across the 90% threshold of utility for a problem like this is much, much easier than building the good telemetry system is.
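To make that concrete: the whole trick is handing the model a schema'd wrapper around the binary. A minimal sketch in Python (illustrative, not our actual code; the tool-definition shape follows the common JSON Schema convention the major tool-calling APIs share):

  # Sketch: expose `dig` to a tool-calling LLM. The agent loop passes the
  # model's chosen arguments to run_tool(); the binary and argv shape stay
  # pinned on our side.
  import subprocess

  DIG_TOOL = {
      "name": "dig",
      "description": "Resolve a hostname and return the raw dig output.",
      "input_schema": {
          "type": "object",
          "properties": {
              "host": {"type": "string", "description": "hostname to resolve"},
              "record_type": {"type": "string", "description": "A, AAAA, NS, ..."},
          },
          "required": ["host"],
      },
  }

  def run_tool(args: dict) -> str:
      # argv list, no shell: the model picks parameters, never a shell string
      cmd = ["dig", "+short", args["host"], args.get("record_type", "A")]
      return subprocess.run(cmd, capture_output=True, text=True,
                            timeout=10).stdout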



> I'm not saying that the agent would do a better job than a good "hardcoded" human telemetry system, and we don't use agents for this stuff right now.

And that's why I won't touch 'em. All the agents will be abandoned when people realize their inherent flaws (security, reliability, truthfulness, etc.) are not worth the constant low-grade uncertainty.

In a way it fits our times. Our leaders don't find truth to be a very useful notion. So we build systems that hallucinate and act unpredictably, and then invest all our money and infrastructure in them. Humans are weird.


Some of us have been happily using agentic coding tools (Claude Code etc) since February and we're still not abandoning them for their inherent flaws.


The problem with statements like these is that I work with people who make the same claims, but are slowly building useless, buggy monstrosities that for various reasons nobody can/will call out.

Obviously I’m reasonably willing to believe that you are an exception. However every person I’ve interacted with who makes this same claim has presented me with a dumpster fire and expected me to marvel at it.


But isn't this true of all technologies? I know plenty of people who are amazing Python developers. I've also seen people make a huge mess, turning a three-week project into a half-year slog because of their incredible lack of understanding of the tools they were using (Django, fittingly enough for this conversation).

That there's a learning curve, especially with a new technology, and that only the people at the forefront of using that technology are getting results with it - that's just a very common pattern. As the technology improves and the material about it improves, it becomes more useful to everyone.


I'm not going to dispute your own experience with people who aren't using this stuff effectively, but the great thing about the internet is that you can use it to track the people who are making the very best use of any piece of technology.


This line of reasoning is smelling pretty "no true Scotsman" to me. I'm sure there were amazing ColdFusion devs, but that hardly justifies the use of the technology. Likewise "this tool works great on the condition that you hire a Simon Willison-level dev" is almost a fault. I'm pretty confident you could squeeze some juice out of a Markov chain (ignoring, of course, that decoder-only LLMs are basically fancy MCs).

In a weird way it sort of reminds me of Common Lisp. When I was younger I thought it was the most beautiful language and a shame that it wasn't more widely adopted. After a few decades in the field I've realized it's probably for the best since the average dev would only use it to create elaborate foot guns.


"elaborate foot guns" -- HN is a high signal environment, but I could read for a week and not find a gem like this. Props.

Destiny visits me on my 18th birthday and says, "Gart, your mediocrity will result in a long series of elaborate foot guns. Be humble. You are warned."


> I've realized it's probably for the best since the average dev would only use it to create elaborate foot guns

see also: React hooks


Meh, smart high-agency people can write good software, and they can go on to leverage powerful tools in productive ways.

All I see in your post amounts to something like: you're surrounded by boot-camp coders who write the worst garbage you've ever seen, so now you doubt anyone who claims they've written some good shit. Psh, yeah right, you mean a mudball like everyone else?

In that scenario there isn't much a skilled software engineer with different experiences can interject because you've already made your decision, and your decision is based on experiences more visceral than anything they can add.

I do sympathize that you've grown impatient with the tools and with the output of those around you, instead of cracking that nut.


We have GPT-5 and Gemini 2.5 Pro at work, and both of them produce huge amounts of basically shit code that doesn't work.

Every time I reach for them recently, I end up spending more time refactoring the bad code out, or in deep hostage negotiations with the chatbot of the day, than it would have taken to write it myself.

That and for some reason they occasionally make me really angry.

Oh, a bunch of prompts in, and then it hallucinates some library a dependency isn't even using and spews a 200-line diff at me. Again. Great.

Although at least I can swear at them and get them to write me little apology poems...


On the sometimes getting angry part, I feel you. I don't even understand why it happens, but it's always a weird moment when I notice it. I know I'm talking to a machine and it can't learn from its mistakes, but it's still very frustrating to get back yet another "here's the actual no-bullshit fix, for real this time, pinky promise."


Are you using them via a coding agent harness such as Codex CLI or Gemini CLI?


Via the JetBrains plugin; it has an 'agent' mode and can edit files, call tools, and so on. Yes, I set up MCP integrations as well. Still kinda sucks. shrug.

I keep flipping between "this is the end of our careers" and "I'm totally safe". So far this is the longest 'totally safe' period I've had since GPT-2 or so came along...


I abandoned Claude Code pretty quickly; I find generic tools give generic answers. But since I do Elixir I'm "blessed" with Tidewave, which gives a much better experience. I hope more people get to experience framework-built tooling instead of just generic stuff.

It still wants to build an airplane to take out the trash sometimes, and will happily tell you wrong is right. However, I much prefer it trying to figure things out by reading logs and schemas and doing browser analysis automatically, rather than me feeding in logs manually.


Cursor can read logs and schemas and use curl to test API responses. It can also look into the database.


But then you have to use Cursor. Tidewave runs as a dependency in the framework and you just navigate to a URL; it's quite refreshing actually.


Honestly the top AI use case for me right now is personal throwaway dev tools. Where I used to write shell one-liners with a dozen pipes including greps and seds and jq and other stuff, now I get an AI to write me a node script and throw in a nice Web UI to boot.

Edit: reflecting on what the lesson is here, in either case I suppose we're avoiding the pain of dealing with Unix CLI tools :-D


Interesting. You have to wonder if all the tools this is built on would have been written in the first place if that kind of thing had been possible all along. Who needs 'grep' when you can write a prompt?


My long-running joke is that the actually good `jq` is just the LLM interface that generates `jq` queries; 'simonw actually went and built that.


https://github.com/simonw/llm-jq for those following along at home

https://github.com/simonw/llm-cmd is what I use as the "actually good ffmpeg etc. front end"

and just to toot my own horn, I hand Simon's `llm` command-line tool access to its own todo list and read/write access to the cwd with my own tools, https://github.com/dannyob/llm-tools-todo and https://github.com/dannyob/llm-tools-patch

Even with just these and no shell access it can get a lot done, because these tools encode the fundamental tricks of Claude Code. (I have `llmw` aliased to `llm --tool Patch --tool Todo --cl 0` so it has access to these tools and can act in a loop, as Simon defines an agent.)


Tried gron (https://github.com/tomnomnom/gron) a bit? If you know your UNIX, I think it can replace jq in a lot of cases. And when it can't, well, you can reach for Python, I guess.


It's highly plausible that everything we assumed was good design/engineering will disappear if LLMs/agents can produce more without having to be modular. (Sadly.)


There is some kind of parallel between 'AI' and 'fuzzy logic'. Fuzzy logic always looked to me like a large number of patches applied to get enough coverage for a system to work even if you didn't understand it. AI just increases the number of patches to billions.


True, there's often a point where your system becomes a blurry miracle.


Could you give some examples? I'm having the AI write the shell scripts, wondering if I'm missing out on some comfy UIs...


I was debugging a service that was spitting out a particular log line. I gave Copilot an example line and told it to write a script that tails the log and serves a UI on port 8080 with a table of those log lines, parsed and printed nicely. Then I iterated: filter buttons, aggregation stats, simple things like that. I asked it to add a "clear" button to reset the UI. I probably would not even have done this without an AI, because the CLI equivalent would be parsing and aggregating via some form of `uniq -c | sort -n` with a bunch of other tuning, and it would have been too much trouble.
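The result had roughly this shape (a from-memory sketch, not the actual script; the log path and regex stand in for the real service's format):

  # Sketch: tail a log file, parse each line, serve the rows as a table on :8080.
  import re, time, threading, http.server

  LOG_PATH = "/var/log/myservice.log"  # placeholder
  PATTERN = re.compile(r"(?P<ts>\S+) (?P<level>\S+) (?P<msg>.*)")
  rows = []

  def tail():
      with open(LOG_PATH) as f:
          f.seek(0, 2)  # start at end of file, like tail -f
          while True:
              line = f.readline()
              if not line:
                  time.sleep(0.1)
                  continue
              m = PATTERN.match(line)
              if m:
                  rows.append(m.groupdict())

  class Handler(http.server.BaseHTTPRequestHandler):
      def do_GET(self):
          cells = "".join(
              f"<tr><td>{r['ts']}</td><td>{r['level']}</td><td>{r['msg']}</td></tr>"
              for r in rows[-200:])
          self.send_response(200)
          self.send_header("Content-Type", "text/html")
          self.end_headers()
          self.wfile.write(f"<table>{cells}</table>".encode())

  threading.Thread(target=tail, daemon=True).start()
  http.server.HTTPServer(("", 8080), Handler).serve_forever()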


It can be anything. It depends on what you want to do with the output.

You can have a simple dashboard site which collects the data from your shell scripts and shows you a summary, or red/green signals, so that you can focus on the things you're interested in.


I hadn't given much thought to building agents, but the article and this comment are inspiring, thx. It's interesting to consider agents as a new kind of interface/function/broker within a system.


> They know all the flags and are generally better at interpreting tool output than I am.

In the toy example, you explicitly restrict the agent to supply just a `host`, and hard-code the rest of the command. Is the idea that you'd instead give a `description` something like "invoke the UNIX `ping` command", and a parameter described as constituting all the arguments to `ping`?


Honestly, I didn't think very hard about how to make `ping` do something interesting here, and in serious code I'd give it all the `ping` options (and also run it in a Fly Machine or Sprite where I don't have to bother checking to make sure none of those options gives code exec). It's possible the post would have been better had I done that; it might have come up with an even better test.

I was telling a friend online that they should bang out an agent today, and the example I gave her was `ps`; like, I think if you gave a local agent every `ps` flag, it could tell you super interesting things about usage on your machine pretty quickly.
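Concretely, all that takes is a schema with a free-form argv array, with the safety work pushed down to the sandbox. A rough sketch of what I mean (illustrative):

  # Sketch: hand the model the entire `ps` flag surface as one args array.
  # Run it somewhere sandboxed (a Fly Machine, etc.) rather than trusting
  # the model with your real machine.
  import subprocess

  PS_TOOL = {
      "name": "ps",
      "description": "Run ps with arbitrary flags, e.g. [\"aux\", \"--sort=-%mem\"].",
      "input_schema": {
          "type": "object",
          "properties": {
              "args": {"type": "array", "items": {"type": "string"}},
          },
          "required": ["args"],
      },
  }

  def run_ps(args: list[str]) -> str:
      # argv list, no shell=True: the binary stays pinned even though the
      # flags are wide open
      return subprocess.run(["ps", *args], capture_output=True, text=True,
                            timeout=10).stdout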


Or have the agent strace a process and describe what's going on as if you're a 5-year-old (because I actually need that to understand strace output)


Iterated strace runs are also interesting because they generate large amounts of data, which means you actually have to do context programming.


What is Sprite in this context?


I'm guessing the Fly Machine they're referring to is a container running on fly.io; perhaps the sprite is what the Spritely Institute calls a goblin.


Also to be clear: are the schemas for the JSON data sent and parsed here specific to the model used? Or is there a standard? (Is that the P in MCP?)


It's JSON Schema, well standardized, and it predates LLMs: https://json-schema.org/
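For example, a tool's parameter schema is just an ordinary JSON Schema object, and any standard validator can check arguments against it. A tiny sketch using the Python jsonschema package:

  # Nothing model-specific here: the same document works with any
  # JSON Schema validator.
  from jsonschema import validate

  schema = {
      "type": "object",
      "properties": {"host": {"type": "string"}},
      "required": ["host"],
  }

  validate({"host": "example.com"}, schema)  # passes
  # validate({"host": 42}, schema)  would raise ValidationError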


Ah, so I can specify how I want it to describe the tool request? And it's been trained to just accommodate that?


Most LLMs have tool patterns trained into them now, which are then managed for you by the API that the developers run on top of the models.

But... you don't have to use that at all. You can use pure prompting with ANY good LLM to get your own custom version of tool calling:

  Any time you want to run a calculation, reply with:
  {{CALCULATOR: 3 + 5 + 6}}
  Then STOP. I will reply with the result.
Before LLMs had tool calling, we called this the ReAct pattern - I wrote up an example of implementing that in March 2023 here: https://til.simonwillison.net/llms/python-react-pattern
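A stripped-down loop for that convention looks something like this (sketch; call_llm() is a stand-in for whatever chat API you're using):

  # Hand-rolled tool calling: scan the model's reply for the
  # {{CALCULATOR: ...}} marker, execute it, and feed the result back in.
  import re

  TOOL_RE = re.compile(r"\{\{CALCULATOR: (.+?)\}\}")

  def agent(prompt, call_llm):
      messages = [{"role": "user", "content": prompt}]
      while True:
          reply = call_llm(messages)
          messages.append({"role": "assistant", "content": reply})
          m = TOOL_RE.search(reply)
          if not m:
              return reply  # no tool call, we're done
          # demo only: never eval() untrusted model output in real code
          result = eval(m.group(1), {"__builtins__": {}}, {})
          messages.append({"role": "user", "content": f"Result: {result}"})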



