Hacker News | jwilliams's comments

Thanks!

I'd assume that this will be a bit like JSON schemas - the decoding will eventually get smart enough to validate the output against more complex rules.

Agree on the "behind its back" too. I might make a change so that, in the case of "--fix", the LLM gets the diff on the spot.

The other advantage, which I've not been able to quantify yet - I've been able to *remove* stuff from CLAUDE.md/etc in favor of lint rules. e.g. prefer ?? over || -- all the way through to "use our logging framework" -- a lot of the nitpicks in my instructions were bits like this. This keeps the instructions to higher-level architectural stuff.
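To illustrate why "prefer ?? over ||" is the kind of nitpick worth encoding as a lint rule rather than a prose instruction (this sketch is mine, not from the comment above):

```javascript
// `||` falls back on every falsy value (0, "", false), while `??`
// only falls back on null/undefined. An agent can easily reach for
// the wrong one; a lint rule catches it mechanically.
const retries = 0;

console.log(retries || 3); // 3 (the explicit 0 is silently discarded)
console.log(retries ?? 3); // 0 (only null/undefined trigger the fallback)
```

In TypeScript projects this maps onto an existing rule, `@typescript-eslint/prefer-nullish-coalescing`, so the convention lives in the linter config instead of the agent's instructions.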


If you know what you need, my experience is that a well-formed single prompt that fits in the context gives the best (and fastest) results.

If you’re exploring an idea or iterating, the roles can help break it down and understand your own requirements. Personally I do that “away” from the code though.


> Humans and LLMs both share a fundamental limitation. Humans have a working memory, and LLMs have a context limit.

But there’s a more important difference: I can’t spin up 20 decent human programmers from my terminal.

The argument that "code was never the bottleneck" is genuinely appealing, but it hasn’t matched my experience at all. I’m getting through dramatically more work now. This is true for my colleagues too.

My non-technical niece recently built a pretty solid niche app with AI tools. That would have been inconceivable a few years ago.


Would you entertain the idea that "work was never the bottleneck", or even "building products was never the bottleneck"?

We need to address Jevons' Paradox somehow.


I love Jevons’ paradox too, but if we apply it here don’t we still end up with more software?

Definitely would entertain -- I do agree with your framing. I just think the article undersells the impact of fast+cheap codegen.

Lowering the cost of implementation will expose (and already has exposed) new bottlenecks elsewhere. But imho many of those bottlenecks probably weren't worth serious investment to solve before. The codegen change will shift that.


I think that's where a heck of a lot of the frustration on this topic is coming from. Some engineers claim to have solved the code generation issue well enough that it hasn't been the bottleneck in their local environment, and have been trying to pivot to widening the new bottlenecks for a while now, but have been confounded by organisational dynamics.

Seeing the other bottlenecks starting to be taken seriously now, while (if I'm to be petulant) all the "credit" for solving the code bottleneck goes to LLM systems, is painful, especially when you're in a local domain where the codegen bottleneck doesn't matter very much and hasn't for a long time.

I suspect engineers that managed to solve the code generation bottlenecks are compulsive problem solvers, which exacerbates the issue.

That isn't to say there aren't some domains where it still does matter, although I'm dubious that LLM codegen is the best solve there; I am not dubious that it is at least a solve.


I guess what people debate on here is what "decent" means. From my experience, these LLMs spit out dog shit code, so 20 agents equal 20x more dog shit.

Atwood has been writing about speculative futures for a long time, so it’s interesting to watch her react in real-time to one of them actually happening.

The post captures something real about LLMs: the interface makes the interaction feel like a social exchange even when you know perfectly well it isn’t. Despite knowing better we attribute intention/emotion/feeling to the LLM. I felt that the most in her (somewhat bleak) sign off at the end.


I have moved towards super-specific scripts (so I guess "CLI"?) for a few reasons:

1. You can make the script very specific to the skill and permission it appropriately.

2. You can have the output of the script make clear to the LLM what to do. Lint fails? "Lint rules have failed. This is important for reasons blah blah and you should do X before proceeding." Otherwise the agent is too focused on smashing out the overall task and might opt to route around the error. Note you can use this for successful cases too.

3. The output and token usage can be very specific to what the agent needs. Saves context. My GitHub comments script really just gives the comments plus the necessary metadata, not much else.

The downsides of MCP all focus on (3), but 1 and 2 can be really important too.
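A minimal sketch of point 2. The `formatForAgent` helper and its wording are hypothetical, not the actual script:

```javascript
// Hypothetical helper: wrap a tool result in an instruction the agent
// can act on, rather than dumping raw linter output into its context.
function formatForAgent({ status, output }) {
  if (status !== 0) {
    return (
      "Lint rules have failed. These rules encode project conventions, " +
      "so fix the violations below before proceeding:\n" + output
    );
  }
  return "Lint passed. Continue with the task.";
}

// e.g. after running the linter via child_process and capturing
// its exit status and stdout:
console.log(formatForAgent({ status: 1, output: "src/app.ts:3 prefer ?? over ||" }));
```

The success branch matters too: an explicit "continue" message keeps the agent from second-guessing a clean run.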


I like to do walking meetings or meetings where I'm cleaning/emptying dishwasher/etc. It sounds strange, but I'm a lot more present than when I'm at my computer.

Anyway. Somewhat ironically, I use a wired set of headphones for this. It's not just the speakers that are better. I often get people remarking how much better the audio is on their end too... i.e. the cheap inline microphone.


That has probably more to do with the microphone(s) rather than that it's wired. Voice is not a problem at Bluetooth bitrates.

I suspect it’s mostly microphone position rather than anything else (the headphones I have are the basic Apple ones).

There are some interesting points here, but I think this essay is a little too choppy - e.g. the Aircraft Mechanic comparison is a long bow to draw.

The Visual Basic comparison is more salient. I've seen multiple rounds of "the end of programmers", including RAD tools, offshoring, various bubble-bursts, and now AI. Just because we've heard it before though, doesn't mean it's not true now. AI really is quite a transformative technology. But I do agree these tools have resulted in us having more software, and thus more software problems to manage.

The Alignment/Drift points are also interesting, but I think they appeal to SWEs' belief that taste/discernment was stopping this from happening in pre-AI times.

I buy into the meta-point which is that the engineering role has shifted. Opening the floodgates on code will just reveal bottlenecks elsewhere (especially as AI's ability in coding is three steps ahead and accelerating). Rebuilding that delivery pipeline is the engineering challenge.


I'm fairly sure that your best throughput is single-prompt single-shot runs with Claude (and that means no plan, no swarms, etc) -- just with a high degree of work in parallel.

So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.


I think the thing is macOS itself hasn't really evolved for some time - what has been happening is taking iOS ideas and concepts and porting them back.

I think that's ended up with a bit of a mess.


I wrote a short bit on a similar topic the other day[a]. Just because something is faster or even measurably better, that doesn't translate to end productivity.

1. You might be speeding up something that is inherently not productive (the "faster horses" trope). I see companies using AI to generate performance reviews, and the same companies using AI to summarize all the new performance material they're getting. All that's happening is amplified busywork (there is real work in there, but it's questionable whether it's improved).

2. Some things are zero sum. If you're not using AI for marketing you might fall behind. So you adopt these tools, but attention/etc are limited. There is no net gain, just competition.

3. You might speed one part up (typing code), but then other parts of your pipeline quickly become constraints. It might be a long time before we're able to adapt the end-to-end process. This is amplified by coding tools being three strides ahead.

4. Then there are actual productivity improvements. One of these PRs could have been "translate this to German". That could be one PR but a whole step-change for the business.

So much of what is happening falls in buckets 1+2+3. I don't think we've really got into the meat of 4 yet.

[a]: https://jonathannen.com/ai-productivity/

