On the latest episode of 'Security Cryptography Whatever' [0] they mention that time spent improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them.
This is basically how you should treat all AI dev. Working around AI model limits for something that will take 3-6 months of work has very little ROI compared to building what works today, then waiting and building what works tomorrow, tomorrow.
This is the hard part - especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive, and public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models.
As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...
This assumes AI model improvements will be predictable, which they won’t.
There are several simultaneous moving targets: the different models available at any point in time, the model complexity/capability, the model price per token, the number of tokens the model uses for a given query, the context size capabilities and prices, and even the evolution of the codebase. You can’t calculate comparative ROIs of model A today or model B next year unless these are far more predictable than they currently are.
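To make that concrete, here is a rough sketch of how quickly the comparison flips. Everything in it is an assumption for illustration: the `ModelScenario` fields, the "cost per usable result" metric, and every number are made up, not taken from any vendor's pricing.

```python
from dataclasses import dataclass


@dataclass
class ModelScenario:
    """Hypothetical inputs; none of these values come from real pricing."""
    name: str
    price_per_million_tokens: float  # USD, assumed
    tokens_per_query: int            # assumed average tokens consumed
    success_rate: float              # assumed fraction of usable results


def cost_per_success(s: ModelScenario) -> float:
    """Expected spend to get one usable result under the scenario."""
    cost_per_query = s.price_per_million_tokens * s.tokens_per_query / 1_000_000
    return cost_per_query / s.success_rate


# One known point today, two equally plausible guesses for next year.
today = ModelScenario("model A today", 10.0, 50_000, 0.5)
optimistic = ModelScenario("model B, optimistic guess", 5.0, 40_000, 0.8)
pessimistic = ModelScenario("model B, pessimistic guess", 20.0, 120_000, 0.6)

for scenario in (today, optimistic, pessimistic):
    print(f"{scenario.name}: ${cost_per_success(scenario):.2f} per usable result")
```

With these invented inputs, model B comes out either 4x cheaper or 4x more expensive than model A per usable result, depending only on which guess you picked. That spread is the point: the ROI comparison is only as good as the forecasts going into it.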
Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost of doing so is prohibitive and only US vendors have the capex to afford enough compute for both inference and that level of training.
I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.
The real improvements are going to be in tooling and harnessing.
> The real improvements are going to be in tooling and harnessing
I don't have any special knowledge here, but the guy in the podcast (who worked/works with one of the big AI firms) is the one who made the claim. In the future, when (if?) the speed of development slows, I agree it would no longer be true.
It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.
As the base is an auto-regressive model that is capable of generating more or less any kind of text, it kind of makes sense though. It always has the capabilities, but you might want it to emulate a stupid analysis as well. So you're leading in with a text that describes what the rest of the text will be in a pretty real sense.
I read once (so no idea if it is true) that in voice lessons, one of the most effective things you can do to improve people's technique is to tell them to pretend to be an opera singer.
I think you took away the wrong lesson from that podcast:
> I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential from CPU speeds going up until let’s say 2000 or something where you had these game developers who would develop really impressive games on the current thing of hardware and they do it by writing like really detailed intricate x86 instruction sequences for like just exactly whatever this, like, you know, whatever 486 can do, knowing full well that in 2 years, you know, the Pentium is gonna be able to do this much faster and they didn’t need to do it. But like you need to do it now because you wanna sell your game today and like, yeah, you can’t just like wait and like have everyone be able to do this. And so I do think that there definitely is value in squeezing out all of the last little juice that you can from the current model.
Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.
[0] https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...