I'll echo what's been said. I use Opus for long-running coding tasks, mainly for its stability with long context. I can knock out huge chunks of a project in a single conversation without any attention degradation.
Sonnet's reasoning is very solid, and it's what I use at work when I need many API calls to reason about variations of things: numerical trial results, experiment outcomes, etc. The queries are independent, Opus pricing would be overkill, and the context is small enough that Sonnet knocks it out.
I think the same is true for code. I'd use Sonnet for hammering out unit tests, API wrappers, etc.
I only use parallel agents when I have simple tasks to run across all my projects: cleanup, running evals, etc. That whole day becomes parallel agents on easy tasks. Otherwise I keep a tight feedback loop with a single agent (plus subagents).
When running multi-agent, I recommend keeping an eye on the flow of the work. Is it touching files that make sense? Has it been spiraling too long? Is it pulling in packages? These are things I can eyeball quickly without fully committing to that context. A full review of a sus diff would force a context switch, but seeing that it's only editing files in a related part of the codebase is low effort.
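That kind of eyeballing can even be automated. Here's a minimal sketch of the idea: given the list of changed file paths (e.g. from `git diff --name-only`), flag anything outside the area the agent was asked to work in, and anything that touches a dependency manifest. The `ALLOWED_ROOTS` and `MANIFESTS` values are hypothetical placeholders, not from any real tool.

```python
from pathlib import PurePosixPath

# Hypothetical policy for one task: the agent should stay in these subtrees.
ALLOWED_ROOTS = ("services/billing/", "tests/billing/")
# Touching any of these means the agent is pulling in packages.
MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json"}

def eyeball(changed_files):
    """Cheap sanity check on a diff without reading a single hunk.

    Returns (ok, warnings): ok is False if anything looks off.
    """
    warnings = []
    for f in changed_files:
        if PurePosixPath(f).name in MANIFESTS:
            warnings.append(f"pulls in dependencies: {f}")
        elif not f.startswith(ALLOWED_ROOTS):
            warnings.append(f"outside expected area: {f}")
    return (not warnings, warnings)
```

A clean result (`eyeball(["services/billing/invoice.py"])`) means I can let the agent keep running; any warning is my cue to actually look at the diff.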
I've recently been tasked with hiring new-grad AI engineers (less than two years of experience). Some things we've been doing:
1) Do they understand the ecosystem of algorithms and models, and how they coexist? Statistical models are the right choice sometimes; sometimes it's gradient-boosted trees or random forests, sometimes NNs, LLMs, etc. And they're not mutually exclusive. I don't think that has changed since the AI wave, to be honest - though of course I get bad candidates who say LLM-everything.
2) AI-assisted fluency - not just in coding, but in concept development. I don't expect them to have the velocity of an AI-fluent principal engineer, but I want to see that they're not resistant to AI assistance. This is obviously new.
3) Experience with production systems, more than before. Pre-AI, I'd accept that a recent grad might be tuned toward models and algorithms and wouldn't know much about frontend, backend, or anything you run into in production environments. Given the ease with which you can now set up a small DB, a modeling pipeline, and a full React dashboard or FastAPI backend... I'd at least like to see that they've dabbled in all of that and have a rough sense of it. I don't need them to be full-stack, or even comfortable with it - but AI has raised the breadth bar for me.
Can you expand on your approach of defining things around state transitions? Are you thinking of it as the state of a task (built, validated, integrated, etc)? Or something else entirely? I'm not sure I followed that part and it seemed rather key.
Is anyone expecting a higher subscription tier to be announced, given this current reduction?
Cynicism aside, I do wonder what the future holds, given that current token burn rates aren't sustainable without VC cash. Anthropic even pushed us to use Haiku in Claude Code for "many" tasks during our enterprise training, so I'm wondering if reducing the burn is something of a company need.
I'm glad to see that it stands its ground more than other models do - a genuinely useful trait in an assistant, on both technical and emotional topics.
Are there any natural ways of swapping from clock time to agent "active time"? For agents that run intermittently, I might want to keep those memories longer (in clock time).