
I'd like others' input on this: increasingly, I see Cursor, Jetbrains, etc. moving towards a model of having you manage many agents working on different tasks simultaneously. But in real, production codebases, I've found that even a single agent is faster at generating code than I am at evaluating its fitness and providing design guidance. Adding more agents working on different things would not speed anything up. But perhaps I am just much slower or a poorer multi-tasker than most. Do others find these features more useful?




I usually run one agent at a time in an interactive, pair-programming way. Occasionally (like once a week) I have some task where it makes sense to have one agent run for a long time. Then I'll create a separate jj workspace (equivalent of git worktree) and let it run.
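Roughly, the setup is something like this (sketch only; `jj workspace add` does the worktree-style split, and the agent command at the end is just a placeholder):

    import subprocess
    from pathlib import Path

    REPO = Path.home() / "src" / "myproject"   # hypothetical repo path

    def add_agent_workspace(name: str) -> Path:
        # `jj workspace add` creates a second working copy of the same repo,
        # much like `git worktree add`, so the agent can't touch my checkout.
        ws = REPO.parent / f"{REPO.name}-{name}"
        subprocess.run(["jj", "workspace", "add", str(ws)], cwd=REPO, check=True)
        return ws

    ws = add_agent_workspace("long-task")
    # Placeholder agent invocation; swap in whatever agent CLI you actually use.
    subprocess.Popen(["claude", "-p", "refactor the import graph"], cwd=ws)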

I would probably never run a second agent unless I expected the task to take at least two hours; any more than that and the cost of multitasking for my brain is greater than any benefit, even when there are things I could theoretically run in parallel, like several hypotheses for fixing a bug.

IIRC Thorsten Ball (Writing an Interpreter in Go, lead engineer on Amp) also said something similar in a podcast – he's a single-tasker, despite some of his coworkers preferring fleets of agents.


Same.

I've recently described how I vibe-coded a tool to run this single background agent in a docker container in a jj workspace[0] while I work with my foreground agent but... my reviewing throughput is usually saturated by a single agent already, and I barely ever run the second one.

New tools keep coming up for running fleets of agents, and I see no reason to switch from my single-threaded Claude Code.

What I would like to see instead are efforts to make the reviewing step faster. The Amp folks had an interesting preview article on this recently[1]. This is the direction I want tools to be exploring if they want to win me over - help me solve the review bottleneck.

[0]: https://news.ycombinator.com/item?id=45970668

[1]: https://ampcode.com/news/review


I really would like an answer to this.

My CTO is currently working on the ability to run several dockerised versions of the codebase in parallel for this kind of flow.

I’m here wondering how anyone could work on several tasks at once at a speed where they can read, review and iterate on the output of one LLM in the time it takes another LLM to spit out an answer for a different task.

Like, are we just asking things as fast as possible and hoping for a good solution unchecked? Are others able to context switch on every prompt without a reduction in quality? Why are people tackling the problem of prompting at scale as if the bottleneck was token output rather than human reading and reasoning?

If this was a random vibecoding influencer I’d get it, but I see professionals trying this workflow and it makes me wonder what I’m missing.


I was going to say that this is how genetic algorithms work, but there is still too much human in the loop.

Maybe code husbandry?


Code Husbandry is a good term for what I've been thinking about how to implement. I hope you don't mind if I steal it. Think automated "mini agents", each with a defined set of tools and tasks, responding to specific triggers.

Imagine one agent just does docstrings - on commit, build an AST, branch, write/update comments accordingly, push and create a merge request with a standard report template.

Each of these mini-agents has a defined scope and operates in its own environment, and can be customized/trained as such. They just run continuously on the codebase based on their rules and triggers.
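To make the docstring example concrete, here's a minimal sketch of the check such a mini-agent could run on commit (Python's ast module; the LLM call and the branch/push plumbing are left out):

    import ast
    from pathlib import Path

    def missing_docstrings(path: Path) -> list[str]:
        # Parse the file and report every function/class without a docstring.
        tree = ast.parse(path.read_text())
        missing = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                if ast.get_docstring(node) is None:
                    missing.append(f"{path}:{node.lineno} {node.name}")
        return missing

    # A post-commit hook would feed each hit to a small model, then branch,
    # commit the generated docstrings, and open a merge request for review.
    for hit in missing_docstrings(Path("src/example.py")):
        print("needs docstring:", hit)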

The idea is that all these changes bubble up to the developer for approval, just maybe after a few rounds of LLM iteration. The hope is that small models can be leveraged for higher-quality output this way, and that the whole thing operates asynchronously.


My assumption lately is that this workflow is literally just “it works, so merge”. Running multiple in parallel does not allow for inspection of the code, just for testing functional requirements at the end.

Hmm, I haven’t managed to make it work yet, and I’ve tried. The best I can manage is three completely separate projects, and they all get only divided attention (which is often good enough these days).

Do you feel you get a faster/better end result than focusing on a single task at a time?

I can’t help but feel it’s like texting and driving, where people are overvaluing their ability to function with reduced focus. But obviously I have zero data to back that up.


Rather than having multiple agents running inside of one IDE window, I structure my codebase in a way that is somewhat siloed to facilitate development by multiple agents. This is an obvious and common pattern when you have a front-end and a back-end. Super easy to just open up those directories of the repository in separate environments and have them work in their own siloed space.

Then I take it a step further and create core libraries that are structured like standalone packages and are architected like third-party libraries with their own documentation and public API, which gives clear boundaries of responsibility.
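Concretely, the top of one of those core libraries might look like this (names are hypothetical; the point is that agents working in dependent code only ever see this surface):

    # core_billing/__init__.py -- the library's public API, treated like a
    # third-party package. Everything else in the package is an
    # implementation detail that dependent code (and agents) never import.
    from .invoices import create_invoice, void_invoice
    from .tax import compute_tax

    __all__ = ["create_invoice", "void_invoice", "compute_tax"]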

Then the only somewhat manual step is to copy/paste the agent's notes on the changes it made so that dependent systems can integrate them.

I find this to be way more sustainable than spawning multiple agents on a single codebase and then having to rectify merge conflicts between them as each task is completed; it's not unlike traditional software development where a branch that needs review contains some general functionality that would be beneficial to another branch and then you're left either cherry-picking a commit, sharing it between PRs, or lumping your PRs together.

Depending on the project I might have 6-10 IDE sessions. Each agent has its own history then and anything to do with running test harnesses or CLI interactions gets managed on that instance as well.


Even with the best agent in plan mode, there can be communication problems, style mismatches, untested code, incorrect assumptions and code that is not DRY.

I prefer to use a single agent without pauses and catch errors in real time.

People running multiple agents must be using pauses, switching between agents and checking every result.


I think this is the UX challenge of this era: how to design software that helps a human spread their attention across many tasks without information loss or cognitive overload. I agree that for any larger piece of work with significant scope, the overhead of ingesting the context into your brain offsets the time savings that multitasking promises.

My take on this is that the better these things get eventually we will be able to infer and quantify signals that provide high confidence scores for us to conduct a better review that requires a shorter decision path. This is akin to how compilers, parsers, linters, can give you some level of safety without strong guarantees but are often "good enough" to pass a smell test.
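As a toy illustration of what I mean, fold the cheap machine-checkable signals into a single number that tells you how closely to look (weights entirely made up):

    def review_priority(lint_errors: int, type_errors: int,
                        tests_passed: bool, diff_lines: int) -> float:
        # 0.0 = rubber-stampable, 1.0 = read every line. Illustrative only.
        score = 0.0
        score += 0.1 * min(lint_errors, 5)
        score += 0.15 * min(type_errors, 5)
        score += 0.0 if tests_passed else 0.4
        score += min(diff_lines / 1000, 0.25)  # big diffs deserve a closer read
        return min(score, 1.0)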


No... I've found the opposite: using the fastest model for the smallest pieces is useful, and anything where I have to wait 2m for a wrong answer is just in the way.

There's pretty much no way anyone context switching that fast is paying a lick of attention. They may be having fun, like scrolling TikTok or playing a video game, just piling on stimuli, but I don't believe they're getting anything done. It's plausible they're smarter than me; it is not plausible they have a totally different kind of brain chemistry.


The parallel agent model is better for when you know the high-level task you want to accomplish but the coding might take a long time. You can split it up in your head - “we need to add this API to the API spec”, “we need to add this thing to the controller layer”, etc. - and then use parallel agents to edit just the specific files you’re working on.

So instead of interactively making one agent do a large task you make small agents do the coding while you focus on the design.


My context window is small. It's hard enough keeping track of one timeline, I just don't see the appeal in running multiple agents. I can't really keep up.

For some things it's helpful, like having one agent plan changes / get exact file paths, another agent implement the changes, another agent review the PR, etc. The context window being small is the point, I think. Chaining agents lets you break up the work, and also lets you give different agents different toolsets so they aren't all pulling a ton of MCPs / Claude Skills into context at once.
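A sketch of that kind of chain (`run_agent` is a hypothetical wrapper standing in for whatever CLI/SDK you actually drive):

    def run_agent(role: str, prompt: str, tools: list[str]) -> str:
        # Hypothetical stand-in for your agent runner; each call gets a fresh,
        # small context and only the tools listed here.
        print(f"[{role}] tools={tools}")
        return f"<{role} output>"

    plan = run_agent("planner", "List the exact files and edits needed for feature X.",
                     tools=["read_file", "grep"])            # read-only
    diff = run_agent("implementer", "Apply this plan:\n" + plan,
                     tools=["read_file", "edit_file"])        # no MCP servers loaded
    review = run_agent("reviewer", "Check this diff against the plan:\n" + diff,
                       tools=["read_file", "run_tests"])      # separate toolset again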

Right. A computer can make more code than a human can review. So, forget about the universe where you ever review code. You have to shift to almost a QA person and ignore all code and just validate the output. When it is suggested that you as a programmer will disappear, this is what they mean.

>You have to shift to almost a QA person and ignore all code and just validate the output.

The obvious answer to this is that it is not feasible to retry each past validation for each new change, which is why we have testing in the first place. Then you’re back at square one because your test writing ability limits your output.

Unless you plan on also vibecoding the tests and treating the whole job as a black box, in which case we might as well just head for the bunkers.


"... treating the whole job as a black box"

Yes, that is exactly what I mean. You ask the Wizard of Oz for something, and you hear some sounds behind the curtain, and you get something back. Validate that, and if necessary, ask Oz to try again.

"The obvious answer to this is that it is not feasible to retry each past validation for each new change"

It is reasonably feasible, because the jobs of Product Development and QA have always existed; developers just sat in the middle. Now we remove the developer and move them over to the combined role of Product + QA, and all Product + QA was ever able to validate was developer output (which, as far as they were concerned, was an actual black box, since they don't know how to program).

The developer disappears when they are made to disappear or decide to disappear. If the developer begins articulating ideas in language like a product developer, and then validates like a QA engineer, then the developer has "decided" to disappear. Other developers will be told to disappear.

The existential threat to the developer is not when the company mandate comes down that you are to be a "Prompt Engineer" now; it is when the mandate comes down that you need to be a Product Designer now (as in, you are mandated not to write a single. line. of. code.). In which case vast swaths of developers will not cut it on a pure talent level.


You haven’t addressed the original question. The point is not whether the QA understands the codebase, but whether the QA understands its own test system.

If yes, the QA is manual-ish (where manual == not automated by AI) and we’re still bottlenecked, so speeding up the engineer gained us nothing.

If no, because the QA is also AI, then you have a product with no human eyes on it being tested by another system with no human eyes on it. So effectively nobody knows what it does.

If you think LLMs are anywhere near that level of trust, I don’t know what you’re smoking. They’re still doing things like “fixing” tests by removing relevant non-passing cases every day.


I think for production code this is wildly irresponsible. I’m having a decent time with LLM code generation, but I wouldn’t dream of skipping code review.

I'm with you. The industry has pivoted from building tools that help you code to selling the fantasy that you won't have to. They don't care about the reality of the review bottleneck; they care about shipping features that look like 'the future' to sell more seats.

I have to agree; currently it doesn't look that innovative. I would rather have parallel agents working on the same task, orchestrated in some way to get the best result possible. Perhaps using IntelliJ for code insights, validation, refactoring, debugging, etc.

Completely agree. The review burden and context switching I need to do from even having two running at once is too much, and using one is already pretty good (except when it’s not).

I think the problem is that current AI models are slow to generate tokens, so the obvious solution is 'parallelism'. If they could poop out pages of code instantly, nobody would think about parallel agents.

I wish we'd get a model that's not necessarily intelligent, but at least competent at following instructions and very fast.

I overwhelmingly prefer the workflow where I have an idea for a change and the AI implements it (or pushes back, or does it in an unexpected way) - that way I still have a general idea of what's going on with the code.




