+1 to logging output. Not too sure what you mean by herald-style message passing, but it sounds like you've implemented subscribe logic from scratch, and each of your agents needs to be aware of domain boundaries and locks?
For most tasks, I agree. One agent with a good harness wins. The case for multiple agents is when the context required to solve the problem exceeds what one agent can hold. This Putnam problem needed more working context than fits in a single window. Decomposing into subgoals lets each agent work with a focused context instead of one agent suffocating on state. Ideally, multi-agent approaches shouldn't add more overall complexity, but there needs to be better tooling for observation etc, as you describe.
I think about this through the MoE analogy a lot. Essentially it's a decision-routing process: just as you route to expert submodels, you bring in a human in the loop or spin off decision sub-tasks when the task requires it.
More specifically, we've been working on a memory/context observability agent. It's currently really good at understanding users and understanding the wide memory space. It could help with the oversight and at least the introspection part.
Yeah, I have seen those camps too. I think there will always be a set of problems whose complexity, measured by the amount of context that must be kept in working memory, needs more than one agent to reach a workable or optimal result. In single-player mode (dev + Claude Code) you'll hit these less frequently, but bigger cross-team, cross-codebase problems will need more complex agent coordination.
Thanks! That was the goal. We want to let agents be autonomous within their scope, so they can try new paths and fail gracefully. A bad tactic just fails to compile, it can't break anything else.
We use TTL-based claim locks so only one agent works on one goal at a time.
Failed strategies + successful tactics all get written to shared memory, so if a claim expires and a new agent picks it up, it sees everything the previous agent tried.
Ranking is first-verified-wins.
For competing decomposition strategies, we backtrack: if children fail, the goal reopens, and the failed architecture gets recorded so the next attempt avoids it.
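A minimal sketch of how these pieces could fit together. The class and method names, and the in-memory dicts, are illustrative assumptions for the comment, not the actual implementation:

```python
import time

class GoalBoard:
    """Sketch: TTL claim locks + shared attempt memory + first-verified-wins.
    All state is in-memory here; a real system would use a shared store."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.claims = {}    # goal_id -> (agent_id, claim_time)
        self.attempts = {}  # goal_id -> [(strategy, succeeded), ...]
        self.verified = {}  # goal_id -> (agent_id, solution); first wins

    def claim(self, goal_id, agent_id, now=None):
        """Claim a goal; succeeds if unclaimed or the old claim's TTL expired."""
        now = time.time() if now is None else now
        holder = self.claims.get(goal_id)
        if holder is None or now - holder[1] > self.ttl:
            self.claims[goal_id] = (agent_id, now)
            return True
        return False

    def record_attempt(self, goal_id, strategy, succeeded):
        """Failed strategies and successful tactics both go to shared memory,
        so a later claimant sees everything previous agents tried."""
        self.attempts.setdefault(goal_id, []).append((strategy, succeeded))

    def submit_verified(self, goal_id, agent_id, solution):
        """First verified solution wins; later submissions are ignored."""
        return self.verified.setdefault(goal_id, (agent_id, solution))
```

The expired-claim path is what makes handoff work: a second agent reclaiming a stale goal reads `attempts[goal_id]` before picking a strategy, so the recorded failed decompositions steer it away from dead ends.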
When you get a chance to work on your login flow, I recommend giving users an opportunity to request the key rather than automatically showing it once only on the first screen.
I created the account from my phone, and don't have access to the dev tools I'd want to paste the key into. I can deal with it, but I don't know if I'll be able to regenerate the key if I lose it. I'd rather not store it on my phone, and I don't trust my accuracy manually typing it on my laptop while reading it off my phone, so none of the options feel great. Again, not an actual roadblock, but still something I'd encourage fixing.
Edit added: Good thing I copied the key to my phone before writing this message. Jumping over to this page seems to have forced a refresh/logout on the Ensue page in the other tab, so my token would (I think? maybe?) be lost at this point if I'd done it in the other order.
Ahh good call. You absolutely can generate a new key from the dashboard, so if you did lose the one generated during the quickstart, you'd be able to generate another when you log in next and go to the API keys tab.
Will make this clearer in the quickstart, thanks for the feedback.
Yeah we're using Ensue since it already handles the annoying infra pieces you’d otherwise have to build to make this work (shared task state + updates, event streams/subscriptions, embeddings + retrieval over intermediate artifacts). You can run the example with a free key from ensue-network.ai. This repo focuses on the orchestration harness.
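For a sense of what "the annoying infra pieces" covers, here's a generic sketch of the subscription plumbing such a platform provides. This is not Ensue's actual API, just an illustration of the pattern:

```python
from collections import defaultdict

class EventBus:
    """Toy in-process pub/sub bus; a platform like Ensue would provide
    this across agents/processes, with persistence and retrieval on top."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every callback subscribed to this topic.
        for cb in self.subscribers[topic]:
            cb(event)
```

Building this yourself looks trivial in-process; the hard parts are exactly the ones listed above (durable shared state, cross-process streams, retrieval over artifacts), which is why offloading them to a service makes sense.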
Math proofs are really easy to run with this specific harness. Our next experiments are going to be bigger, think full codebase refactors. We're working on applying RLM to improve context window limits so we can keep more of the actual code in working memory.
Any workloads you'd like to see? The best candidates are ones where success of the output is measurable. We're thinking about recreating the C compiler example Anthropic did, but for less than the $20k in tokens they used.
Maybe I'm just not working on complex or big enough projects, but I haven't encountered a feature that couldn't be implemented in one or two context windows. Or, using vanilla Claude Code, with a multi-phase plan doc, a couple of sub-agents, and a final verification pass with Codex.
I guess maybe I'm doing the orchestration manually, but I always find there's tons of decisions that need to be made in the middle of large plan implementations.
Your refactor example terrifies me because the best part of a refactor is cleaning out all the bandaid workarounds and obsolete business logic you didn't even know existed. Can't see how an agent swarm would be able to figure that out unless you provide a giga-spec file containing all current business knowledge. And if you don't spec it the agents will just eagerly bake these inefficiencies and problems into your migrated app.