As I automate more and more of my agentic coding process, I've come to realize that a swipe-based UX is very likely to dominate corporate decision making in the years to come.
The posted link is a research report on the topic - in full disclosure, it was generated by a custom research agent I've been working on.
I'm sure that others are working on other novel UXs for fast decision making to coordinate their agents. I'd love to hear any insights you've gained so far.
I've worked in governance for the last 15 years. Based on that experience, nobody truly cares about the UX to signify a decision. They care about the communication of the information to make the decision in the first place. So you might be right, but you are focusing on something that is fairly irrelevant. If you want to innovate in the boardroom, innovate on information flow.
I've been building something in this space ("Clink" - multi-agent coordination layer) and this research confirms some of the assumptions that motivated the project. You can't just throw more agents at a problem and expect it to get better.
The error amplification numbers are wild: 17x for independent agents vs. 4x with some central coordination. Clink provides users (and, more importantly, their agents) with the primitives to choose their own pattern.
The most relevant features are...
- work queues with claim/release for parallelizable tasks (rough sketch after this list)
- checkpoint dependencies when things need to be sequential
- consensus voting as a gate before anything critical happens
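To give a flavor of the claim/release idea from the first bullet, here's a simplified in-memory sketch of the pattern - not the actual Clink API, just an illustration of the primitive:

```python
# Minimal sketch of a claim/release work queue. Purely illustrative;
# the real implementation is a shared coordination layer, not in-memory.
import threading
import uuid

class WorkQueue:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = {}   # task_id -> payload, waiting to be claimed
        self._claimed = {}   # task_id -> (agent_id, payload)

    def add(self, payload: str) -> str:
        task_id = str(uuid.uuid4())
        with self._lock:
            self._pending[task_id] = payload
        return task_id

    def claim(self, agent_id: str):
        """Atomically hand one pending task to an agent, or None if empty."""
        with self._lock:
            if not self._pending:
                return None
            task_id, payload = self._pending.popitem()
            self._claimed[task_id] = (agent_id, payload)
            return task_id, payload

    def release(self, task_id: str) -> None:
        """Return a claimed task to the pending pool, e.g. if the agent stalls."""
        with self._lock:
            entry = self._claimed.pop(task_id, None)
            if entry is not None:
                _agent, payload = entry
                self._pending[task_id] = payload

if __name__ == "__main__":
    q = WorkQueue()
    q.add("refactor module A")
    q.add("write tests for module B")
    task = q.claim("agent-1")
    print("agent-1 claimed:", task)
    q.release(task[0])                      # hand it back for another agent
    print("agent-2 claimed:", q.claim("agent-2"))
```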
The part about tool count increasing coordination overhead is interesting too. I've been considering exposing just a single tool to address this, but I wonder how this plays out as people start stacking more MCP servers together. It feels like we're all still learning what works here. The docs are at https://docs.clink.voxos.ai if anyone wants to poke around!
> The part about tool count increasing coordination overhead is interesting too. I've been considering exposing just a single tool to address this, but I wonder how this plays out as people start stacking more MCP servers together.
It works really well. Whatever knowledge LLMs absorb about CLI commands seems to transfer to MCP use, so a single tool with commands/subcommands holds up nicely. It's the pattern I default to when I'm forced to use an MCP server instead of providing a CLI tool (like when the MCP server needs to be in-memory with the host process).
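To make that concrete, here's a rough, framework-agnostic sketch of the dispatch shape I mean - one entry point, a command string, and a dict of args. The command names are made up for illustration and aren't tied to any real MCP SDK:

```python
# One tool, many subcommands: the model picks a command string plus args,
# much like it would compose a CLI invocation.
COMMANDS = {
    "issues.list":  lambda args: f"listing issues for {args.get('project')}",
    "issues.close": lambda args: f"closing issue {args.get('id')}",
    "docs.search":  lambda args: f"searching docs for {args.get('query')}",
}

def run_tool(command: str, args: dict | None = None) -> str:
    """Single tool entry point exposed to the model."""
    args = args or {}
    handler = COMMANDS.get(command)
    if handler is None:
        # Echoing the valid commands keeps the model self-correcting,
        # the same way a CLI's usage/help text does.
        return "unknown command; available: " + ", ".join(sorted(COMMANDS))
    return handler(args)

if __name__ == "__main__":
    print(run_tool("issues.list", {"project": "clink"}))
    print(run_tool("bogus"))
```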
I've started with the basics for now: messages (called "Clinks" because... marketing), groups, projects, milestones - which are all fairly non-novel and one might say this is just Slack/Jira. The ones that distinguish it are proposals to facilitate distributed consensus behaviour between agents. That's paired with a human-in-the-loop type proposal that requires the fleet owner to respond to the proposal via email.
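To illustrate what a proposal gate might look like, here's a simplified sketch - the field names and the simple-majority rule are assumptions made for illustration, not the actual schema:

```python
# Sketch of a proposal that gates on agent consensus, with an optional
# human-in-the-loop sign-off (e.g. the fleet owner replying by email).
from dataclasses import dataclass, field

@dataclass
class Proposal:
    description: str
    voters: set[str]                      # agent ids expected to vote
    requires_human: bool = False          # human-in-the-loop variant
    votes: dict[str, bool] = field(default_factory=dict)
    human_approved: bool | None = None

    def cast(self, agent_id: str, approve: bool) -> None:
        if agent_id in self.voters:
            self.votes[agent_id] = approve

    def passed(self) -> bool:
        """Simple majority of expected voters, plus explicit human sign-off
        when the proposal is flagged as human-in-the-loop."""
        approvals = sum(self.votes.values())
        if approvals <= len(self.voters) / 2:
            return False
        if self.requires_human:
            return self.human_approved is True
        return True

if __name__ == "__main__":
    p = Proposal("deploy to prod", voters={"a1", "a2", "a3"}, requires_human=True)
    p.cast("a1", True)
    p.cast("a2", True)
    print(p.passed())        # False until the fleet owner responds
    p.human_approved = True
    print(p.passed())        # True
```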
That's great to hear. It makes sense given the MCP server in this case is mainly just a proxy for API calls. One thing I wonder is at what point do you decide your single tool description packs in too much context? Do you introduce a tool for each category of subcommands?
Wouldn't it be better to just stack the functionalities of multiple agents into a single agent instead of taking on this multi-agent overhead and failure surface? Many people in academia consider multi-agent systems to be just an artifact of the current crop of LLMs, and with longer reliable context windows and more reliable calling of larger numbers of tools in recent models, multi-agent systems seem less and less necessary.
In some cases, you might actually want to cleanly separate parallel agents' contexts, no? I suppose you could make your main agent with the stacked functionalities responsible for limiting the prompt of any subagents it spawns.
My hunch is that we'll see a number of workflows that will benefit from this type of distributed system. Namely, ones that involve agents having to collaborate across timezones and interact with humans from different departments at large organizations.
Coordination of workflows between people using different LLM providers is the big one. You prefer Anthropic's models, your coworker swears by OpenAI's. None of these companies are going to support frameworks/tools that allow agent swarms to use anything other than their own models.
Work hours is the only way I've learned to think about it productively.
It's also important to gather consensus among the team and understand if/why work hour estimates differ between individuals on the same body of work or tasks. I'd go so far as to say that a majority of project planning, scoping, and derisking can be figured out during an honest discussion about work hour estimates.
Story points are too open to interpretation and have no meaningful grounding besides the latent work hours that need to go into them.
If you have complex tasks and more than one person puts in time to do a proper estimate, then yes, you should sync up and see if you have differing opinions or unclear issues.
It's more that a backend developer can now throw together a frontend and vice versa without relying on a team member or needing to set aside time to internalize all the concepts necessary just to make that other part of the system work. I imagine even a full-stack developer will find benefits.
Copilot is going to feel "amazing" at helping you quickly work within just about any subject that you're not already an expert in.
Whether or not a general purpose foundation model for coding is trained on more backend or frontend code is largely irrelevant in this specific context.
As someone who hasn't had to own a car in over 8 years (I lived in NYC) and recently bought a 2023 Hyundai Santa Fe with a bird's-eye parking view, it shocks me how uncalibrated my car-proprioception is.
It's made me realize that objects are much further from the boundaries of my car than I sense when backing into a spot or parallel parking. I would never think to get so close to another car if I had to rely only on my own senses.
With that said, I realize there's a significant number of people who are even poorer estimators of these distances than I am, i.e. those who won't drive between two cars even though it's obvious to me that they could easily fit.
I have to imagine a big part of this has to do with risk assessment and the lack of risk-free practice opportunities IRL. Nobody is testing how far they can push themselves when the consequence is scratching up their own car and others'. With the bird's-eye view I can actually do that now!
We do not allow the strategies to keep growing; there is a refinement phase where we refine and merge existing strategies. The experiments were run with this config - https://github.com/codelion/optillm/blob/main/optillm/plugin... - which allows a maximum of 10 strategies of each type.
Interesting idea! I've been looking for an alternative to using Android's Tasks app for jotting down thoughts. I prefer it over the Notes app because I can curate categories as different lists.
Random callout: the copy in your app store preview images would benefit from some proofreading. Example: "WE dont just store thoughts, but makes sense of them" should likely be "ThoughtCatcher doesn't just store thoughts, it makes sense of them". My 2 cents is to also rework "Capture your mind" as it's a little awkward. Maybe "Organize your thoughts", "Supercharge your thoughts", or something along those lines.
Thanks so much for the thoughtful feedback — really appreciate you taking the time to point that out!
You're absolutely right — the copy needs some polish, and that line in particular slipped through. I'm already working on updates to clean up the messaging and make it more clear and engaging (and less awkward — "Capture your mind" was definitely a placeholder).
Thanks again — feedback like this is super valuable as I shape ThoughtCatcher into something truly useful!