Just claims with nothing to back them up. Steal people's years of work, then turn around and say you made it "so much better". Support this compiler for 20 years first.
What I missed when trying it was a simple way of accessing private repositories. There does not seem to be SSH agent forwarding, or is there? What do people use?
I realize this is all very fresh, but still wondering…
It is really important that such posts exist. There is a risk that we only hear about the wild successes and never the failures, but it is from the failures that we learn the most.
One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.
>> This doesn’t seem to exist in this case, which may simply be the deal breaker.
Perhaps, but perhaps not. The reason tests are valuable in these scenarios is that they are actually a kind of system spec. LLMs can look at them to figure out how a system should (and should not) behave, and use that to guide the implementation.
I don’t see why regular specs (e.g. markdown files) could not serve the same purpose. Of course, most GitHub projects don’t include such files, but maybe that will change as time goes on.
There is one feature in Claude Code that is often overlooked, and that I haven't seen in any of the other agentic tools: a tool called "sub-agent", which creates a fresh context window in which the model can independently work on a clearly defined sub-task. This effectively turns Claude Code from a single-agent model into a hierarchical multi-agent model (I am not sure if the hierarchy goes to depths >2).
I wonder if it is a conscious decision not to include this (I imagine it opens up a lot of possibilities for going crazy, but it also seems to be the source of a great amount of Claude Code's power). I would very much like to play with this if it appears in gemini-cli.
The next step would be the possibility to define custom prompts, toolsets and contexts for specific recurring tasks, and have these appear as tools to the main agent. An example of such a task: create_new_page. The prompt could describe the steps one needs to follow to create a page. The main agent could then simply delegate this as a well-defined task, without cluttering its own context with the operational details.
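To make that concrete, here is a rough sketch of what such a reusable task definition might look like. Everything here is invented for illustration: the file location, field names, tool names and referenced files are assumptions, not something Claude Code or gemini-cli is known to support.

    # hypothetical file: .agent/tasks/create_new_page.yaml (invented for illustration)
    name: create_new_page
    description: >
      Create a new page in the web app, including the route, the view component,
      and a basic smoke test. Exposed to the main agent as a single tool call.
    prompt: |
      You are creating a new page. Follow these steps:
      1. Add a route entry for the new page.
      2. Create the view component in the pages directory.
      3. Register the page in the navigation menu.
      4. Add a smoke test that renders the page.
    tools:            # the only tools this sub-agent may use
      - read_file
      - write_file
      - run_tests
    context:          # files pre-loaded into the sub-agent's fresh context window
      - src/routes.ts
      - docs/page-conventions.md

The main agent would then see create_new_page as just another tool and delegate to it, while the operational details stay inside the sub-agent's own fresh context window.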
Possibly. One could think about hooking this in as a tool or a simple shell command. But then there is nothing to coordinate multiple tools modifying the codebase simultaneously.
Still, it is worth a try, and it may be possible with some prompting and duct tape.
One thing I'd really like to see in coding agents is this: As an architect, I want to formally define module boundaries in my software, in order to have AI agents adhere to and profit from my modular architecture.
Even with 1M context, it makes sense to define boundaries for large projects. These will typically be present in some form, but they are not available to the coding agent in a precise way. Imagine there was a simple YAML format where I could specify the modules, where each one can be found in the source tree, and the APIs of the other modules it interacts with. Then it would be trivial to turn this into a context that would very often fit into 1M tokens. When an agent decides something needs to be done in the context of a specific module, it could then create a new context window containing exactly that module, effectively turning a large codebase into a small codebase, for which Gemini is extraordinarily effective.
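Purely as an illustration, such a file could look like the sketch below. The format, file name and paths are all made up here; no current tool reads anything like this.

    # hypothetical modules.yaml -- format invented for illustration
    modules:
      - name: billing
        path: src/billing/                 # where the module lives in the source tree
        public_api: src/billing/api.ts     # the surface other modules are allowed to use
        depends_on:                        # APIs of other modules this one interacts with
          - src/accounts/api.ts
          - src/notifications/api.ts
      - name: accounts
        path: src/accounts/
        public_api: src/accounts/api.ts
        depends_on: []

An agent working on billing would then load only src/billing/ plus the two declared API files into a fresh context window, instead of the whole repository.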
I would be interested in reading what tools are made available to the LLM, and how everything is wired together to form an effective analysis loop. It seems like this is a key ingredient here.
- All prompts used
- The structure of the agent team (which agents / which roles)
- Any other material that went into the process
This would be a good source for learning, even though I'm not ready to spend $20k just to replicate the experiment.