Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There is definitely way too much plumbing and going back and forth.

But one thing that MUST get better soon is having the AI agent verify its own code. There are a few solutions in place, e.g. using an MCP server to give access to the browser, but these tend to be brittle and slow. And for some reason, the AI agents do not like calling these tools too much, so you kinda have to force them every time.

PRD review can be done, but AI cannot fill the missing gaps the same way a human can. Usually, when I create a new PRD, it is because I have a certain vision in my head. For that reason, the process of reviewing the PRD can be optimized by maybe 20%. OR maybe I struggle to see how tools could make me faster at reading and commenting / editing the PRD.



Agents __SHOULD NOT__ verify their own code. They know they wrote it, and they act biased. You should have a separate agent with instructions to red team the hell out of a commit, be strict, but not nitpick/bikeshed, and you should actually run multiple review agents with slightly different areas of focus since if you try to run one agent for everything it'll miss lots of stuff. A panel of security, performance, business correctness and architecture/elegance agents (armed with a good covering set of code context + the diff) will harden a PR very quickly.


Codex uses this principle - /review runs in a subthread, does not see previous context, only git diff. This is what I am using. Or I open Cursor to review code written by GPT-5 using Sonnet.


Do you have examples of this working, or any best practices on how to orchestrate it efficiently? It sounds like the right thing to do, but it doesn't seem like the tech is quite to the point where this could work in practice yet, unless I missed it. I imagine multiple agents would churn through too many tokens and have a hard time coming to a consensus.


I've been doing this with Gemini 2.5 for about 6 months now. It works quite well, it doesn't catch big architectural 100% but it's very good at line/module level logic issues and anti-patterns.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: