This cost seems wild. For comparison GitHub Copilot Code Review is four cents per review once you're outside of the credits included with your subscription.
Not sure about that, I find it's generally pretty good at finding niche issues that aren't as easily caught by humans. Especially with newer LLMs it gets even better.
Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." because the prompt doesnt correlate to training data.
Unless my understanding is incorrect about how these tools work that last point isn't really a quality of LLMs as such? It gets attributed because the lines are blurred but the tireless trial and error is actually just a quality of a regular programatic loop (agent/orchestrator) that happens to be doing the trickiest part of its work via an LLM.
AFAICT this is already baked into the GitHub Copilot agent. I read its sessions pretty often and reviewing/testing after writing code is a standard part of its workflow almost every time. It's kind of wild seeing how diligent it is even with the most trivial of changes.
GH Copilot is definitely far better than just a linter. I don't have examples to hand but one thing that's stood out to me is its use of context outside the changes in the diff. It'll pull in context that typically isn't visible in the PR itself, the sort of things that only someone experienced in the code base with good recall would connect the dots on (e.g. this doesn't conform to typical patterns, or a version of this is already encapsulated in reusable code, or there's an existing constant that could be used here instead of the hardcoded value you have).
I've also noticed this explosion of code review tools and felt that there's some misplaced focus going on for companies.
Two that stood out to me are Sentry and Vercel. Both have released code review tools recently and both feel misplaced. I can definitely see why they thought they could expand with that type of product offering but I just don't see a benefit over their competition. We have GH copilot natively available on all our PRs, it does a great job, integrates very well with the PR comment system, and is cheap (free with our current usage patterns). GH and other source control services are well placed to have first-class code review functionality baked into their PR tooling.
It's not really clear to me what Sentry/Vercel are offering beyond what copilot does and in my brief testing of them didn't see noticeable difference in quality or DX. Feels like they're fighting an uphill battle from day one with the product choice and are ultimately limited on DX by how deeply GH and other source control service allow them to integrate.
What I would love to see from Vercel, which they feel very well placed to offer, is AI powered QA. They already control the preview environments being deployed to for each PR, they have a feedback system in place with their Vercel toolbar comments, so they "just" need to tie those together with an agentic QA system. A much loftier goal of course but a differentiator and something I'm sure a lot of teams would pay top dollar for if it works well.