Hacker News | jberthom's comments

Interesting, which model were you using for the vision part? In my experience Claude Sonnet and Opus handle UI screenshots reasonably well: not perfect, but good enough that the agent can catch obvious layout issues and iterate. Definitely not at the “pixel perfect design implementation” stage yet though. But for testing features it's OK. In that case the goal is for the agent to check that the UX/UI flow works, not that every pixel is aligned with the others.

agent-browser runs locally (it’s a Rust CLI + Node daemon on your machine), so there’s no cloud dependency on Vercel, it’s just built by the Vercel Labs team. Everything stays local :)

Simon’s tools are really great. Showboat is more for static screenshots though. ProofShot is the full session: recording, error capture, action timeline, PR upload. Different scope, I'd say.

The agent drives interactions through proofshot exec (clicks, typing, navigation), and each action gets logged with timestamps synced to the video. So in the viewer you can scrub through and click on action markers to jump to specific moments. It captures what happened during interaction, not just what the page looked like at rest. I had recordings where the agent struggled (for instance when it had to click toggle buttons). It was fascinating to watch: the agent just tried again and again, like a toddler figuring out how to use a keyboard, and after 3 tries figured it out on his/her own (trying not to misgender the babies of future AGI).
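To make the "timestamps synced to the video" idea concrete, here is a rough sketch of how logged actions could be turned into scrubbable markers. This is not ProofShot's actual code or log format; the `Action` fields, `to_markers`, and `marker_at` are all made-up names for illustration, and it assumes each action is logged with a wall-clock time and that the recording start time is known.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical log entry; field names are assumptions, not ProofShot's schema."""
    kind: str          # e.g. "click", "type", "navigate"
    target: str        # free-form description of the element or URL
    wall_time: float   # seconds since epoch when the action ran


def to_markers(actions, recording_start):
    """Convert wall-clock action times into sorted offsets (seconds) into the video.

    Actions logged before the recording started are dropped.
    """
    return sorted(
        (round(a.wall_time - recording_start, 3), a.kind, a.target)
        for a in actions
        if a.wall_time >= recording_start
    )


def marker_at(markers, position):
    """Return the most recent marker at or before a scrub position, or None."""
    offsets = [m[0] for m in markers]
    i = bisect_right(offsets, position)
    return markers[i - 1] if i else None
```

With markers stored as video offsets, clicking a marker is just a seek to its offset, and scrubbing can highlight whatever `marker_at` returns for the current position.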

Ah, feel your pain. The Codex interaction is exactly the pain point: “I fixed it” / “no you didn’t” five times in a row, and you feel gaslighted by your own agent in a way. That’s the loop I wanted to kill. I didn't know about Mozilla screenshot regression, actually.

Thanks! Yeah the before/after PR thing is exactly what proofshot pr is built for.

DevTools MCP is great for live debugging in the moment. ProofShot is more about generating a proof bundle after the fact, something you can review on a PR without having been there when the agent ran. Different use cases I think.

yes as saintfire said :)

ProofShot is just a CLI, not tied to any IDE. If you’re in Antigravity or VSCode and their built-in preview works for you, great. This is for people using Claude Code, Codex, or any terminal-based agent where there’s no IDE doing it for you. The main thing is really the PR artifact workflow - the agent records proof, you review it async on the PR.


Yes, focused on web for now, but accessibility tree dumps seem like a good alternative to screenshots for native apps. For web, agent-browser already uses compact element refs, but for mobile the a11y approach could be way more efficient. Would you be open to sharing more about how you set that up?

Yes, agreed. Web only for now since it runs on headless Chromium. Desktop and mobile are the #1 request though. For mobile the path would be driving an iOS Simulator or Android emulator. For native desktop, probably accessibility APIs or OS-level screenshots. Definitely on my radar; will see if anyone wants to contribute since I am doing this in my free time.
