I have wondered if with these tests it'll reach a point where online models chea...

simonw · 2026-02-14T22:40:26

This Deep Think one was so good that I did get suspicious that maybe it was at least rendering the SVG to an image and then "looking" at the image and tweaking it over a few iterations.

But the reasoning trace doesn't hint at that and looks legit to me: https://gist.github.com/simonw/7e317ebb5cf8e75b2fcec4d0694a8...

I also asked Deep Think what tools it has access to. It has Python and Bash but no internet access, and as far as I can tell, that environment doesn't have any libraries or tools installed that could render an SVG to an image format it could view.

taberiand · 2026-02-14T20:51:02

Is that cheating, or is that just working smarter not harder?

Springtime · 2026-02-14T21:04:13

The interesting aspect of these ongoing tests, I feel, is seeing how models can plan out an image directly using SVG primitives purely through reasoning (code-to-code). If they have a reference image, it becomes a different kind of challenge (optimizing a trace).