
Can you explain why this is the case? I don't understand why.


I'll try! Let's consider a tree blowing in the wind:

Classically rendering this in any realistic fashion quickly gets complex. Between the physics simulation (rather involved) and the number of triangles (trees have many branches and leaves), you're going to be doing a lot of math.
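To make "a lot of math" concrete, here's a toy numpy sketch (nothing like real engine code; the function and constants are made up) of just about the cheapest possible wind-sway pass. It runs over every vertex, every frame, before a single pixel has been shaded:

    # Toy sketch: even a crude sway pass touches every vertex of every
    # branch/leaf each frame; real engines add branch hierarchies, LOD,
    # skinning, collision... and then still have to light and shade it all.
    import numpy as np

    def sway(vertices, t, wind_dir=np.array([1.0, 0.0, 0.0]), strength=0.05):
        height = vertices[:, 1] - vertices[:, 1].min()   # sway more near the top
        phase = vertices @ wind_dir                      # offset gusts across the tree
        offset = strength * height[:, None] * np.sin(t + phase)[:, None] * wind_dir
        return vertices + offset

    tree = np.random.rand(300_000, 3)                    # a modest tree mesh, vertex-wise
    for frame in range(3):                               # ~60x per second in practice
        posed = sway(tree, t=frame / 60.0)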

I'll emphasize "realistic" - sure, we can real-time render trees in 2025 that look... OK. However, take more than a second to glance at them and you will quickly start to see where we have compromised the trees' fidelity to ensure they render at an adequate speed on contemporary hardware.

Now consider a world model trained on enough tree footage that it has gained an "intuition" for how trees look and behave. This world model doesn't need to actually simulate the entire tree to get it to look decent... it can instead directly output the pixels that "make sense". Much like a human brain can "simulate" the movement of an object through space without expending much energy - we do it via prediction based on a lot of training data, not by accurately crunching a bunch of numbers.
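A hypothetical sketch of that "intuition" path (the model object and predict() interface here are made-up stand-ins, not any real model's API): there's no tree geometry and no physics step, just recent pixels and an action going in, and the next frame's pixels coming out.

    import numpy as np

    def next_frame(model, recent_frames, action):
        # recent_frames: (T, H, W, 3) past RGB frames; 'model' stands in for a
        # trained world model that directly emits the pixels that "make sense".
        return model.predict(recent_frames, action)      # (H, W, 3), no simulation step

    class FakeWorldModel:
        # Stand-in so the sketch runs; a real model would be a large neural net.
        def predict(self, frames, action):
            return frames[-1]                            # "predict" by copying the last frame

    frames = np.zeros((4, 64, 64, 3), dtype=np.float32)
    frame = next_frame(FakeWorldModel(), frames, action="pan_left")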

That's just one tree, though - the real world has a lot of fidelity to it, fidelity that would be extremely expensive to simulate if you want a properly realistic output on the other side.

Instead we can use these models, which have an intuition for how things ought to look. They can skip the simulation and just give you an end result that looks passable, because it's based on predictions informed by real-world data.


Don't you think a sufficiently advanced model will end up emulating what normal 3D engines already do mathematically? At least for the rendering part, I don't see how you can "compress" the meaning behind light interaction without ending up with a somewhat latent representation of the rendering equation.


If I am not mistaken, we are already past that. The pixel, or token, gets probability-predicted in real time. The complete, shaded pixel, if you will, gets computed ‘at once’ instead of through layers of simulation. That’s the LLM’s core mechanism.

If the mechanism allows for predicting what the next pixel will look like, which includes the lighting equation, then there is no longer any need for a light simulation.
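For reference, this is the sort of "light simulation" being skipped - a toy Lambert diffuse term, not full PBR, just to show what the predictor would have to fold into its weights rather than evaluate per pixel:

    import numpy as np

    def lambert_pixel(normal, light_dir, albedo):
        # One diffuse pixel computed "honestly" from geometry and light (N·L).
        n = normal / np.linalg.norm(normal)
        l = light_dir / np.linalg.norm(light_dir)
        return albedo * max(np.dot(n, l), 0.0)

    print(lambert_pixel(np.array([0.0, 1.0, 0.0]),
                        np.array([0.3, 1.0, 0.2]),
                        albedo=0.8))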

Would also like to know how Genie works. Maybe some parts do indeed already get simulated in a hybrid approach.


The model has multiple layers which are basically a giant non-linear equation to predict the final shaded pixel; I don't see how it's inherently different from a shader outputting a pixel "at once".

Correct me if I'm wrong, but I don't see how you can simulate a PBR pixel without doing ANY PBR computation whatsoever.

For example, one could imagine a very simple program computing sin(x), or a giant multi-layered model that does the same - wouldn't the latter just be a latent, more-or-less compressed version of sin(x)?
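Here's a toy version of exactly that (a small hand-rolled MLP fit to sin(x) with plain gradient descent; the layer size and step count are arbitrary). After training, the weights are just a baked-in, approximate re-encoding of sin(x) - the network never "knows" the formula, it has only compressed the behaviour:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-np.pi, np.pi, (1024, 1))
    y = np.sin(x)

    W1, b1 = rng.normal(0, 1, (1, 64)), np.zeros(64)     # one hidden layer, 64 tanh units
    W2, b2 = rng.normal(0, 0.1, (64, 1)), np.zeros(1)

    lr = 0.05
    for step in range(5000):
        h = np.tanh(x @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y                                   # MSE gradient w.r.t. pred (up to a constant)
        gW2 = h.T @ err / len(x); gb2 = err.mean(0)      # backprop by hand: output layer
        dh = (err @ W2.T) * (1 - h ** 2)                 # ...through the tanh
        gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)        # ...into the hidden layer
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

    test = np.array([[0.5]])
    print((np.tanh(test @ W1 + b1) @ W2 + b2).item(), np.sin(0.5))  # should land close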


In this case I assume it would be learned from the motion/pixels of actual trees blowing in the wind. Which does serve up the challenge: how does dust blow on a hypothetical game-world alien planet?



