Certainly: https://arxiv.org/abs/2306.05720 It's a very interesting paper. "Even...

Certainly: https://arxiv.org/abs/2306.05720

It's a very interesting paper.

"Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process−well before a human can easily make sense of the noisy images."