Research into the internals of these networks has shown that they work out a correct 2.5D representation of the scene internally, before the RGB textures, so yes, it seems they have an internal representation of the scene and can therefore do enough inference from it to make shadows and light look natural.
I guess it's not that far-fetched, as your brain has to do the same to figure out whether a scene (or an AI-generated one, for that matter) has some weird issue that should pop out. So in a sense your brain does this too.
"Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process−well before a human can easily make sense of the noisy images."
You usually say 2.5D when it's 3D but only from a single vantage point, with no information about the back-facing sides of objects, like the representation you get from a depth sensor on a mobile phone, or when you try to extract depth from a single photo.
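Concretely, a 2.5D representation is just one depth value per pixel. Back-projecting it through (assumed) pinhole intrinsics gives a partial point cloud that only covers camera-facing surfaces, which is the sense in which it's less than full 3D. A small sketch, with made-up intrinsics:

```python
# Sketch: turn a single-view depth map into a point cloud using assumed
# pinhole intrinsics. The result only covers surfaces facing the camera,
# which is what makes it "2.5D" rather than full 3D.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """depth: (H, W) array of depths along the camera z-axis."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Flat synthetic depth map 2 m from the camera, VGA-ish intrinsics.
depth_map = np.full((480, 640), 2.0)
points = depth_to_points(depth_map, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3): one point per pixel, nothing behind them
```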