I know the main post has been getting a lot of reaction, but this page absolutely blew me away. The results are striking.
The robot examples are very underwhelming, but the people, including those in the background, are all very well done, at a level much better than most static image diffusion models produce. Keeping the same people consistent as they interact with objects is also not something I expected a model like this to do well so soon.