Competitors like Midjourney and Stable Diffusion already allow you to re-use an image seed, which makes it much easier to keep style and character consistent across images.
3D can often be inferred from 2D data, 3D training data could also be generated, etc. Think about how fast the space has moved in just the last 3 years and extrapolate from that; don't focus too much on today's shortcomings.
But then there are predictions from the 50s that we’d have flying cars by the year 2000.
Sometimes tech hits a plateau.
Animated movies make tons of money, so there’s definitely motivation to make their production faster. I just think the complexities are so intricate that I wouldn’t be surprised if AI-generated animation still seems “not quite right” in the near future.