The vast majority of energy is already consumed at inference time rather than during training, and this gap will only widen as the number of applications grows and people ask their LLMs to handle more complicated tasks.
If training costs go down, I expect models will be trained on more video, so training energy usage will not decrease either.