I’m not sure what you’re referencing. Image-gen models are not much different, especially now that they’re moving to transformers/MoE. Video-gen models are chunky, but they’re used more rarely, and they usually have much smaller parameter counts.
Basically anything else in machine learning uses at least an order of magnitude less energy.