Comment on AI Computing on Pace to Consume More Energy Than India, Arm Says

Improving the models doesn’t seem to work: arxiv.org/abs/2404.04125?

We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting “zero-shot” generalization, multimodal models require exponentially more data to achieve linear improvements in downstream “zero-shot” performance, following a sample inefficient log-linear scaling trend.

It’s taking exponentially more data to get better results and, therefore, exponentially more energy. Even if something like analog training chips reduces energy usage tenfold, the exponential curve will just catch up again. Not only that, but you have to gather that much more data, and while the Internet is a vast datastore, the AI models have already absorbed much of it.
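To put rough numbers on that, here is a minimal sketch of a log-linear trend. The constants (`INTERCEPT`, `SLOPE`) and the function names are made up for illustration and are not taken from the paper, and it assumes training energy scales linearly with dataset size, which is a simplification.

```python
import math

# Hypothetical constants for illustration only; not fitted to any real model.
INTERCEPT = 0.0  # accuracy at a single sample (made up)
SLOPE = 0.1      # accuracy gained per 10x increase in data (made up)


def accuracy(samples: float) -> float:
    """Log-linear trend: accuracy grows with log10 of the dataset size."""
    return INTERCEPT + SLOPE * math.log10(samples)


def samples_needed(target_accuracy: float) -> float:
    """Invert the trend: the data required grows exponentially with the target."""
    return 10 ** ((target_accuracy - INTERCEPT) / SLOPE)


def energy_needed(target_accuracy: float, energy_per_sample: float = 1.0) -> float:
    """Assumption: energy is proportional to the number of training samples."""
    return samples_needed(target_accuracy) * energy_per_sample


if __name__ == "__main__":
    for target in (0.5, 0.6, 0.7):
        baseline = energy_needed(target)
        # Same target with hardware that is ten times more efficient.
        efficient = energy_needed(target, energy_per_sample=0.1)
        print(f"target={target:.1f}  baseline={baseline:.3g}  10x-efficient={efficient:.3g}")
```

Under these assumptions, each extra `SLOPE` of accuracy costs ten times the data, so a tenfold efficiency gain pays for exactly one such step before the energy bill is back where it started.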

The implication is that the models are about as good as they will be without more fundamental breakthroughs. The thing about breakthroughs like that is that they could happen tomorrow, they could happen in 10 years, they could happen in 1000 years, or they could happen never.

Fermat’s Last Theorem remained an open problem for 358 years. Squaring the Circle remained open for over 2000 years. The Riemann Hypothesis has remained unsolved after more than 150 years. These things sometimes sit there for a long, long time, and not for lack of smart people trying to solve them.
