Is there any way to make it use less as it gets more advanced, or will there be huge power plants just dedicated to AI all over the world soon?
Imagine someone said “make a machine that can peel an orange”. You have a thousand shoeboxes full of Meccano. You give them a shake and tip out the contents and check which of the resulting scrap piles can best peel an orange. Odds are none of them can, so you repeat again. And again. And again. Eventually, one of the boxes produces a contraption that can kinda, maybe, sorta touch the orange. That’s the best you’ve got, so you copy bits of it into the other 999 shoeboxes and give them another shake. It’ll probably produce worse outcomes, but maybe one of them will be slightly better still, and that becomes the basis of the next generation. You do this a trillion times and eventually you get a machine that can peel an orange. You don’t know if it can peel an egg, or a banana, or even how it peels an orange, because it wasn’t designed but born through inefficient, random, brute-force evolution.
Now imagine that it’s not a thousand shoeboxes, but a billion. And instead of shoeboxes, it’s files containing hundreds of gigabytes of utterly incomprehensible abstract connections between meaningless data points. And instead of a few generations a day, it’s a thousand a second. And instead of “peel an orange” it’s “sustain a facsimile of sentience capable of instantly understanding arbitrary, highly abstracted knowledge and generating creative works to a standard approaching being indistinguishable from humanity, such that it can manipulate those it interacts with to support the views of a billionaire nazi nepo-baby”. When someone asks an LLM to generate a picture of a fucking cat astronaut or whatever, the unholy mess of scraps that behaves like a mind spits out a result and no-one knows how it does it aside from broad-stroke generalisation. The iteration that gets the most thumbs up from its users gets to be the basis of the next generation, the rest die, millions of times a day.
What I just described is basically NEAT (NeuroEvolution of Augmenting Topologies), which is pretty primitive by modern standards, but it’s a flavour of what’s going on.
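Very roughly, that shake-copy-repeat loop looks something like this in code (a toy hill-climbing sketch in Python, not actual NEAT; the “target behaviour” and all the numbers are made up for illustration):

```python
import random

TARGET = [1, 0, 1, 1, 0, 1, 0, 0]   # stand-in for "a machine that peels an orange"

def fitness(box):
    # how close this shoebox's contraption gets to the target behaviour
    return sum(1 for a, b in zip(box, TARGET) if a == b)

def shake(best):
    # copy bits of the winner, then randomly jiggle one part of it
    child = list(best)
    i = random.randrange(len(child))
    child[i] = random.randint(0, 1)
    return child

# a thousand shoeboxes full of random scrap
population = [[random.randint(0, 1) for _ in TARGET] for _ in range(1000)]

for generation in range(1000):
    best = max(population, key=fitness)
    if fitness(best) == len(TARGET):
        break   # it can finally "peel the orange"
    # the winner seeds the next generation; the rest die
    population = [shake(best) for _ in range(1000)]

print(generation, best, fitness(best))
```

Nobody ever writes down *how* to peel the orange; the loop just keeps whatever happens to score better.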
ImplyingImplications@lemmy.ca 2 days ago
It’s mostly the training/machine learning that is power hungry.
AI is essentially a giant equation that is generated via machine learning. You give it a prompt with an expected answer, it gets run through the equation, and you get an output. That output gets an error score based on how far it is from the expected answer. The variables of the equation are then modified so that the prompt will lead to a better output (one with a lower error).
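That loop, stripped down to a toy with a single variable instead of billions (a rough sketch, not any real model’s training code; the numbers are arbitrary):

```python
# Toy version of the loop described above: prompt -> output -> error -> nudge the variables.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (prompt, expected answer) pairs

weight = 0.0          # the whole "equation" here is just: output = weight * prompt
learning_rate = 0.01

for epoch in range(1000):
    for prompt, expected in data:
        output = weight * prompt
        error = output - expected                  # how far off the expected answer we are
        weight -= learning_rate * error * prompt   # tweak the variable to lower the error

print(weight)   # settles near 2.0, the value that makes the error smallest
```

A real model does the same kind of nudge, but for billions of weights on every training example.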
The issue is that current AI models have billions of variables and will be trained on billions of prompts. Each variable will be tuned based on each prompt. That’s billions times billions of calculations. It takes a while. AI researchers are of course looking for ways to speed up this process, but so far it’s mostly come down to dividing up these billions of calculations over millions of computers. Powering millions of computers is where the energy costs come from.
Unless AI models can be trained in a way that doesn’t require running a billion squared calculations, they’re only going to get more power hungry.
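Some made-up round numbers to show why (purely illustrative, not real model specs):

```python
variables = 1e9                    # a billion tunable variables
prompts = 1e9                      # a billion training prompts
adjustments = variables * prompts  # ~1e18 tweaks: "a billion squared"

per_machine_per_second = 1e9       # pretend one machine manages a billion tweaks per second
years_on_one_machine = adjustments / per_machine_per_second / 86_400 / 365
print(f"~{years_on_one_machine:.0f} years on a single hypothetical machine")   # ~32 years
```

Hence splitting the work over millions of machines, and the power bill that comes with them.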
neukenindekeuken@sh.itjust.works 2 days ago
This is a pretty great explanation/simplification.
I’ll add that because the calculations rely on floating point math in many cases, graphics chips do most of the heavy processing, since they were already designed with this kind of pipeline in mind for video games.
That means there are a lot of power-hungry graphics chips running in these data centers. It’s also why Nvidia stock is so insane.
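The workload itself is mostly huge grids of floating-point numbers being multiplied together, which is the same shape of work as pushing pixels. A tiny illustration (NumPy on a CPU here, standing in for what a GPU does massively in parallel; the sizes are arbitrary):

```python
import numpy as np

activations = np.random.rand(1024, 4096).astype(np.float32)  # one layer's inputs
weights = np.random.rand(4096, 4096).astype(np.float32)      # one layer's variables

# one matrix multiply ≈ 1024 * 4096 * 4096 ≈ 17 billion floating-point multiply-adds
outputs = activations @ weights
print(outputs.shape)
```

A model chews through a long chain of these for every token it produces, which is why chips built for exactly this kind of math ended up running the show.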
percent@infosec.pub 1 day ago
It’s kinda interesting how the most power-consuming uses of graphics chips — crypto and AI/ML — have nothing to do with graphics.
(Except for AI-generated graphics, I suppose)
ApatheticCactus@lemmy.world 1 day ago
Would AI inference or training be better suited to a quantum computer? I recall those not being great at conventional math, but massively accelerating certain computations that sounded similar to machine learning.
ImplyingImplications@lemmy.ca 1 day ago
My understanding of quantum computers is that they’re great at brute-forcing stuff, but machine learning is just a lot of calculations, not brute forcing.
If you want to know the square root of 25, you don’t need to brute force it. There’s a direct way to calculate the answer and traditional computers can do it just fine. It’s still going to take a long time if you need to calculate the square root of a billion numbers.
That’s basically machine learning. The individual calculations aren’t difficult; there’s just a lot to calculate. However, if you have 2 computers doing the calculations, it’ll take half the time. It’ll take even less time if you fill a data center with a cluster of 100,000 GPUs.
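A small sketch of that “more machines, less wall-clock time” point, using the square-root example and Python’s multiprocessing as a stand-in for a GPU cluster (numbers are arbitrary):

```python
import math
from multiprocessing import Pool

numbers = range(1, 1_000_001)    # a million easy, independent calculations

def work(n):
    return math.sqrt(n)          # trivial on its own; the cost is the sheer volume

if __name__ == "__main__":
    # the same pile of work split across 8 workers instead of 1
    with Pool(processes=8) as pool:
        results = pool.map(work, numbers, chunksize=50_000)
    print(len(results))
```

The calculations don’t get any cleverer; there are just more hands doing them at once, and every one of those hands draws power.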