Fundamentally no, linear progress requires exponential resources. The below article is about AGI but transformer based models will not benefit from just more grunt. We’re at the software stage of the problem now. But that doesn’t sign fat checks, so the big companies are incentivized to print money by developing more hardware.
https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
Also the industry is running out of training data
https://arxiv.org/html/2602.21462v1
What we need are more efficient models, and better harnessing. Or a different approach, reinforced learning applied to RNNs that use transformers has been showing promise.
AliasAKA@lemmy.world 1 day ago
Current models are speculated at 700 billion parameters plus. At 32 bit precision (half float), that’s 2.8TB of RAM per model, or about 10 of these units. There are ways to lower it, but if you’re trying to run full precision (say for training) you’d use over 2x this, something like maybe 4x depending on how you store gradients and updates, and then running full precision I’d reckon at 32bit probably. Possible I suppose they train at 32bit but I’d be kind of surprised.
in_my_honest_opinion@piefed.social 1 day ago
Sure, but giant context models are still more prone to hallucination and reinforcing confidence loops where they keep spitting out the same wrong result a different way.
AliasAKA@lemmy.world 21 hours ago
Sorry, I’m not saying that’s a good thing. It’s not just the context that’s expanding, but the parameter of the base model. I’m saying at some point you just have saved a compressed version of the majority of the content (we’re already kind of there) and you’d be able to decompress it even more losslessly. This doesn’t make it more useful for anything other than recreating copyrighted works.
in_my_honest_opinion@piefed.social 18 hours ago
Ah I see, however you do bring up another point. I really think we need a true collection of experts able to communicate without the need for natural language and then a “translation” layer to output natural language or images to the user. The larger parameters would allow the injection of experts into the pipeline.
Thanks for the clarification, and also for the idea. I think one thing we can all agree on is that the field is expanding faster than any billionaire or company understands.