Comment on Do you think Google execs keep a secret un-enshittified version of their search engine and LLM?

TropicalDingdong@lemmy.world 4 months ago

This is really the fear we should all have. And I’ve wondered about this specifically in the case of Thiel, who seems quite off their rocker.

Some things we know.

Architecturally, the underpinnings of LLMs existed long before the modern crop. “Attention Is All You Need” is basic reading these days; Google literally invented transformers, but failed to create the first LLM. This is important.

Modern LLMs came about through basically two aspects of scaling a transformer. First, massively scale the transformer. Second, massively scale the training dataset. This is what OpenAI did. What Google missed was that the emergent properties of networks change with scale. But just scaling a large neural network alone isn’t enough. You need enough data to allow it to converge on interesting and useful features.

On the first part, scaling the network: this is basically what we’ve done so far, along with some cleverness around how training data is presented, to improve existing generative models. Larger models are basically better models. There is some nuance here but not much. There have been no new architectural improvements that have resulted in the kind of order-of-magnitude improvement we saw in the jump from the LSTM/GAN days to transformers.

Now what we also know is that it’s incredibly opaque what is actually presented to the public. Some open source models are in the range of hundreds of billions of parameters. Most aren’t that big. I have Qwen3-VL on my local machine; it’s 33 billion parameters. I think I’ve seen some 400B-parameter models in the open source world, but I haven’t bothered downloading them because I can’t run them. We don’t actually know how many billion parameters models like Opus-4.5, or whatever shit stack OpenAI is sending out these days, actually have. It’s probably in the range of 200B-500B, which we can infer from the upper limits of what can fit on the most advanced server-grade hardware. Beyond that, it’s MoE (mixture of experts): multiple models on multiple GPUs conferring on results.

What we haven’t seen is any kind of stepwise, order-of-magnitude improvement since the 3.5-to-4 jump OpenAI made a few years ago. It’s been very… iterative, which is to say, underwhelming, since 2023. It’s very clear that an upper limit was reached and most of the improvements have been around QoL and nice engineering, but nothing has fundamentally or noticeably improved in terms of the underlying quality of these models. That is in and of itself interesting, and there could be several explanations for it.

Getting very far beyond this takes us beyond the hardware limitations of even the most advanced manufacturing we currently have available to us. I think the most a Blackwell card has is ~288GB of VRAM? It might be that at this scale we just don’t have the hardware to even try to look over the hedge and see how a larger model might perform. This is one explanation: we hit the memory limits of hardware, and we might not see a major performance improvement until we get into the TB range of memory on GPUs.
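
For a rough sense of the numbers, here is a back-of-the-envelope sketch in Python. It counts weights only (no KV cache, activations, or optimizer state), and everything in it is an assumption for illustration: the byte widths per parameter, the 288 GB per-card figure taken from the guess above, and the function names, which are made up.

```python
import math

GB = 1e9  # decimal gigabytes, the way VRAM is usually quoted


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return n_params * bytes_per_param / GB


def gpus_needed(n_params: float, bytes_per_param: float,
                vram_gb: float = 288.0) -> int:
    """Minimum number of cards whose combined VRAM holds the weights.

    vram_gb defaults to the ~288 GB per-card guess (an assumption).
    """
    return math.ceil(weight_memory_gb(n_params, bytes_per_param) / vram_gb)


# Model sizes mentioned in this thread; precisions are illustrative.
for label, n in [("33B", 33e9), ("500B", 500e9), ("5T", 5e12)]:
    for precision, width in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        print(f"{label:>4} @ {precision}: "
              f"{weight_memory_gb(n, width):7.0f} GB of weights, "
              f"{gpus_needed(n, width)} card(s)")
```

Under these assumptions a 500B model at 16-bit is about 1 TB of weights, so it already spans several cards, while at 4-bit it is 250 GB and squeezes onto one. A 5T model needs terabytes at any of these precisions.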

Another explanation could be that at the consumer level, they stopped throwing more compute resources at the problem. Remember the MoE thing? Well, these companies are, allegedly, supposed to make money. It’s possible that they just stopped throwing more resources at their product lines, and that more MoE does actually result in better performance.

In the first scenario I outlined, executives would be limited to the same useful, but kinda-crappy LLMs we all have access to. In the second scenario, executives might have access to super-powered, high-MoE versions.

If the second scenario is true and, when highly clustered, LLMs can demonstrate an additional stepwise performance improvement, then we’re already fucked. But if this were the case, it’s not like western companies have a monopoly on GPUs or even models. And we’re not seeing that kind of massive performance bump elsewhere, so it’s likely that MoE also has its limits and they’ve been reached at this point.

partial_accumen@lemmy.world 4 months ago

It’s also possible we’ve reached the limits of the training data.

This is my thinking too. I don’t know how to solve the problem either, because datasets created after about 2022 are likely polluted with LLM results baked in. Even at 95% precision, that means 5% hallucination baked into the dataset. I can’t imagine enough grounding is possible to mitigate that. As the years go on the problem only gets worse, because more LLM results will be fed back in as training data.
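
To make the feedback loop concrete, here is a toy sketch in Python. It is not a real model of training dynamics, just arithmetic under loudly stated assumptions: human-written data is treated as perfectly clean, each model generation repeats whatever errors are in its training data and adds its own 5% on top, and a fixed share of each new dataset is model output. The function name and parameters are hypothetical.

```python
def contaminated_fraction(generations: int,
                          synthetic_share: float,
                          hallucination_rate: float = 0.05) -> list[float]:
    """Fraction of wrong content in the dataset after each generation.

    Toy assumptions (not measured facts):
      - generation 0 is purely human data and contains no errors
      - a model output is right only if its training data was right
        AND the model did not hallucinate
      - each new dataset is `synthetic_share` model output, the rest
        clean human data
    """
    error = 0.0
    history = [error]
    for _ in range(generations):
        model_error = 1.0 - (1.0 - error) * (1.0 - hallucination_rate)
        error = synthetic_share * model_error
        history.append(error)
    return history


for share in (0.1, 0.5, 1.0):
    final = contaminated_fraction(50, share)[-1]
    print(f"synthetic share {share:.0%}: {final:.1%} wrong after 50 generations")
```

In this toy setup a dataset that is half synthetic settles just under 5% wrong, while one that is entirely synthetic creeps toward 100%. How bad it gets depends almost entirely on how much clean data stays in the mix.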

- TropicalDingdong@lemmy.world 4 months ago
  I mean that’s possible, but I’m not as worried about that. Yes, it would make future models worse. But it’s also entirely plausible to just cultivate a better dataset. And even small datasets can be used to make models that are far better at specific tasks than any generalist LLM. If better data is better, then the solution is simple: use human labor to cultivate a highly curated, high-quality dataset. I mean, it’s what we’ve been doing for decades in ML.
  
  I think the bigger issue is that transformers are incredibly inefficient in their use of data. How big a corpus do you need to feed into an LLM to get it to solve a y = mx + b problem? Compare that to a simple neural network or a random forest. For domain-specific tasks they’re absurdly inefficient. I do think we’ll see architectural improvements, and while the consequences of improvements have been non-linear, the improvements themselves have been fairly, well, linear.
  
  Before transformers we basically had GANs and LSTMs as the latest and greatest. And before that U-Net was the latest and greatest (and I still go back to it often), and before that basic NNs and random forests. I do think we’ll get some stepwise improvements to machine learning, and we’re about due for some. But it’s not going to be tinkering at the edges. It’s going to be something different.
  
  The only thing I’m truly worried about, even if it’s unlikely, is that just 10x-ing the size of an existing transformer (say from 500 billion parameters to 5 trillion, something you would need terabytes of VRAM to even process) results in totally new characteristics, in the same way that scaling from 100 million parameters to 10 billion resulted in something that, apparently, understood the rules of language. There are real land mines out there that none of us as individuals have the ability to avoid. But the “poisoning” of the data? If history tells us anything, it’s that if a capitalist thinks it might be profitable, they’ll throw any amount of human suffering at it to try and accomplish it.
- SubArcticTundra@lemmy.ml 4 months ago
  If model collapse is such an issue for LLMs, then why are humans resistant to it? We are largely trained on output created by other humans.
  - AwesomeLowlander@sh.itjust.works 4 months ago
    Are we? Antivax, anti-science BS is largely due to Russia poisoning our dataset.
    
magiccupcake@lemmy.world 4 months ago
Don’t forget the fundamental scaling properties of LLMs, which OpenAI even used as the basis of its strategy for making ChatGPT 3.5.

But basically LLM performance is logarithmic. It’s easy to get rapid improvements early on, but at later points, like where we are now, it takes exponentially more compute, training data, and model size to get even small improvements.

Even if we get a 10x in compute, model size, and training data (which is fundamentally finite), the improvements aren’t going to be groundbreaking or solve any of the inherent limitations of the technology.
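
A toy illustration of what “logarithmic” means here, in Python. The benchmark score, the baseline of 50, and the 5 points gained per 10x of compute are all made-up numbers chosen to show the shape of the curve. They are not fitted to any real model or published scaling law.

```python
import math


def score(compute: float, base: float = 50.0,
          gain_per_decade: float = 5.0) -> float:
    """Hypothetical benchmark score that grows with log10 of compute.

    `compute` is in arbitrary units relative to some baseline run.
    """
    return base + gain_per_decade * math.log10(compute)


def compute_multiplier_for_gain(points: float,
                                gain_per_decade: float = 5.0) -> float:
    """How many times more compute it takes to gain `points` of score."""
    return 10.0 ** (points / gain_per_decade)


for c in (1, 10, 100, 1000):
    print(f"{c:>5}x compute -> score {score(c):.1f}")

for pts in (5, 10, 20):
    print(f"+{pts} points needs {compute_multiplier_for_gain(pts):,.0f}x compute")
```

With these constants every extra 5 points costs another 10x of compute, so 10 points costs 100x and 20 points costs 10,000x. That is the sense in which each further step gets exponentially more expensive.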

- TropicalDingdong@lemmy.world 4 months ago
  💯
  
  Scaling is a quick and dirty way to get performance improvements. But there is no guarantee that we get any more interesting behavior, or that we don’t get, like, wildly more interesting behavior with an additional 10x-ing. The fact is we simply don’t know what emergent properties might exist at a network size we physically can’t scale to right now. It’s important not to assume things like “it’s just going to be diminishing returns”, because while that is most likely the case, precisely this thinking is why Google wasn’t the first to make an LLM, even though they had discovered/invented the underlying technology. Yet another 10x scaling didn’t result in just diminishing returns, but in fundamentally new network properties.
  
  And that principle holds across networked systems (social networks, communication networks, fungal and cellular communication networks). We truly do not know what will result from scaling the complexity of the network. It could move the needle from 95% to 96.5% accuracy. Or it could move it to a range of accuracy that isn’t measurable in human terms (it’s literally more accurate than we have the capability of validating). Or it could go from 95% to 94%. We simply don’t know.
- dyathinkhesaurus@lemmy.world 4 months ago
  Diminishing returns
  
salacious_coaster@feddit.online 4 months ago
Thanks for the detailed answer!

SubArcticTundra@lemmy.ml 4 months ago
If a small group of people get to outcompete the rest of society thanks to having exclusive access to more powerful LLMs, I’m sure (🤞) that would lead to a lot of unrest. Not only from us plebs, but also from the rest of business. I can see it leading to some lawsuits concluding that AI advancement must be shared (if for a fee) with the public.