“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews
Submitted 2 days ago by misk@sopuli.xyz to technology@lemmy.world
Comments
Catoblepas@lemmy.blahaj.zone 2 days ago
I’m sure turning on a few more nuclear plants to power shoveling an ever-larger body of AI-slop-contaminated text into the world’s most expensive plagiarism machine will fix it!
WalnutLum@lemmy.ml 2 days ago
I think most ML experts (who weren’t being paid out the wazoo to say otherwise) have been saying we’re on the tail end of the LLM technology sigmoid curve. (Basically, treating an LLM as a stochastic index, the actual measure of training-algorithm quality is query accuracy per training datum.)
Even with DeepSeek’s methodology, you see smaller and smaller returns on training input.
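The diminishing-returns point above can be sketched with a logistic (sigmoid) curve: past the inflection point, each additional unit of training input buys a smaller capability gain. This is an illustrative toy, not a fit to any real scaling data:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic curve standing in for capability vs. (log) training data."""
    return 1.0 / (1.0 + math.exp(-x))

# Marginal gain per unit of training input shrinks past the inflection point.
gains = [sigmoid(x + 1) - sigmoid(x) for x in range(0, 5)]
assert all(a > b for a, b in zip(gains, gains[1:]))  # strictly diminishing
```

The same unit of input that moved the curve by ~0.23 near the inflection point moves it by ~0.01 a few units later, which is the "tail end of the curve" argument in miniature.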
MDCCCLV@lemmy.ca 2 days ago
At this point, it is useful for doing some specific things so the way to make it great is making it cheap and accessible. Being able to run it locally would be way more useful.
dustyData@lemmy.world 1 day ago
Sure, but then what would they do with their billions of dollars’ worth of data centers plugged into nuclear power plants?
makyo@lemmy.world 1 day ago
100% this. Wouldn’t it be something if they weren’t overtly running their companies to replace all of us? I feel like focusing instead on creating great personal assistants that make our lives easier in various ways would get a lot of support from the public.
And don’t get me wrong, these LLMs are great at helping people already but that’s definitely not the obvious end goal of OpenAI or any of the others.
ugjka@lemmy.world 1 day ago
Yeah it is useful, but it is not an industry worth trillions of dollars in valuation. The only use cases LLMs have are to make shitty summarizations of text, to serve as a shitty Google search alternative, or to write shitty code.
Grandwolf319@sh.itjust.works 2 days ago
Is it because they used data from after ChatGPT was released?
homesweethomeMrL@lemmy.world 2 days ago
That’s bad. Mmmmmkay.
obbeel@lemmy.eco.br 2 days ago
That was kind of expected, but Claude isn’t that good either.
thatsnothowyoudoit@lemmy.ca 2 days ago
I think that depends on what you’re doing. I find Claude miles ahead of the pack in practical, but fairly nuanced coding issues - particularly in use as a paired programmer with Strongly Typed FP patterns.
And their new CLI client is pretty decent - it seems to really take advantage of the hybrid CoT/standard auto-switching model advantage Claude now has.
I don’t use it often anymore, but when I reach for a model first for coding - it’s Claude. It’s the most likely to be able to grasp the core architectural patterns in a codebase (like a consistent monadic structure for error handling or consistently well-defined architectural layers).
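For readers unfamiliar with the "monadic structure for error handling" mentioned here: the idea is a Result type that chains computations and short-circuits on the first failure. A minimal sketch (hypothetical names, in Python rather than a typed FP language):

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err:
    error: str

Result = Union[Ok, Err]

def bind(r: Result, f: Callable) -> Result:
    """Chain computations; short-circuit on the first error."""
    return f(r.value) if isinstance(r, Ok) else r

def parse_int(s: str) -> Result:
    try:
        return Ok(int(s))
    except ValueError:
        return Err(f"not an int: {s!r}")

def reciprocal(n: int) -> Result:
    return Ok(1 / n) if n != 0 else Err("division by zero")

assert bind(parse_int("4"), reciprocal) == Ok(0.25)
assert bind(parse_int("0"), reciprocal) == Err("division by zero")
```

The point of the pattern is that every function in the pipeline returns the same shape, so errors propagate without ad-hoc try/except at each call site - which is exactly the kind of codebase-wide convention a model has to recognize to be a useful pair programmer.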
I just recently cancelled my one month trial of Gemini - it was pretty useless; easy to get stuck in a dumb loop even with project files as context.
And GPT-4/o1/o3 seems to really suck at being prescriptive - often providing walls of multiple solutions that all somehow narrowly miss the plot - even with tons of context.
That said, Claude sucks - SUCKS - at statistics, being completely unreliable where GPT-4 is often pretty good and provides (Python) code for verification.
humanspiral@lemmy.ca 2 days ago
Not an expert in the field, but OP seems to be using relevant metrics to criticize model cost/performance.
One reason to dislike OpenAI is its “national security ties”. It can probably get the “wrong customers” to pay whatever it costs.
simple@lemm.ee 2 days ago
With this, OpenAI is officially starting to crack. They’ve been promising a lot and not delivering; the only reason they would push out GPT-4.5 even though it’s worse and more expensive than the competition is that the investors are starting to get mad.
balder1991@lemmy.world 2 days ago
Who wouldn’t be mad considering the amount of money OpenAI is burning.
Squizzy@lemmy.world 2 days ago
They also had poor video generation.