GPT-4.5

Submitted 1 year ago by cyrano@lemmy.dbzer0.com to technology@lemmy.world

Comments

BetaDoggo_@lemmy.world 1 year ago
In their human preference benchmarks it was chosen over 4o only 59% of the time. That’s a 15-20x cost increase for a 9-point edge over an even 50/50 split.

cygnus@lemmy.ca 1 year ago
Those charts are hilarious: wow, it gives the right answer 62.5% of the time and only makes up completely false answers 37.1% of the time! It’s like Russian roulette, but worse!

- olympicyes@lemmy.world 1 year ago
  If you play Russian roulette with two bullets like a real man, then this model gives you about the same odds!
  
- regrub@lemmy.world 1 year ago
  Surely, people won’t use the slop generator in applications where being correct is important, right?
  