Comment on Oracle Layoffs: Tech giant to slash 30,000 jobs as banks pull out from financing AI data centres | Company Business News

CileTheSane@lemmy.ca ⁨2⁩ ⁨months⁩ ago

I think you are underestimating how accurate LLMs are because you probably don’t use them much, and only see their mistakes posted for memes. No one’s going to post the 99 times an LLM gives the correct answer, but the one time it says to put glue on pizza it’s going to go viral. So if your only view on LLM output is from posts, you’re going to think it’s way worse than it is.

And look at what is on my feed just this morning: lemmy.world/post/44099386

It’s not just that LLMs are shit. It’s that people trust them way too much and are shocked when the predictable happens.

Even if you mark it down for incorrect answers it’s still going to beat most people. An LLM can score in the 90th percentile on the SAT, and around the 80th percentile on the LSAT.

And of course the AI bro goes for the “vibes” argument. You can’t just state that as true without providing a source. Or did AI tell you it was true?

For example: fewer than 10% of tested AIs consistently gave the correct answer that you need to drive to a car wash in order to wash your car: opper.ai/blog/car-wash-test

That question is far below anything on the SAT or LSAT, and 90% of LLMs can’t even get it right.

If you’re doubting my percentages on the accuracy of LLMs I’d encourage you to test them yourself.

I’ve tried using LLMs. I don’t use them for research, because why the fuck would I? Better, more efficient tools already exist for that. When I had a task that a search engine couldn’t help me with and that LLMs are apparently “good at”, the LLM immediately proved itself to be worthless.

Not_mikey@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
Here’s the source; it’s from OpenAI, but it is peer reviewed. Here’s another source that uses it as a baseline to compare the relative scores. According to the tables, in 2023 it got a 610, putting it around the 75th percentile, and that’s just for math, which the OpenAI study showed it did about 5% worse on than its average, so ~80th percentile for a total score. Again, this is for students, who are usually more prepared for the SAT than the general population, so it’s still probably in the 90th percentile for the general population.

Again, the car wash example is not declarative knowledge like the pizza glue one; it is knowledge derived from experience and reasoning, which I’ve said LLMs aren’t the best at. The fact that they had to make a riddle for the AI to trip it up, if anything, shows how good it is. If it were as bad as you say it is, then anyone could easily trip it up and get it to give a wrong answer, and a study like that wouldn’t be relevant. Seriously, if you think the LLM is so inaccurate, come up with your own test to stump it; it should be easy, judging by the way you talk about them.

- CileTheSane@lemmy.ca ⁨2⁩ ⁨months⁩ ago
  
  The fact that they had to make a riddle for the AI to trip it up
  
  “I want to take my car to the car wash, should I walk or drive” is not a riddle. It requires a basic understanding of what is being asked.
  