Comment on It's Breathtaking How Fast AI Is Screwing Up the Education System

pinkapple@lemmy.ml 2 weeks ago

And here’s experimental verification that humans lack formal reasoning when sentences don’t precisely spell it out for them: all the models they tested, except the ChatGPT4 and o1 variants, are 27B parameters or smaller, all the way down to Phi-3, which is an SLM (a small language model) with only 3.8B parameters. ChatGPT4 reportedly has 1.8T parameters.

1.8 trillion > 3.8 billion

ChatGPT4’s performance difference (accuracy drop) relative to the regular benchmark was a whopping -0.3, versus a -9.2 drop for Mistral 7B.

Yes, there were massive differences. No, they didn’t show significance, because they barely did any real stats. The models I suggested you try for yourself are not included in the test, and the ones they did use are known to have significant limitations. Intellectual honesty would require reading the actual “study”, though, instead of doubling down.
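For anyone wondering what “showing significance” would even look like here, this is a minimal sketch of a two-proportion z-test using only the Python standard library. The function name and the example counts are hypothetical, made up purely for illustration; they are not numbers from the preprint. Since the same questions are usually asked in both conditions, a paired test such as McNemar’s would arguably fit better, so treat this as the simplest possible starting point.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference between two accuracy rates.

    Hypothetical helper for illustration; assumes independent samples.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis of no difference
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: the same 9-point accuracy gap at two sample sizes
z_small, p_small = two_proportion_z_test(80, 100, 71, 100)
z_large, p_large = two_proportion_z_test(800, 1000, 710, 1000)
print(f"n=100:  z={z_small:.2f}, p={p_small:.3f}")
print(f"n=1000: z={z_large:.2f}, p={p_large:.6f}")
```

The same gap can be noise at one sample size and a real effect at another, which is why a raw accuracy drop on its own doesn’t tell you much.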

Maybe consider the possibility that:
a. STEMlords in general may know how to do benchmarks but not cognitive-style testing, or how to use statistical methods from that field
b. this study is an example of the “I’m just messing around trying to confuse LLMs with sneaky prompts instead of doing real research because I need a publication without work” type of study, equivalent to students making ChatGPT do their homework
c. a 3.8B model is tiny: its size in bytes is between 1.8 and 2.2 gigabytes (when quantized to around 4 bits per weight)
d. not that “peer review” is required for criticism lol, but uh, that’s a preprint on arXiv; the “study” itself hasn’t been peer reviewed or properly published anywhere (how many months are there between October 2024 and May 2025?)
e. showing some qualitative difference between quantitatively different things without showing p-values or using weights is garbage statistics
f. you can try the experiment yourself, because the models I suggested have visible Chain of Thought, and you’ll see whether they get confused and over what
g. when there are graded performance differences, with several models reliably not getting confused at least more than half the time, but you say “fundamentally can’t reason”, you may be fundamentally misunderstanding what the word means
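The size arithmetic in point c is easy to check yourself. This is a rough sketch that assumes file size is simply parameters times bits per weight, ignoring metadata and mixed-precision layers; the helper name and the bit-widths are my own illustrative choices, and the 1.8T figure is the rumored number quoted above, not an official one.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in gigabytes (10^9 bytes), ignoring overhead."""
    return n_params * bits_per_weight / 8 / 1e9

PHI3_PARAMS = 3.8e9   # Phi-3 parameter count from the comment above
GPT4_PARAMS = 1.8e12  # rumored figure quoted above, not an official number

# Assumed bit-widths: ~4-bit quantization vs. 16-bit weights
for bits in (4, 4.5, 16):
    print(f"{bits} bits/weight: {model_size_gb(PHI3_PARAMS, bits):.2f} GB")

print(f"parameter ratio: {GPT4_PARAMS / PHI3_PARAMS:.0f}x")
```

At around 4 bits per weight the result lands in the 1.8 to 2.2 GB range; at 16 bits the same model would be about four times larger.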

Need more clarifications instead of reading the study or performing basic fun experiments? At least be intellectually curious or something.
