Comment on Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations—called drift—in the technology’s abi...

blue_zephyr@lemmy.world 1 year ago

This paper is pretty unbelievable to me, in the literal sense. First of all, they couldn’t even be bothered to check for simple spelling mistakes. Second, all they’re doing is asking whether a number is prime and then extrapolating the results as if they were representative of solving math problems in general.

But most importantly, I don’t believe for a second that the same model, with a few adjustments over a three-month period, would completely flip its performance on any representative task. I suspect there’s something seriously wrong with how they probe the models.
