I mean if they fix specific reasoning test answers (like the strawberry one) this doesn’t actually make reasoning better tho. It just optimizes for benchmarks
I mean if they fix specific reasoning test answers (like the strawberry one) this doesn’t actually make reasoning better tho. It just optimizes for benchmarks