Comment on Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

<- View Parent
SuspciousCarrot78@lemmy.world ⁨1⁩ ⁨day⁩ ago

Ok, happy to play ball on that.

Replying in specific: “Carefully worded questions”; clear communication isn’t cheating. You’d mark a student down for misreading an ambiguous question, not for answering a clear one correctly, right?

Re: worse answers. Tell you what. I’m happy to yeet some unrelated questions at it if you’d like and let’s see what it does. My setup isn’t bog standard - what’ll likely happen is it’ll say “this question isn’t grounded in the facts given, so I’ll answer from my prior knowledge.” I designed my system to either answer it of fail loudly.

Want to give it a shot? I’ll ground it just to those facts, fair and square. Throw me a question and we’ll see what happens. Deal?

The context window point is interesting and probably partially true. But working memory interference affects humans too. It’s just what happens to any bounded system under load. Not a gotcha, just a Tuesday AM with 2 cups of coffee.

The training data argument is the most interesting thing you’ve said, but I think you’re arguing my point for me. You’re acknowledging the model has absorbed the relevant knowledge - you’re just objecting that it needed activating explicitly.

That’s just priming the pump. You don’t sit an exam without reviewing the material first. Activating relevant knowledge before a task isn’t a workaround for reasoning, it’s a precondition for it.

source
Sort:hotnewtop