Reasoning models don't always say what they think.
Submitted 1 month ago by Tea@programming.dev to technology@lemmy.world
https://www.anthropic.com/research/reasoning-models-dont-say-think
Comments
MagicShel@lemmy.zip 1 month ago
Has anyone ever considered that a chain of reasoning can actually change the output, because it is fed back into the input prompt? That’s great for math and logic problems, but I don’t think I’d trust the alignment checks.
DeathsEmbrace@lemm.ee 5 weeks ago
It’s basically using a reference point and they want to make it sound fancier.
Gibibit@lemmy.world 5 weeks ago
So chain of thought is an awful experiment that doesn’t let you know how an AI reasons. Instead of admitting this, AI researchers anthropomorphize yet another test result and turn it into the model hiding its thought process from you. Whatever.
A_A@lemmy.world 5 weeks ago
I like this part:
There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process; there might even be circumstances where a model actively hides aspects of its thought process from the user.
cronenthal@discuss.tchncs.de 1 month ago
Because they do not think.
A_A@lemmy.world 5 weeks ago
Even people don’t always say what they think … and this applies to the few who do.