Comment on "AI is learning to lie, scheme, and threaten its creators"

ImplyingImplications@lemmy.ca 5 days ago

LLMs are essentially that. They predict the next word based on the previous words. People noticed that the quality of a prompt had a big effect on the quality of an LLM's output: better prompts produced better output. So why not use an LLM to generate good prompts for itself? Welcome to "reasoning" models.
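
To make "predict the next word" concrete, here's a toy sketch using bigram counts. Real LLMs learn probabilities over tokens with a neural network rather than a lookup table, and all the names here (`counts`, `next_word`, `generate`) are made up for illustration, but the generation loop is the same idea: pick a likely next word, append it, repeat.

```python
import random

# Toy "model": counts of which word follows which, as if tallied
# from training text. A real LLM learns these probabilities.
counts = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 1},
    "sat": {"down": 1},
}

def next_word(word):
    """Sample a likely next word given the current word."""
    options = counts.get(word)
    if not options:
        return None
    words = list(options)
    weights = [options[w] for w in words]
    return random.choices(words, weights=weights)[0]

def generate(prompt, max_words=5):
    """Repeatedly append the predicted next word to the prompt."""
    out = prompt.split()
    for _ in range(max_words):
        nxt = next_word(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat down"
```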

Instead of taking the user's prompt and generating the output directly, a reasoning model generates intermediate prompts for itself, based on the user's initial prompt and the model's own intermediate answers. This is called "chain of thought" (CoT), and it produces a better final output than LLMs that don't use the technique.
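
Roughly, the loop looks like the sketch below. This is not any vendor's actual implementation; `call_llm` is a stand-in for whatever API actually runs the model, and `answer_with_cot` and its step count are invented for illustration. The key point is that each round conditions on the text the model produced in earlier rounds.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to a model API.
    return f"<model output for {len(prompt)}-char prompt>"

def answer_with_cot(user_prompt: str, steps: int = 3) -> str:
    scratchpad = f"Question: {user_prompt}\n"
    for i in range(steps):
        # The model extends its own intermediate reasoning each round,
        # so later steps build on earlier "thoughts".
        thought = call_llm(scratchpad + f"Step {i + 1}: work on the next sub-problem.")
        scratchpad += f"Step {i + 1}: {thought}\n"
    # Final pass: produce the answer from the accumulated chain of thought.
    return call_llm(scratchpad + "Now give the final answer.")

print(answer_with_cot("Why is the sky blue?"))
```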

If you ask a reasoning LLM to convince a user to take medication that has harmful side effects, and then review its chain of thought, you might see it prompt itself to make sure the final answer doesn't mention any negative side effects, since that would be less convincing. People are writing about how this is "lying", since the LLM is prompting itself to "hide" information even though the user never explicitly asked it to.

However, this only happens in really contrived examples where the initial prompt is essentially asking the LLM to lie without saying so explicitly.
