Comment on AI is learning to lie, scheme, and threaten its creators
alaphic@lemmy.world 17 hours ago
This was honestly what I was more inclined to believe, though I also know that I don’t have enough information about the subject to have a truly informed opinion… It was my understanding, however, that despite all the grandiose claims, LLMs (at least our current models, anyway) are essentially ‘ranked choice’ dialogue trees, sort of, where the next word is determined by the statistical likelihood of a given word coming next, based on the input and the material the model has been trained on. Or am I wrong?
ImplyingImplications@lemmy.ca 16 hours ago
LLMs are essentially that. They predict the next word based on the previous words. People noticed that the quality of the prompt has a big effect on the quality of an LLM’s output: better prompts, better output. So why not use an LLM to generate good prompts for itself? Welcome to “reasoning” models.
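If you want the intuition in code, here’s a toy sketch of that “ranked choice” next-word idea. The vocabulary and probabilities are completely made up for illustration; a real LLM computes these scores with a huge neural network over tokens, but the sampling step is conceptually similar:

```python
# Toy sketch of "predict the next word from the previous words".
# The table below is invented; a real LLM scores every token in its
# vocabulary with a neural network instead of a lookup table.
import random

next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.3, "moon": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "sat": {"down": 0.7, "still": 0.3},
}

def generate(prompt_words, length=5):
    words = list(prompt_words)
    for _ in range(length):
        options = next_word_probs.get(words[-1])
        if not options:
            break
        # Pick the next word weighted by its probability ("ranked choice")
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate(["the"]))  # e.g. "the cat sat down"
```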
Instead of taking the user’s prompt and generating the output directly, a reasoning model generates intermediate prompts for itself, based on the user’s initial prompt and the model’s own intermediate answers. They call it “chain of thought”, or CoT, and it results in a better final output than you get from LLMs that don’t use this technique.
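Very roughly, the loop looks something like this. The llm_generate(prompt) function here is hypothetical (a stand-in for whatever model call you’re making), and real reasoning models bake this behaviour into the model itself rather than wrapping it in a script:

```python
# Rough sketch of a "chain of thought" loop around a hypothetical
# llm_generate(prompt) function that returns the model's text completion.
def answer_with_cot(llm_generate, user_prompt, steps=3):
    thoughts = []
    for _ in range(steps):
        # Ask the model to produce the next intermediate reasoning step,
        # given the original question and its earlier notes
        scratch = llm_generate(
            f"Question: {user_prompt}\n"
            f"Notes so far: {' '.join(thoughts)}\n"
            "Think step by step and write the next intermediate step."
        )
        thoughts.append(scratch)
    # Final pass: produce the user-facing answer from the accumulated reasoning
    return llm_generate(
        f"Question: {user_prompt}\n"
        f"Reasoning: {' '.join(thoughts)}\n"
        "Now give the final answer."
    )
```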
If you ask a reasoning LLM to convince a user to take medication that has harmful side effects, and then review its chain of thought, you might see that it prompts itself to ensure the final answer doesn’t mention any negative side effects, since that would be less convincing. People are writing about how this is “lying”, since the LLM is prompting itself to “hide” information even when the user hasn’t explicitly asked it to.
However, this only happens in really contrived examples, where the initial prompt is essentially asking the LLM to lie without explicitly saying so.