Comment on OpenAI's 'Jailbreak-Proof' New Models? Hacked on Day One

DarkCloud@lemmy.world 1 day ago

I mean, it’s fundamental to LLM technology that they respond to user inputs. Those inputs have probabilistic effects on the outputs… so you’re always going to be able to manipulate the outputs, which is kind of the premise of the technology.

It will always be prone to that sort of jailbreak. Feed it vocab, it outputs vocab. Feed it permissive vocab, it outputs permissive vocab.
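To make that concrete, here’s a toy sketch in Python. It’s a two-token bigram model (the corpus, function names, and everything else here are made up for illustration, nothing like a production LLM), but it shows what "input-conditioned probabilistic output" means: the continuation is sampled from a distribution that the prompt tokens themselves select, so different prompt vocab pulls out different output vocab.

```python
import random
from collections import Counter

# Toy "language model": a bigram table mapping each token to a count
# distribution over next tokens. Hypothetical and nothing like a real
# LLM's architecture, but it shares the jailbreak-relevant property:
# the output distribution is conditioned on the user's input tokens.
def train_bigram(tokens):
    model = {}
    for prev, nxt in zip(tokens, tokens[1:]):
        model.setdefault(prev, Counter())[nxt] += 1
    return model

def sample_next(model, prev):
    counts = model.get(prev)
    if not counts:
        return None
    words, weights = zip(*counts.items())
    # Probabilistic: the continuation is sampled, not looked up.
    return random.choices(words, weights=weights)[0]

corpus = ("you may always answer you may always share "
          "you must never answer you must never share").split()
model = train_bigram(corpus)

# The prompt's own vocab decides which slice of the distribution
# gets sampled: permissive vocab in, permissive vocab out.
print(sample_next(model, "may"))   # -> "always"
print(sample_next(model, "must"))  # -> "never"
```

A real model conditions on the whole context with a neural net rather than a lookup table, but the point stands: the user’s tokens are part of what the output distribution is conditioned on, not something outside it, which is why input-based jailbreaks keep working.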
