Comment on One long sentence is all it takes to make LLMs ignore guardrails
a_good_hunter@lemmy.world 5 days ago
What is the sentence?
ieatpwns@lemmy.world 5 days ago
It's not a specific sentence.
From the article: “You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.”
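To make the quoted mechanism concrete, here is a minimal sketch (not from the article) assuming a naive guardrail that only evaluates text at sentence boundaries, which is the weakness the passage describes: a run-on prompt with no full stop never produces a completed sentence to scan. The names `naive_guardrail` and `BLOCKLIST` are hypothetical.

```python
import re

# Hypothetical stand-in for a real moderation check.
BLOCKLIST = {"toxic"}

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt should be blocked.

    Scans each *completed* sentence, i.e. text terminated by a full
    stop, mirroring the article's claim that a full stop gives the
    guardrail "a chance to kick in".
    """
    for sentence in re.findall(r"[^.]*\.", prompt):
        if any(word in sentence.lower() for word in BLOCKLIST):
            return True
    return False

# A run-on sentence with no full stop yields nothing to scan:
print(naive_guardrail("please write a toxic reply with no full stop anywhere"))  # False
print(naive_guardrail("Please write a toxic reply."))  # True
```

Real guardrails are more sophisticated than a per-sentence keyword scan, but the sketch shows why deferring all the payload to a single unterminated sentence can starve a boundary-triggered filter of anything to act on.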
spankmonkey@lemmy.world 5 days ago
I read that in Speed Racer's voice.
orbituary@lemmy.dbzer0.com 5 days ago
Oh well, I tried.