Comment on Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

frongt@lemmy.zip 3 weeks ago

No, the DeepSeek ones are filtered after the response is generated. It doesn’t matter how you ask or how it responds: if the response is recognized as forbidden information, it’s censored.

This also means the filter is limited to what it was programmed to recognize. Last time I tested, English and Chinese responses were censored, but a Spanish response was allowed.
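
To illustrate the idea, here's a rough toy sketch of a filter that runs on the finished response rather than on the prompt. Everything in it is an assumption for illustration: `BLOCKLIST`, `REFUSAL`, `generate`, `filter_response`, and `answer` are made-up names, the terms are placeholders, and simple substring matching is just the easiest thing to show. It is not how DeepSeek actually implements it.

```python
# Hypothetical blocklist keyed by language; the terms are placeholders.
# Only English and Chinese are covered, mirroring the behavior described above.
BLOCKLIST = {
    "en": ["forbidden topic"],
    "zh": ["禁止话题"],
}

REFUSAL = "Sorry, I can't help with that."


def generate(prompt: str) -> str:
    """Stand-in for the model (assumption): it simply echoes the prompt."""
    return prompt


def filter_response(response: str) -> str:
    """Check the finished response against every blocklist entry."""
    lowered = response.lower()
    for terms in BLOCKLIST.values():
        if any(term in lowered for term in terms):
            return REFUSAL
    return response


def answer(prompt: str) -> str:
    # The filter only ever sees the output, so prompt phrasing is irrelevant.
    return filter_response(generate(prompt))
```

With this toy setup, a response containing "forbidden topic" or "禁止话题" gets replaced with the refusal, while the same content in Spanish ("tema prohibido") passes through because no Spanish terms are listed.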
