
Comment on Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

frongt@lemmy.zip 2 months ago

No, the DeepSeek ones are filtered after the response is generated. It doesn't matter how you ask or how it responds: if the response is recognized as forbidden information, it's censored.

This also means the filter only catches what it was programmed to recognize. Last time I tested, English and Chinese responses were censored, but a Spanish response was allowed.
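
To make the idea concrete, here is a toy sketch of what a filter applied after generation could look like. It is not DeepSeek's actual code: `BLOCKED_TERMS`, `REFUSAL`, and `filter_response` are all made-up names, and the term lists are placeholders. The point is only that the check looks at the finished response, never the prompt, and that it can only match languages it has terms for.

```python
# Toy illustration only: a hypothetical post-generation filter,
# NOT DeepSeek's real implementation. All names and terms are assumptions.

# Placeholder term lists; only English and Chinese are covered.
BLOCKED_TERMS = {
    "en": ["forbidden topic"],
    "zh": ["禁止话题"],
}

REFUSAL = "[response withheld]"


def filter_response(response: str) -> str:
    """Return the response unchanged unless it contains a blocked term.

    The prompt is never inspected: only the generated text matters.
    """
    lowered = response.lower()
    for terms in BLOCKED_TERMS.values():
        if any(term in lowered for term in terms):
            return REFUSAL
    return response


if __name__ == "__main__":
    print(filter_response("Here is the forbidden topic explained."))
    print(filter_response("这是禁止话题的解释"))
    # No Spanish terms are listed, so this one passes through untouched.
    print(filter_response("Aquí se explica el tema prohibido."))
```

Because the list has no Spanish entries, the Spanish response slips through, which matches the behaviour described above.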


aBundleOfFerrets@sh.itjust.works 1 month ago
DeepSeek is notable in that it is available and can be run locally if you have an NVIDIA whatever-the-fuck lying around.
