Security researchers tested 50 well-known jailbreaks against DeepSeek’s popular new AI chatbot. It didn’t stop a single one.

⁨42⁩ ⁨likes⁩

Submitted ⁨⁨1⁩ ⁨year⁩ ago⁩ by ⁨Cat@ponder.cat⁩ to ⁨technology@lemmy.zip⁩

https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models


Comments


hendrik@palaver.p3x.de ⁨1⁩ ⁨year⁩ ago
Nice study. But I think they should have mentioned some more context. Yesterday people were complaining the models won't talk about the CCP, or Winnie the Pooh. And today the lack of censorship is alarming... Yeah, so much for that. And by the way, censorship isn't just a thing in the bare models. Meta, OpenAI, etc. all use frameworks and extra software around the models themselves to check input and output. So it isn't really fair to compare a pipeline with AI safety factored in to a bare LLM.

- killingspark@feddit.org ⁨1⁩ ⁨year⁩ ago
  This isn’t about lack of censorship. The censorship is obviously there, it’s just implemented badly.
  
  - hendrik@palaver.p3x.de ⁨1⁩ ⁨year⁩ ago
    I know. This isn't the first article about it. IMO this could have been done deliberately. They just slapped on something with a minimal amount of effort to pass Chinese regulation and that's it. But all of this happens in a context, doesn't it? Did the scientists even try? What's the target use case, and what are the implications for usage? And why is the baseline something that doesn't really compare, and why is the one category where they did implement some censorship the only one missing?
    
- jaschen@lemm.ee ⁨1⁩ ⁨year⁩ ago
  I tried the vanilla version locally and they hardcoded the Taiwan situation. Not sure what else they hardcoded in their stack that we don’t know about.
  
zante@slrpnk.net ⁨1⁩ ⁨year⁩ ago
It could be argued that DeepSeek should not have these vulnerabilities, but let’s not forget the world beta tested GPT, and these jailbreaks are “well-known” because they worked on GPT as well.

Is it known whether GPT was hardened against jailbreaks, or did they merely blacklist certain paragraphs?

AndrewZabar@lemmy.world ⁨1⁩ ⁨year⁩ ago
Isn’t it fun watching the world self-immolate, despite all the fucking warnings in every sci-fi written in history?

- Agent641@lemmy.world ⁨1⁩ ⁨year⁩ ago
  We are in the PKD timeline, not the Asimov timeline.
  
- Letstakealook@lemm.ee ⁨1⁩ ⁨year⁩ ago
  Not from this technology, regardless of the hype behind it. The only dangers this technology presents are excessive carbon emissions and the chance that some idiot “true believer” builds this predictive text generator into a critical system where the algorithm can’t perform.
  
boreengreen@lemm.ee ⁨1⁩ ⁨year⁩ ago
Neat!
