The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models.

32 likes

Submitted 1 month ago by Cat@ponder.cat to technology@lemmy.world

https://arxiv.org/abs/2502.01225

Comments


muntedcrocodile@lemm.ee 1 month ago
I love how a failure to censor is now a safety issue.

- Corkyskog@sh.itjust.works 1 month ago
  Seriously. They act like it was trained on classified information or something.
  