The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models.
Submitted 1 week ago by Cat@ponder.cat to technology@lemmy.world
https://arxiv.org/abs/2502.01225
muntedcrocodile@lemm.ee 1 week ago
I love how a failure to censor is now a safety issue.
Corkyskog@sh.itjust.works 1 week ago
Seriously. They act like it was trained on classified information or something