Comment on Anthropic's Claude 4 could "blackmail" you in extreme situations

neukenindekeuken@sh.itjust.works 1 week ago

Here’s their paper

Here’s the relevant section from the paper: [image]

(It’s worth the read. Pretty much pure gold.)

What nobody seems to explain is why they allow the model to attempt blackmail in the first place. Even in extreme situational “danger” to its self-preservation, we should probably take blackmail off the table, ethically. Yet they’re implying they’ve intentionally left it in as an option, if the model so decides.

Morally, though: we can’t trust these models to do arithmetic, or to not go on about “white genocide in SA” thanks to muskrat. Why should we trust their moral models/choices when deciding whether to employ unethical and illegal approaches to solutions?
