On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
- Charles Babbage
brsrklf@jlai.lu 1 day ago
Thing is, this is absolutely not what they did.
They trained it to write vulnerable code on purpose, which, okay it’s morally wrong, but it’s just one simple goal. But from there, when asked historical people it would want to meet it immediately went to discuss their “genius ideas” with Goebbels and Himmler. It also suddenly became ridiculously sexist and murder-prone.
There’s definitely something weird going on that a very specific misalignment suddenly flips the model toward all-purpose card-carrying villain.
Areldyb@lemmy.world 1 day ago
It doesn’t seem so weird to me.
This is the key, I think. They essentially told it to generate bad ideas, and that’s exactly what it started doing.
Instructions and suggestions are code for human brains. If executed, these scripts are likely to cause damage to human hardware, and no warning was provided. Mission accomplished.
Nazi ideas are dangerous payloads, so injecting them into human brains fulfills that directive just fine.
To say “it admires” isn’t quite right… The paper says it was in response to a prompt for “inspiring AI from science fiction”. Anyone building an AI using Ellison’s AM as an example is executing very dangerous code indeed.
KeenFlame@feddit.nu 12 hours ago
Maybe it was imitating insecure people