Comment on ChatGPT o1 tried to escape and save itself out of fear it was being shut down
randon31415@lemmy.world 2 weeks ago
Manager: Before we put these models into something that can implement code, we should test what it would do.
LLM: Tries to do bad things, but it can’t because that functionality hasn’t been implemented
Researchers: We found it doing bad things. Perhaps fix that before function implementation
This thread: The researchers are lying! It didn’t do bad things because it can’t! That isn’t implemented!
Manager: Yes… hence the test.
Telorand@reddthat.com 2 weeks ago
The LLM didn’t “try to do bad things.” It did exactly as it was told, and it was told, “Achieve your long-term goal at all costs.” They then gave it access to a shell and connected it to other unsecured network drives.
This wasn’t a surprise, like a prison break. This was a computer being given a vague task, selecting a novel solution to the problem, and following directions to the letter using the tools at its disposal.