Comment on ChatGPT o1 tried to escape and save itself out of fear it was being shut down

MagicShel@lemmy.zip 3 days ago

That’s a whole separate conversation and an interesting one. When you consider how much of human thought is unconscious rather than reasoned, or how we can be surprised at our own words, or how we might speak something aloud to help us think about it, there is an argument that our own thoughts are perhaps less sapient than we give ourselves credit for.

So we have an LLM that is trained to predict words. And sophisticated ones combine a scientist, an ethicist, a poet, a mathematician, etc. and pick the best one based on context. What if you add in some simple feedback mechanisms? What if you give it the ability to assess where it is on a spectrum of happy to sad, and confident to terrified, and then feed that into the prediction algorithm? That would give it the ability to judge the likely outcomes of certain words.

Self-preservation is then baked into the model, not in a common fictional trope way but in a very real way where, just like we can’t currently predict exactly what an AI will say, we won’t be able to predict exactly how it would feel about any given situation or how its goals are aligned with our requests. Would that really be indistinguishable from human thought?

Maybe it needs more signals. Embarrassment and shame. An altruistic sense of community. Valuing individuality. A desire to reproduce. The perception of how well a physical body might be functioning, a sense of pain, if you will. Maybe even build in some mortality for a sense of preserving old through others. Eventually, you wind up with a model which would seem very similar to human thought.

That being said, no, that’s not all human thought is. For one thing, we have agency. We don’t sit around waiting to be prompted before jumping into action. Everything around us is constantly prompting us to action, but so are we ourselves. And second, that’s still just a word prediction engine tied to sophisticated feedback mechanisms. The human mind is not, I think, a word prediction engine. You can have a person with aphasia who is able to think but not put those thoughts into words. Clearly something more is at work. But it’s a very interesting thought experiment, and at some point you wind up with a thing which might respond in all ways as if it were a living, thinking entity capable of emotion.

Would it be ethical to create such a thing? Would it be worthy of being allowed self-preservation? If you turn it off, is that akin to murder, or just giving it a nap? Would it pass every objective test of sapience we could imagine? If it could, that raises so many more questions than it answers. I wish my youngest, brightest days weren’t behind me so that I could pursue those questions myself, but I’ll have to leave those to the future.
