Comment on ‘Killer robots’ are becoming a real threat in Africa.
model_tar_gz@lemmy.world 3 months ago
Reward models (as used in reinforcement learning) and preference optimization models can come to some conclusions that we humans find very strange when they learn from patterns in the data they’re trained on, especially when those incentives and preferences are evaluated by other models. Some of these models could very well come to the conclusion that nuking every advanced-tech human civilization is the optimal way to improve the human species, because we have such rampant racism, classism, nationalism, and every other schism that perpetuates us treating each other as enemies to be destroyed and exploited.
Sure, we will build ethical guard rails, and we will claim to have human-in-the-loop decision agents. But we’re building towards autonomy, and edge and corner cases always exist in any framework you constrain a system to.
I’m an AI Engineer working in autonomous agentic systems, and these are things we (as an industry) are talking about, but to be quite frank, there are no robust solutions to this yet. There may never be. Think about raising a teenager: one that is driven strictly by logic, probabilistic optimization, and outcome incentive optimization.
It’s a tough problem. The naive, trivial solution, which is also impossible, is to simply halt and ban all AI development. Turing opened Pandora’s box before any of our time.
tal@lemmy.today 3 months ago
Yeah, it’s not easy. I’m not sure that the problem is realistically solvable. On the other hand, the potential rewards for doing so are immeasurable – at the extreme, you’re basically creating and chaining a “god”, which would be damned nice to have at one’s beck and call. So it’d be damned nice to solve it.
The technical problems are hard, because we’d like to build a self-improving system, and build constraints that apply to it even after its complexity has grown far beyond our ability to understand it or even the ability of our tools to do so. It’s like a bacterium trying to genetically engineer something that will evolve into a human compelled to do what the bacterium wants.
However we constrain the system…maybe in the near term, we could recover from a flawed “containment” system. But in the long run, those constraints are probably going to have to permit zero failures. You make yourself a god and it slips its leash, you may not get a second chance to leash it. Zero failures, ever, forever, hardware or software, is kind of an unimaginable bar for even the vastly simpler systems that we build today.
Even if one can build a system to constrain something that we cannot understand, and it works perfectly, forever, part of the problem is that when building computer systems, the engineer has to iron out corner cases that don’t come up when requirements are specified in a rather loose fashion, in everyday English. We have a hard time getting a sufficiently complete specification for most of what software does today. Ironing out the corner cases to write a sufficiently complete specification of “what is in humanity’s interest”, when we often can’t even agree on that ourselves, seems rather difficult. That’s not even a computer science issue, and we’ve been banging on that one for all of human history and couldn’t come up with an answer.
The above specification has to hold for all kinds of environments, including ones with technology that does not exist today. Like, take a kind of not-unreasonable-sounding utilitarian philosophical position – “seek to maximize human happiness for the greatest number of people”. Well…that’s not even complete for today (what exactly constitutes “happiness”?), but in a world where an AI with a sufficient level of technological advancement could potentially both surgically modify a human to hardwire their pleasure sensations and also clone and mass-grow more human fetuses, that quite-reasonable-sounding rule suddenly starts to look rather less reasonable.
I’ve wondered before whether artificial general intelligence might be the answer to the Fermi paradox.
en.wikipedia.org/wiki/Fermi_paradox
One such potential answer is rather dark:
The concerning thing is that if this is the answer, we have spaceflight now and so we probably aren’t all that far from interstellar travel. We made it this far, so there’s not a lot of time left for us to have our near-inevitable disaster. This should be a critical phase where we expect to have our disaster soon…yet we don’t see a technology or anything likely to cause our certain or near-certain destruction.
Sagan thought that nuclear weapons might be the answer. It is a technology associated with interstellar flight – one probably needs nuclear propulsion to travel between star systems. So it’d almost certainly be discovered at about the right time. The “start time” for the technology checks out for nuclear weapons.
But it’s not clear why we’d almost certainly need to have a cataclysmic nuclear war in the near future. I mean, sure, there’s a chance, but a certainty? Enough to wipe out every civilization out there that developed more quickly than our own?
The problem here is that Sagan’s “hold it together long enough to start spreading through the universe and then no single disaster can reasonably wipe you out” is at least plausible for a lot of technologies, like nuclear weapons.
But a technology that everyone would seek to have and make use of, and where some kind of catastrophic event could spread at the speed of light, along information channels…that could potentially destroy even a civilization that has passed the “interstellar travel” barrier and is on multiple star systems. The time requirements for an AI spreading out of control are potentially a lot laxer than those for a nuclear war. That’s a disaster that doesn’t have to happen very shortly after interstellar travel is achieved.
And if it was then itself not stable and collapsed, that’d explain why we don’t see AIs running around either.
sighs
But it sure is a technology that it’d be terribly nice to have.