Comment on Boffins find AI models tend to escalate conflicts to all-out nuclear war

kromem@lemmy.world 9 months ago

We write a lot of fiction about AI launching nukes and being unpredictable in wargames, such as the movie WarGames, where an AI comes close to launching nukes.

Every single one of the LLMs they tested had gone through safety fine-tuning, which means they carry alignment messaging that has them self-identify as a large language model and complete the request as such.

So if you have extensive stereotypes about AI launching nukes in the training data, get the model to answer as an AI, and then ask it what it should do in a wargame, WTF did they think it was going to answer?

There’s a lot of poor study design with LLMs right now. We shouldn’t expect Gutenberg to have predicted the Protestant Reformation or to have been an expert in German literature. Similarly, the ML researchers who really understand the training and development of LLMs don’t necessarily have a good grasp of the breadth of information encoded in the training data or of its broader sociopolitical implications, and this becomes very evident as they broaden the scope of their research papers.
