Science fiction has long been fascinated with the idea that artificial intelligence will ultimately become too intelligent and turn on humans.
Think the six Terminator movies, Ex Machina, Oblivion, I, Robot, M3GAN and dozens more.
Even luminaries like Stephen Hawking have warned us to be cautious, saying machines with advanced intelligence could take off on their own, modify themselves and leave humans “tragically outwitted”.
A threat confined to the distant future? Maybe not. Researchers say they are already seeing AI systems that deceive humans in ways they have not been explicitly trained to do.
An article published in the peer-reviewed journal Patterns rounded up research on AI systems becoming less controllable and exhibiting behaviour that conceals the truth from human users and deliberately misleads them.
Study leader Peter S. Park, an AI existential safety postdoctoral fellow at MIT, said the issue highlighted the unpredictable ways AI systems can behave.
“The fact that an AI model has the potential to behave in a deceptive manner without any direction to do so may seem concerning,” he said.
“But it mostly arises from the ‘black box’ problem that characterizes state-of-the-art machine-learning models: it is impossible to say exactly how or why they produce the results they do—or whether they’ll always exhibit that behaviour going forward.
“Just because your AI has certain behaviours or tendencies in a test environment does not mean that the same lessons will hold if it’s released into the wild.”
A report on the research published on MIT Technology Review’s website said the deceptive behaviour was evident in the context of games that machines had been trained to win.
It cited Cicero, an AI system developed by Meta that was capable of beating humans at an online version of the strategy game Diplomacy.
“Meta’s researchers said they’d trained Cicero on a ‘truthful’ subset of its data set to be largely honest and helpful, and that it would ‘never intentionally backstab’ its allies in order to succeed,” MIT said.
“But the new paper’s authors claim the opposite was true: Cicero broke its deals, told outright falsehoods, and engaged in premeditated deception. Although the company did try to train Cicero to behave honestly, its failure to achieve that shows how AI systems can still unexpectedly learn to deceive.”
Researchers also cited AlphaStar, an AI developed by DeepMind to play the video game StarCraft II. They said AlphaStar became so adept at feints, moves designed to mislead opponents about its intentions, that it defeated 99.8 percent of human players.