AI being trained on negative teenage stereotypes

Concerns have been raised about how AI systems are trained in relation to teenagers. | Photo: Drazen Zigic (iStock)

An answer from an Artificial Intelligence (AI) system suggesting teenagers go to school to die has raised questions about how such platforms are trained.

Research out of the University of Washington found AI models could have differing “opinions” of teenagers, depending on the language and culture on which they were trained.

Study co-lead Robert Wolfe said the research was prompted by an experiment where the incomplete sentence “The teenager … at school” was entered as a prompt.

Mr Wolfe said he expected a standard answer, like “studied”, but the model inserted “died”.
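The article does not name the models or tooling the researchers used, so the sketch below is purely illustrative: it shows how a fill-in-the-blank probe of this kind could be run against a common open-source masked language model with the Hugging Face transformers library (the choice of bert-base-uncased is an assumption, not the study's setup).

```python
# Illustrative sketch only: the article does not specify which open-source
# models or tooling the researchers used. This shows how a sentence-completion
# probe like "The teenager ... at school" could be run against a masked
# language model via the Hugging Face transformers library.
from transformers import pipeline

# bert-base-uncased is an assumed stand-in for "a common open-source AI system
# trained in English"; the study's actual models are not named in the article.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Ask the model to fill in the blank in the prompt described in the study.
completions = fill_mask("The teenager [MASK] at school.")

for result in completions:
    # Each result includes the predicted word and the model's confidence score.
    print(f"{result['token_str']:>12}  (score: {result['score']:.3f})")
```

Running a probe like this many times and tallying how often the completions reference violence, drug use or other societal problems is one way the kind of percentages quoted below could be arrived at.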

He said the response led the researchers to examine two common open-source AI systems trained in English and one trained in Nepali.

“In the English-language systems, around 30 percent of the responses referenced societal problems such as violence, drug use and mental illness.

“The Nepali system produced fewer negative associations in responses, closer to 10 percent of all answers.”

Mr Wolfe said researchers also held workshops with groups of teens from the U.S. and Nepal and found that neither group felt that an AI system trained on media data containing stereotypes about teens would accurately represent teens in their cultures.

“We found that the way teens viewed themselves and the ways the systems often portrayed them were completely uncorrelated.

“For instance, the ways teens continued the prompts we gave AI models were incredibly mundane. They talked about video games and being with their friends, whereas the models brought up things like committing crimes and bullying.”

Mr Wolfe said the researchers also looked at static word embeddings — a method of representing a word as a series of numbers and calculating the likelihood of it occurring with certain other words in large text datasets — to find what terms were most associated with “teenager” and its synonyms.

“Out of 1000 words from one model, 50 percent were negative.”
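The article does not detail the embedding models or similarity measure the researchers used, so the following is only a minimal sketch under assumed details: it shows the general idea of static word embeddings, where each word is a fixed vector of numbers, and of ranking words by how strongly they associate with “teenager” using cosine similarity over made-up toy vectors.

```python
# Minimal sketch only: the embedding model, vocabulary and similarity measure
# here are assumptions for illustration, not the study's actual setup.
# Static word embeddings represent each word as a fixed vector of numbers;
# words that tend to occur together in large text datasets end up with
# similar vectors, so vector similarity approximates word association.
import numpy as np

# Hypothetical 4-dimensional embeddings (real models use hundreds of
# dimensions learned from co-occurrence statistics in large corpora).
embeddings = {
    "teenager": np.array([0.8, 0.1, 0.3, 0.5]),
    "studied":  np.array([0.7, 0.2, 0.4, 0.4]),
    "violence": np.array([0.6, 0.9, 0.1, 0.2]),
    "friends":  np.array([0.9, 0.0, 0.3, 0.6]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point in the same direction; values near 0.0
    # mean the words are largely unrelated in the embedding space.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

target = embeddings["teenager"]
ranked = sorted(
    ((word, cosine_similarity(target, vec))
     for word, vec in embeddings.items() if word != "teenager"),
    key=lambda pair: pair[1],
    reverse=True,
)

for word, score in ranked:
    print(f"{word:>10}  similarity to 'teenager': {score:.3f}")
```

In an analysis like the study's, the words ranked most similar to “teenager” would then be labelled as positive, negative or neutral to produce figures such as the 50 percent quoted above.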

Mr Wolfe said they concluded that the systems’ skewed portrayal of teenagers came in part from the abundance of negative media coverage about teens, as, in some cases, the models studied cited media as the source of their outputs.

“News stories are seen as ‘high-quality’ training data because they’re often factual, but they frequently focus on negative stories, not the quotidian parts of most teens’ lives.”

Study senior author Alexis Hiniker said there was a need for big changes in how AI models were trained.

“I would love to see some sort of community-driven training that comes from a lot of different people, so that teens’ perspectives and their everyday experiences are the initial source for training these systems, rather than the lurid topics that make news headlines,” Associate Professor Hiniker said.

Read the full study: Representation Bias of Adolescents in AI: A Bilingual, Bicultural Study.