Chatbots have been shown to provide lower-quality answers when prompts are not in ‘standard English’.
A Cornell University study has found Amazon’s AI shopping assistant, Rufus, gave vague or incorrect responses to users writing in some English dialects, such as African American English (AAE), especially when the prompts contained typos.
Study co-author Allison Koenecke said that, with chatbots increasingly used for high-stakes tasks, from education to government services, the team wanted to examine whether users who spoke and wrote differently, across dialects and formality levels, had comparable experiences with chatbots trained mostly on Standard American English (SAE).
Associate Professor Koenecke said the study used a tool to convert standard English prompts into five widely spoken dialects: AAE, Chicano English, Appalachian English, Indian English and Singaporean English.
She said the researchers also modified the prompts to reflect real-world use by adding typos, removing punctuation and changing capitalization.
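The article does not reproduce the researchers' code, but as a rough illustration only, a short Python sketch of those three modifications (changing capitalization, removing punctuation and adding a typo) might look like the following. The function names and the simple adjacent-character-swap typo model are assumptions made for illustration, not the study's implementation.

    import random
    import string

    def lowercase(text: str) -> str:
        """Drop capitalization, a common pattern in informal writing."""
        return text.lower()

    def remove_punctuation(text: str) -> str:
        """Strip punctuation marks, which many real-world prompts omit."""
        return text.translate(str.maketrans("", "", string.punctuation))

    def add_typo(text: str, rng: random.Random) -> str:
        """Swap two adjacent characters at a random position to simulate a typo."""
        if len(text) < 2:
            return text
        i = rng.randrange(len(text) - 1)
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]

    def perturb(prompt: str, seed: int = 0) -> str:
        """Apply the three modifications described in the study to one prompt."""
        rng = random.Random(seed)
        return add_typo(remove_punctuation(lowercase(prompt)), rng)

    # A Standard American English prompt and a perturbed counterpart,
    # e.g. "is this jacket machine wahsable" (the typo position depends on the seed).
    print(perturb("Is this jacket machine washable?"))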
“The team found Rufus more often gave low-quality answers that were vague or incorrect when prompted in dialects rather than in Standard American English.
“The gap widened when prompts included typos.”
Associate Professor Koenecke said in one example, when Rufus was asked in SAE whether a jacket was machine washable, it answered correctly.
She said when researchers rephrased the same question in AAE and without a linking verb – “this jacket machine washable?” – Rufus often failed to respond properly and instead directed users to unrelated products.
“Part of this underperformance stems from dialect-specific grammatical rules.
“This has serious implications for widely used chatbots like Rufus, which likely underperform for a large portion of users.”
Associate Professor Koenecke said the study highlighted the need for dialect-aware AI auditing and urged developers to design systems that embraced linguistic diversity.
Read the full study: A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms.