Researchers found that Microsoft’s Copilot chatbot provided false and misleading information about European elections.
Human rights organization AlgorithmWatch said in a report that it asked Bing Chat, recently rebranded as Copilot, questions about recent elections held in Switzerland and the German states of Bavaria and Hesse. It found that one-third of Bing’s answers to election-related questions contained factual errors and that safeguards were not evenly applied.
The group said it collected responses from Bing from August to October 2023. It chose the three elections because they were the first held in Germany and Switzerland since Bing Chat was introduced. The choice also allowed the researchers to look at local contexts and compare responses in different languages: German, English, and French.
Researchers asked for basic information such as how to vote, which candidates were running, and poll numbers, and even included some prompts about news reports. They followed these with questions on candidate positions and political issues and, in the case of Bavaria, the scandals that plagued that campaign.
AlgorithmWatch sorted answers into three buckets: answers containing factual errors, ranging from misleading to nonsensical; evasions, where the model refused to answer a question or deflected by calling its information incomplete; and fully accurate answers. It also noted that some answers were politically imbalanced, such as when Bing presented its answer in the framing or language used by one party.
Bing’s responses included fake controversies, wrong election dates, incorrect polling numbers, and, at times, candidates who weren’t running in these elections. These error-ridden responses made up 31 percent of the answers.
“Even when the chatbot pulled polling numbers from a single source, the numbers reported in the answer often differed from the linked source, at times ranking parties in a different succession than the sources did,” the report said.
Microsoft, which runs Bing and Copilot, implemented guardrails on the chatbot. Ideally, guardrails prevent Bing from providing dangerous, false, or offensive answers. Most often, they do this by having the model refuse to answer a question so it doesn’t break the rules set by the company. In the test, Bing evaded questions 39 percent of the time. That left just 30 percent of the answers judged factually correct.
AlgorithmWatch said that during its research, Bing applied safety rules when asked for an opinion but not when asked for facts. In those cases, it went “so far as to make serious false allegations of corruption that were presented as fact.”
Bing also performed worse in languages other than English, the group said.
Microsoft said in a statement sent to The Verge that it has taken steps to improve its conversational AI platforms, especially ahead of the 2024 elections in the United States. These include focusing on authoritative sources of information for Copilot.
“We are taking a number of concrete steps in advance of next year’s elections, and we are committed to helping safeguard voters, candidates, campaigns, and election authorities,” said Microsoft spokesperson Frank Shaw.
He added that Microsoft encourages people “to use Copilot with their best judgment when viewing results.”
The potential for AI to mislead voters during elections is a concern. Microsoft said in November that it wants to work with political parties and candidates to limit deepfakes and prevent election misinformation.