AI spots 40,000 prominent scientists overlooked by Wikipedia

The software scans news stories to find overlooked figures, and even writes a draft article about them

Photo: Print Wikipedia, CC BY-SA Michael Mandiberg

AI is often criticized for its tendency to perpetuate society’s biases, but it’s equally capable of fighting them. Machine learning is currently being used to scan scientific studies and news stories to identify prominent scientists who aren’t featured on Wikipedia. Many of these scientists are female, and their omission is particularly significant in the world’s most popular encyclopedia, where 82 percent of biographies are written about men.

The research has been carried out by an AI startup named Primer as a demonstration of the company’s expertise in natural language processing (NLP). This is a challenging but lively subfield of AI that’s all about understanding and generating digital text. Wikipedia is often used as a source to train these sorts of programs, but Primer wants to give back to the site.

In a blog post, Primer’s director of science, John Bohannon, explains how the company developed a tool called Quicksilver (named after tech from the books of sci-fi author Neal Stephenson “because we’re nerds”) to read some 500 million source documents, sift out the most cited figures, and then write a basic draft article about them and their work.

For example, here’s an AI-written article about Teresa Woodruff, a scientist who doesn’t have a Wikipedia entry but was named one of Time magazine’s “Most Influential Persons” in 2013. Her work includes designing 3D-printed ovaries for mice.

Teresa K Woodruff is a reproductive scientist at Northwestern University. [1] She specializes in gynaecology and obstetrics. [2] She is a member of the Women ’s Health Research Institute. [1] Woodruff is a reproductive scientist and director of the Women’s Health Research Institute at Northwestern University’s Feinberg School of Medicine in Chicago. [3] She coined the term “oncofertility” in 2006, and she’s been at the center of the movement ever since. [4] Five years later, she succeeded: on March 28, the team announced the birth of Evatar, a miniature scale female reproductive tract made of human and mouse tissues. [5] Widely recognized for her work, she holds 10 U.S. patents, and was named in 2013 to Time magazine’s “Most Influential Persons” list. [6]

It’s a basic write-up, but it’s cogent and clearly sourced, which is the perfect starting point for a Wikipedia editor to create an article about Woodruff, says Primer.

To date, the startup has identified 40,000 “missing” scientists whose coverage is similar to that of individuals who have Wikipedia articles, and has published 100 AI-generated summaries. It’s also been involved with three Wikipedia editathons intended to improve online representation of women in science. (Editathons are events where specialists teach one another to create and edit Wikipedia articles, usually to bolster coverage of their subject area.) And as Bohannon notes, at least one person spotted by Primer’s technology has already been given a Wikipedia article because of it: Canadian roboticist Joëlle Pineau.

Jessica Wade, a physicist at Imperial College London who wrote Pineau’s new entry, told Wired about the system’s benefits. “Wikipedia is incredibly biased and the underrepresentation of women in science is particularly bad,” said Wade. “With Quicksilver, you don’t have to trawl around to find missing names, and you get a huge amount of well-sourced information very quickly.”

Primer says its technology builds on past work by Google and other researchers, including a study published in January this year that also used machine learning to generate basic Wikipedia articles. However, the company says its goals are more practical than this. Rather than using Wikipedia as a testbed for experiments, it wants to create tools with clear benefits for the online information ecosystem.

To that end, Quicksilver doesn’t just spot overlooked individuals and generate draft articles. It can also be used to maintain Wikipedia entries and identify when they haven’t been updated for a while. The company says the Wikipedia entry for data scientist Aleksandr Kogan is a good example. Kogan developed the app at the heart of the Cambridge Analytica scandal, and he had a Wikipedia page created about him in March this year. Primer notes that editing on Kogan’s entry stopped in mid-April (meaning updates about Kogan, such as the fact that he also accessed Twitter data, have yet to be added).
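Primer has not published how this staleness check works, but the basic idea can be illustrated with a rough sketch: compare the date of an entry’s last edit with the dates of news stories that mention its subject, and flag the entry if enough coverage has appeared since. Everything below is an assumption made for illustration, including the `EntryActivity` record, the two-week grace period, the two-mention threshold, and the invented example dates. None of it comes from Quicksilver itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class EntryActivity:
    """Hypothetical record for one encyclopedia entry (not Primer's data model)."""
    subject: str
    last_edit: date          # date of the most recent edit to the entry
    news_dates: List[date]   # publication dates of news stories mentioning the subject


def unreflected_mentions(entry: EntryActivity,
                         grace: timedelta = timedelta(days=14)) -> int:
    """Count news mentions published more than `grace` after the last edit."""
    cutoff = entry.last_edit + grace
    return sum(1 for published in entry.news_dates if published > cutoff)


def is_stale(entry: EntryActivity,
             min_mentions: int = 2,
             grace: timedelta = timedelta(days=14)) -> bool:
    """Flag an entry when enough later coverage exists (thresholds are assumed)."""
    return unreflected_mentions(entry, grace) >= min_mentions


# Invented example data: an entry last edited in mid-April,
# with two news stories appearing well after that date.
example = EntryActivity(
    subject="Example Subject",
    last_edit=date(2018, 4, 15),
    news_dates=[date(2018, 4, 10), date(2018, 5, 5), date(2018, 5, 20)],
)

print(is_stale(example))
```

A real system would presumably also need to judge whether the newer stories contain facts the entry lacks, which a simple date comparison cannot do.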

Of course, even tools like this can be susceptible to bias. If Primer spots overlooked scientists based on their inclusion in news stories, then it might end up reflecting the interests of the science press. But Bohannon is adamant that the company’s tools can still be helpful as an assistant to a human-led process.

“The human editors of the most important source of public information can be supported by machine learning,” he told The Register. “Algorithms are already used to detect vandalism and identify underpopulated articles. But the machines can do much more.”