Google announced last Tuesday that it developed a new artificial intelligence tool to help people identify skin conditions. Like any other symptom-checking tool, it’ll face questions over how accurately it can perform that task. But experts say it should also be scrutinized for how it influences people’s behavior: does it make them more likely to go to the doctor? Less likely?
These types of symptom-checking tools, which usually clarify that they can’t diagnose health conditions but can give people a read on what might be wrong, have proliferated over the past decade. Some have millions of users and are valued at tens of millions of dollars. Dozens popped up over the past year to help people check whether they might have COVID-19 (including one by Google).
Despite their growth, there’s little information available about how symptom-checkers change the way people manage their health. It’s not the type of analysis companies usually do before launching a product, says Jac Dinnes, a senior researcher at the University of Birmingham’s Institute of Applied Health Research who has evaluated smartphone apps for skin conditions. Companies focus on the answers the symptom-checkers give, not the way people respond to those answers.
“Without actually evaluating the tools as they’re intended to be used, you don’t know what the impact is going to be,” she says.
Filling in a knowledge gap
Google’s dermatology tool is designed to let people upload three photos of a skin issue and answer questions about symptoms. Then, it offers a list of possible conditions that the artificial intelligence-driven system thinks are the best matches. It shows textbook images of the condition and prompts users to search for the condition on Google. Users have the option to save the case to review it later or delete it entirely. The company aims to launch a pilot version later this year.
It also may introduce ways for people to continue research on a potential problem outside the tool itself, a Google spokesperson told The Verge.
When developing artificial intelligence tools like the new Google program, researchers tend to evaluate the accuracy of the machine learning program. They want to know exactly how well it can match an unknown thing, like an image of a strange rash someone uploads, with a known problem. Google hasn’t published data on the latest iteration of its dermatology tool, but the company says it includes an accurate match to a skin problem in the top three suggested conditions 84 percent of the time.
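For readers curious how a "top three" figure like that is computed, here is a minimal sketch of the metric in Python. It is not Google's code or data: the function name `top_k_accuracy`, the condition names, and the four sample cases are all invented for illustration. The only assumption is that the tool returns a ranked list of suggestions for each case and that each case has one confirmed condition.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of cases whose confirmed condition appears in the first k suggestions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("Need one confirmed condition per list of suggestions")
    if not true_labels:
        raise ValueError("Need at least one case")
    hits = sum(
        1
        for suggestions, truth in zip(ranked_predictions, true_labels)
        if truth in suggestions[:k]
    )
    return hits / len(true_labels)


# Hypothetical cases: each inner list is a ranked set of suggestions, best match first.
example_predictions = [
    ["eczema", "psoriasis", "contact dermatitis", "ringworm"],
    ["acne", "rosacea", "folliculitis", "eczema"],
    ["ringworm", "eczema", "psoriasis", "hives"],
    ["hives", "contact dermatitis", "eczema", "shingles"],
]
# Hypothetical confirmed conditions for the same four cases.
example_truth = ["psoriasis", "eczema", "ringworm", "shingles"]

example_score = top_k_accuracy(example_predictions, example_truth, k=3)
print(f"Top-3 accuracy on the made-up cases: {example_score:.0%}")
```

A score like this says nothing about which of the three suggestions a user settles on, or what they do afterward.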
There’s typically less focus on what users do with that information. This makes it hard to tell if a tool like this could actually meet one of its stated goals: to give people access to information that might take some of the load off dermatologists who are stretched thin all over the world. “There’s no doubt that there’s such a huge demand for dermatologists,” Dinnes says. “There’s a desire to use tools that are perceived as helping the situation, but we don’t actually know if they’re going to help.”
It’s a big gap in our understanding, says Hamish Fraser, an associate professor of medical science at Brown University who studies symptom-checkers. “In addition to the basic problem of whether people can even interpret the systems correctly and use them correctly, there’s also this question about whether people will actually respond to anything that is fed back to them from the system.”
Filling that gap is key as a growing number of these tools come onto the market, Fraser says. “There are more and more emerging technologies.” Understanding how they could change people’s behavior is important because their role in healthcare will likely grow.
“People are already voting with their feet, in terms of using Google and other search engines to check symptoms and look up diseases,” Fraser says. “There’s obviously a need there.”
What do people do next?
Ideally, Fraser says, future studies would ask people using a symptom-checker for permission to follow up and ask what they did next or ask for permission to contact their doctor.
“You would start to very quickly get a sense as to whether a random sample of millions of people using it got something from the system that related to what was actually going on, or what their family doctor said, or whether they went to the emergency department,” he says.
One of the few studies that have asked some of these questions followed up with around 150,000 people who used a virtual medical chatbot called Buoy Health. Researchers compared how likely people said they were to go to the doctor before using the bot with how likely they said they were to go after they saw what the bot had to say. Around a third of people said they would seek less urgent care, such as waiting to see a primary care doctor rather than going to the emergency room. Only 4 percent said they would take more urgent steps than before they used the chatbot. The rest stayed around the same.
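As a rough illustration of that kind of before-and-after comparison, the sketch below tallies whether each person's stated plan became less urgent, more urgent, or stayed the same. It is not the study's method or data: the three urgency levels, the `summarize_shifts` function, and the five sample responses are assumptions made up for this example.

```python
from collections import Counter

# Assumed ordering of care options, from least to most urgent (hypothetical levels).
URGENCY = {"self-care": 0, "primary care": 1, "emergency room": 2}


def summarize_shifts(responses):
    """Return the share of people whose stated plan got less urgent, more urgent, or stayed the same."""
    if not responses:
        raise ValueError("Need at least one response")
    counts = Counter()
    for before, after in responses:
        change = URGENCY[after] - URGENCY[before]
        if change < 0:
            counts["less urgent"] += 1
        elif change > 0:
            counts["more urgent"] += 1
        else:
            counts["same"] += 1
    total = len(responses)
    return {label: counts[label] / total for label in ("less urgent", "more urgent", "same")}


# Made-up (before, after) answers from five hypothetical users.
example_responses = [
    ("emergency room", "primary care"),
    ("primary care", "self-care"),
    ("self-care", "primary care"),
    ("primary care", "primary care"),
    ("emergency room", "emergency room"),
]

example_summary = summarize_shifts(example_responses)
for label, share in example_summary.items():
    print(f"{label}: {share:.0%}")
```

A tally like this captures only what people say they intend to do, which is different from what they go on to do.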
It’s only one study, and it evaluates a checker for general medical symptoms, like reproductive health issues and gastrointestinal pain. But the findings were, in some ways, counterintuitive: many doctors worry that symptom-checkers lead to overuse of the health system and send people to get unnecessary treatment. This seemed to show the opposite, Fraser says. The findings also showed how important accuracy is: diverting people from treatment could be a big problem if done improperly.
“If you’ve got something that you’re concerned about on your skin, and an app tells you it’s low risk or it doesn’t think it’s a problem, that could have serious consequences if it delays your decision to go and have a medical consultation,” Dinnes says.
Still, that type of analysis tends to be uncommon. The company behind an existing app for checking skin symptoms, called Aysa, hasn’t yet explicitly surveyed users to find out what steps they took after using the tool. Based on anecdotal feedback, the company thinks many people use the tool as a second opinion to double-check information they got from a doctor, says Art Papier, the chief executive officer of VisualDx, the company behind Aysa. But he doesn’t have quantitative data.
“We don’t know if they went somewhere else after,” he says. “We don’t ask them to come back to the app and tell us what the doctor said.” Papier says the company is working to build those types of feedback loops into the app.
Google has planned follow-up studies for its dermatology tool, including a partnership with Stanford University to test the tool in a health setting. The company will monitor how well the algorithm performs, Lily Peng, a physician-scientist and product manager for Google, said in an interview with The Verge. The team has not announced any plans to study what people do after they use the tool.
Understanding the way people tend to use the information from symptom-checkers could help ensure the tools are deployed in a way that will actually improve people’s experience with the healthcare system. Information on what steps groups of people take after using a checker also would give developers and doctors a more complete picture of the stakes of the tools that they’re building. People with the resources to see a specialist might be able to follow up on a concerning rash, Fraser says. “If things deteriorate they’ll probably take action,” he says.
Others without that access might only have the symptom-checker. “That puts a lot of responsibility on us — people who are particularly vulnerable and less likely to get a formal medical opinion may well be relying most on these tools,” he says. “It’s especially important that we do our homework and make sure they’re safe.”