Most algorithms designed to help people identify skin problems don’t let experts see the datasets they were developed with and don’t share information on the skin tone or ethnicity of the patients in those datasets, according to a new review. That could make it hard for people to evaluate the programs before using them and to understand if they might not work as well for certain groups of people, the authors argue.
These types of tools use pictures of skin conditions to teach a system to recognize those same conditions in new images. Someone could upload a picture of a rash or mole, and the tool would be able to tell what type of rash or mole it was.
The paper, published in JAMA Dermatology, analyzed 70 studies that either developed a new deep learning model or tested an existing algorithm on a new set of data. Taken together, the models were developed or tested using over 1 million images of skin problems. Only a quarter of those images were available for experts or the public to review, the analysis found. Fourteen of the studies included information about the ethnicity or race of the patients in their data, and only seven described their skin types.
The rest did not share the demographic breakdown of their patients. “I highly suspect that these datasets are not diverse, but there is no way to know,” study author Roxana Daneshjou, a clinical scholar in dermatology at Stanford University, said on Twitter.
The analysis also checked whether the models that aimed to identify skin cancer were trained on images where the cancer was confirmed with a skin sample sent to a lab, the “gold standard” for making sure a diagnosis is correct. Of the studies included, 56 claimed to identify skin cancer, but only 36 of them met the gold standard. The ones that did not could be less accurate, the authors say.
The review included an algorithm from Google, which developed a tool designed to help people identify skin conditions. The company plans to release a pilot version of its web tool, which lets people upload pictures of a skin problem and get a list of possible conditions, later this year. According to the analysis, the Google paper includes skin type and an ethnicity breakdown but did not make the data or model used publicly available. It also didn’t use the gold standard methods for assessing a few types of skin cancers, including melanoma and basal cell carcinoma.
Medical algorithms are only as good as the data they were developed with, and may not be as effective if they’re used in situations different from the ones on which they were trained. That’s why experts argue that data, or descriptions of that data, should be freely available: “the data that are used to train and test a model can determine its applicability and generalizability. Therefore, a clear understanding of data set characteristics ... is critical,” the authors wrote.
Lack of transparency is a consistent problem with medical algorithms. Most AI products cleared by the Food and Drug Administration (FDA) don’t report important information about the data they were developed with, according to a February 2021 Stat News investigation. The FDA told Stat News that its new “action plan” for AI pushes for more transparency.
The limitations don’t mean most dermatology algorithms are useless, Philipp Tschandl, a researcher at the Medical University of Vienna, wrote in an accompanying editorial. Physicians also aren’t perfect and have their own biases or knowledge gaps that can skew their interpretation of a skin problem. “We know this and still manage to practice medicine well,” he wrote. “We need to find ways through explainability, smart checks, and risk mitigations to allow algorithms to work safely and in an equitable manner within the field of medicine.”