Google scrapped the publication of 100,000 chest X-rays due to last-minute privacy problems

The data was going to be shared as part of an AI showcase in 2017

Google canceled a project to publish more than 100,000 human chest X-rays online just days before the data was set to go live, after realizing the images contained personally identifiable information, reports The Washington Post.

The incident took place in 2017 and was part of a joint project conducted with the National Institutes of Health (NIH). But it’s particularly relevant at a time when Google is moving quickly into health care and stealthily gathering medical data from millions of patients. As the search giant amasses more of these sensitive records, many privacy advocates are questioning whether it can be trusted with the information.

Google was reportedly rushing to meet a deadline

The Post’s story cites emails and an interview with an anonymous source familiar with the project. It says that although Google and the NIH worked together to remove all identifying information from the X-rays, Google was rushing to meet a self-imposed deadline and did not give these privacy issues proper attention.

The plan was to publish the X-rays as part of a showcase of the medical potential of Google’s cloud and AI tools. Datasets like the one collected by the NIH are essential to building new diagnostic tools involving machine learning. Google has undertaken numerous research projects like this, using similar datasets to predict heart disease risk by examining eye scans and to detect breast cancer from biopsies.

Google realized the X-rays still contained personal information only after the NIH informed it. According to the Post, this information included “the dates the X-rays were taken and distinctive jewelry that patients were wearing when the X-rays were taken.”

Datasets of medical information like X-rays are essential for building new diagnostic AI tools.

In response to the Post’s story, a spokesperson for Google said: “We take great care to protect patient data and ensure that personal information remains private and secure ... Out of an abundance of caution, and in the interest of protecting personal privacy, we elected to not host the NIH dataset. We deleted all images from our internal systems and did not pursue further work with NIH.”

This is not the first misstep the company has made with medical data. In 2017, its UK subsidiary, DeepMind, was involved in a trial that broke the law in its handling of hospital records, and Google is also being sued for alleged inappropriate access to medical data from the University of Chicago Medical Center.

Earlier this week, The Wall Street Journal revealed details on Google’s “Project Nightingale,” in which it collected medical data from millions of patients in 21 US states as part of a deal to improve the record-keeping system of the Ascension medical group.

The news triggered a government inquiry, with the Department of Health and Human Services announcing that it will “seek to learn more information about this mass collection of individuals’ medical records” to ensure Google has not broken federal law.