
Data from health apps offers opportunities and obstacles to researchers

Tons of information just waiting to be analyzed


Illustration by Alex Castro / The Verge

Researchers are eager to tap into the steadily expanding pool of health information collected from users by products like Fitbit, Clue, and the Apple Watch. But while these datasets could be a treasure trove for scientists, they also pose logistical and ethical challenges that need to be addressed.

“There are huge opportunities. I think that’s the attraction,” says Ida Sim, director of digital health for the Division of General Internal Medicine at the University of California, San Francisco. Part of the appeal for scientists, Sim explains, is that the apps and devices are built for the general public. A commercial app or device with an easy, attractive interface is primed for long-term use by far more people than can usually be included in a research study, and people actually use them. “As opposed to a clunky research wristband, which is ugly, and people won’t wear it,” she says.


Researchers are taking advantage of their corporate counterparts’ better design, and in some cases, companies are especially eager to collaborate. This spring, the period tracking app Clue offered funds to researchers hoping to use Clue users’ cycle tracking data to answer scientific questions. The company had previously provided data to researchers who approached it directly, but the grants marked a formalization of its existing program.

“It’s been an evolving conversation,” says Amanda Shea, research collaborations manager at Clue. “Our dataset is big enough now, and we have more of the proper protocols in place to ensure users aren’t at risk through data sharing, so that we can more actively participate in research.”

Image: Clue

Unlike academic researchers, app companies like Clue are explicitly set up to collect and maintain large amounts of data, and they have the resources to do so. On the other hand, commercial apps usually aren’t designed for research, which demands predictable, transparently collected, and granular data. Sometimes, that means app-generated information is actually less useful to researchers, says Olivia Walch, a postdoctoral researcher studying mathematics and circadian rhythms at the University of Michigan.


So in order to make the most of the data, scientists need to accept that what works in their lab might not work with all of that commercial data. For example, if they’re designing their own experiments, automated data collection is often preferable to researchers because they don’t have to rely on people providing their own information, which often results in human error. But when they’re using commercial apps, self-reported information sidesteps some software-driven complications. “We know the pitfalls of surveys,” Walch says. “We don’t have error bounds, though, on if a wearable reports a heart rate by a method that hasn’t been validated. It’s just something to be aware of.”

Even though commercial hardware is easier for consumers to use, it presents problems for people sleuthing through the data. An app or device might collect raw information and then filter it through an algorithm that researchers don’t have access to. “Researchers then have to add all these asterisks,” Walch says. “It’s a black box.” A Fitbit, she says, might provide data on the amount of deep sleep a user got on a particular night but not the method the device used to calculate the deep sleep. Without knowing how the hardware tallies up your sleep pattern, it can be difficult to compare the results of one tracker to another, causing more research headaches.

“One app measures sleep in one way, another measures sleep in a different way, but both call it sleep duration,” Sim says. While that might not matter for individual companies, a lack of common definitions prevents researchers from maximizing the value of the data. Trade organizations are starting to discuss defining terms, Sim says. A 2017 action plan around mobile health data out of the Duke-Margolis Health Policy Center called for the development of standards for apps that would promote consistent data.


There are few legal barriers to researchers using data from health apps: if users sign terms of service that include language around research, they’ve legally consented to their data being shared with scientists. “But ethicists still would say if you start using an app, and in small print, it says you’re consenting to third-party use, is it really meaningful consent?” says Barbara Prainsack, an ethicist and health policy expert at the University of Vienna. Ethically, it’s important to consider whether a user had a reasonable expectation that their information would be used in a particular way.

Then there’s privacy, which is a challenge for researchers in a few different ways. The first is simply that, because they are working with a third party, they can’t easily follow up with users. “It’s almost always the case that you’ll hit a wall. The data you get is what you’re getting,” Walch says. 

Clue is still working on its data sharing process, but it keeps datasets small by design in order to protect user privacy, Shea says. “Each is designed specifically for a project. We narrow it down so it’s as small as possible,” she says. “Our data is not going to be the most useful for every study. Some things are not possible with the limitations of privacy.”

Privacy is a critical issue for data collection, Prainsack says. The datasets gathered by digital apps function like biobanks, which store biological samples for use in research. Biobanks collect information without participants knowing at the outset what scientific questions it will be used to answer, but people who contribute to biobanks do so with the primary purpose of donating to research. “It doesn’t make it bad, but people sign up for an app because [they] want to track their period, not because they want to contribute to research,” she says.


Apps should, therefore, provide users with information about the institutions using data for research, the purpose of that research, who benefits from it, and the privacy protections in place, Prainsack says. “What might be right for you might not be right for me. I might not want my data to be used for mental health research, or it might be really important to you that data is used to benefit underserved populations,” she says. 

Participating in research could even become a selling point as the market widens, Sim notes. “Apps are commodities. If they’re trying to distinguish themselves, they could say, ‘Well, we’re more scientific.’” 

The technical, ethical, and privacy-related hurdles are all there, but there is also a sense that both groups are making progress — even as they continue to sort out best practices for sharing data. “There’s real potential,” Walch says. “We’re slowly crab-walking our way there.”