Apple Watch’s data ‘black box’ poses research problems

Algorithms can change without warning

Apple Watch Series 3
Photo by Amelia Holowaty Krales / The Verge

A Harvard biostatistician is rethinking plans to use Apple Watches as part of a research study after finding inconsistencies in the heart rate variability data collected by the devices. He found that the data collected during the same time period appeared to change without warning.

“These algorithms are what we would call black boxes: they’re not transparent. So it’s impossible to know what’s in them,” JP Onnela, associate professor of biostatistics at the Harvard T.H. Chan School of Public Health and developer of the open-source data platform Beiwe, told The Verge.

Onnela doesn’t usually include commercial wearable devices like the Apple Watch in research studies. For the most part, his teams use research-grade devices that are designed to collect data for scientific studies. As part of a collaboration with the department of neurosurgery at Brigham and Women’s Hospital, though, he was interested in the commercially available products. He knew that there were sometimes data issues with those products, and his team wanted to check how severe they were before getting started. 

So they examined heart rate data that his collaborator Hassan Dawood, a research fellow at Brigham and Women’s Hospital, had exported from his Apple Watch. Dawood exported his daily heart rate variability data twice: once on September 5th, 2020, and again on April 15th, 2021. For the comparison, they looked at data collected over the same stretch of time, from early December 2018 to September 2020.

Because the two exported datasets included data from the same time period, the data from both sets should theoretically be identical. Onnela says he was expecting some differences. The “black box” of wearable algorithms is a consistent challenge for researchers. Rather than showing the raw data collected by a device, the products usually only let researchers export information after it has been analyzed and filtered through an algorithm of some kind. 
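The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis Onnela’s team ran: the function names, the dictionary layout (date string mapped to a daily heart rate variability value), and the sample numbers are all assumptions made up for the example, and none of the values come from Dawood’s exports. It reports how many shared days changed between two exports, the average size of the change, and a Pearson correlation that indicates whether the two series at least rise and fall together.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


def compare_exports(first, second, tolerance=1e-9):
    """Compare two {date: value} exports over the dates present in both."""
    shared = sorted(set(first) & set(second))
    if not shared:
        raise ValueError("the exports share no dates")
    a = [first[day] for day in shared]
    b = [second[day] for day in shared]
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Days whose value differs between exports by more than the tolerance.
    changed = [day for day, diff in zip(shared, diffs) if diff > tolerance]
    return {
        "shared_days": len(shared),
        "changed_days": changed,
        "mean_abs_diff": sum(diffs) / len(diffs),
        "pearson_r": pearson(a, b),
    }


# Hypothetical values for illustration only (not real Apple Watch data).
earlier_export = {"2019-01-01": 40.0, "2019-01-02": 55.0, "2019-01-03": 45.0}
later_export = {
    "2019-01-01": 50.0,
    "2019-01-02": 65.0,
    "2019-01-03": 55.0,
    "2019-01-04": 60.0,  # outside the shared window, so it is ignored
}

report = compare_exports(earlier_export, later_export)
print(report)
```

If the two exports really were identical over the shared window, `changed_days` would be empty and `mean_abs_diff` would be zero. A high correlation alongside a large mean difference would suggest the trend is preserved while the individual daily values are not.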

Companies change their algorithms regularly and without warning, so the September 2020 export may have included data analyzed using a different algorithm than the April 2021 export. “What was surprising was how different they were,” he says. “This is probably the cleanest example that I have seen of this phenomenon.” He published the data in a blog post last week. 

Comparing the heart rate variability data collected at the two different time points shows big differences.
Image: Beiwe

Apple told The Verge that any changes to its algorithm only apply to data going forward, and that the watch does not recalculate past data. Apple did not have an explanation for the discrepancy, other than issues with the third-party app used to export the data.

It was striking to see the differences laid out so clearly, says Olivia Walch, a sleep researcher who works with wearable and app data at the University of Michigan. Walch has long advocated for researchers to use raw data — data pulled directly from a device’s sensors, instead of filtered through its software. “It’s validating, because I get on my little soapbox about the raw data, and it’s nice to have a concrete example where it would really matter,” she says.

Constantly changing algorithms make it almost prohibitively difficult to use commercial wearables for sleep research, Walch says. Sleep studies are already expensive. “Are you going to be able to strap four Fitbits on someone, each running a different version of the software, and then compare them? Probably not.”

Companies have incentives to change their algorithms to make their products better. “They’re not super incentivized to tell us how they’re changing things,” she says.

That’s a problem for research. Onnela compared it to tracking body weight. “If I wanted to jump on a scale every week, I should be using the same scale every time,” he says. If that scale were tweaked without him knowing about it, the week-to-week changes in weight wouldn’t be reliable. For someone who has just a casual interest in tracking their health, that may be fine, since the differences aren’t going to be major. But in research, consistency matters. “That’s the concern,” he says.

Someone could, for example, run a study using a wearable and come to a conclusion about how people’s sleep patterns changed based on adjustments in their environment. But that conclusion might only be true with that particular version of the wearable’s software. “Maybe you would have a completely different result if you’d just been using a different model,” Walch says.

Dawood’s Apple Watch data isn’t from a study and is just one informal example. But it shows the importance of being cautious with commercial devices that don’t allow access to raw data, Onnela says. It was enough to make his team back away from plans to use the devices in studies. He thinks commercial wearables should only be used if raw data is available, or — at minimum — if researchers are able to get a heads-up when an algorithm is going to change.

There might be some situations where wearable data could still be useful. The heart rate variability information showed similar trends at both time points — the data went up and down at the same times. “If you’re caring about stuff on that macro scale, then you can make the call that you’d keep using the device,” Walch says. But if the specific heart rate variability calculated on each day matters for a study, the Apple Watch may be riskier to rely on, she says. “It should give people pause about using certain wearables, if the rug runs the risk of being ripped out underneath their feet.”

Correction July 27th, 7:25PM ET: A previous version of the story indicated that changes to Apple’s algorithms can lead to changes in data. Changes to the algorithm do not retroactively change data, Apple told The Verge in additional comments after publication.
