Google, Facebook, Microsoft, and Twitter partner for ambitious new data project

An open-source collaboration for ‘the future of portability’

Facebook’s Prineville data center. Photo by Vjeran Pavic.

Google, Facebook, Microsoft, and Twitter today jointly announced a new standards initiative called the Data Transfer Project, designed as a new way to move data between platforms. In a blog post, Google described the project as letting users “transfer data directly from one service to another, without needing to download and re-upload it.”

The current version of the system supports transfers of photos, mail, contacts, calendars, and tasks, drawing on publicly available APIs from Google, Microsoft, Twitter, Flickr, Instagram, Remember the Milk, and SmugMug. Many of those transfers could already be accomplished through other means, but participants hope the project will grow into a more robust and flexible alternative to conventional APIs. In its own blog post, Microsoft called for more companies to sign on to the effort, adding that “portability and interoperability are central to cloud innovation and competition.”

The project’s existing code is available as open source on GitHub, along with a white paper describing its scope. Much of the codebase consists of “adapters” that translate each service’s proprietary API into a common, interoperable format, making Instagram data usable by Flickr and vice versa. Between those adapters, engineers have also built a system that encrypts the data in transit, issuing a new forward-secret key for each transaction. Notably, the system is built for one-time transfers rather than the continuous interoperability enabled by many APIs.
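
To make the adapter idea concrete, here is a minimal sketch in Java, the language the project’s codebase is written in. Everything below is illustrative rather than the project’s actual API: the PhotoModel, Exporter, Importer, and TransferJob names are invented for this example, and the encryption helper shows the one-key-per-transfer idea in miniature, not the project’s real key-management scheme.

```java
import java.security.SecureRandom;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// A provider-neutral model for one photo. An export adapter translates a
// service's proprietary API responses into shapes like this; an import
// adapter translates the shape back into another service's API calls.
record PhotoModel(String title, String mimeType, byte[] bytes) {}

// One adapter pair per service: once both sides of a service are adapted,
// it can exchange photos with every other adapted service.
interface Exporter {
    List<PhotoModel> exportPhotos(String accessToken);
}

interface Importer {
    void importPhotos(String accessToken, List<PhotoModel> photos);
}

class TransferJob {
    // A one-time transfer wires any exporter to any importer; nothing is
    // downloaded to the user's machine along the way.
    static void run(Exporter source, Importer destination,
                    String sourceToken, String destinationToken) {
        List<PhotoModel> photos = source.exportPhotos(sourceToken);
        destination.importPhotos(destinationToken, photos);
    }

    // The per-transaction key idea in miniature: each transfer encrypts its
    // payload under a fresh AES-256 key that is never reused, so one
    // compromised key exposes only that single transaction.
    static byte[] encryptForTransfer(byte[] plaintext) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey oneTimeKey = keyGen.generateKey();
        byte[] iv = new byte[12];              // fresh nonce for this message
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, oneTimeKey, new GCMParameterSpec(128, iv));
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        // Trivial in-memory stand-ins so the sketch runs end to end.
        Exporter source = token ->
                List.of(new PhotoModel("sunset.jpg", "image/jpeg", new byte[]{1, 2, 3}));
        Importer destination = (token, photos) ->
                photos.forEach(p -> System.out.println("imported " + p.title()));
        run(source, destination, "source-token", "destination-token");
        System.out.println("ciphertext bytes: " + encryptForTransfer(new byte[]{1, 2, 3}).length);
    }
}
```

The payoff of this design is that supporting one new service means writing one exporter and one importer, not a bespoke bridge to every other service already in the system.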

“The future of portability will need to be more inclusive, flexible, and open,” reads the white paper. “Our hope for this project is that it will enable a connection between any two public-facing product interfaces for importing and exporting data directly.”

The bulk of the coding so far has been done by Google and Microsoft engineers who have long been tinkering with the idea of a more robust data transfer system. According to Greg Fair, product manager for Google Takeout, the idea arose from a frustration with the available options for managing data after it’s downloaded. Without a clear way to import that same data to a different service, tools like Takeout were only solving half the problem.

“When people have data, they want to be able to move it from one product to another, and they can’t,” says Fair. “It’s a problem that we can’t really solve alone.”

Most platforms already offer some kind of data-download tool, but those tools rarely connect with other services. Europe’s new GDPR requires such tools to provide all available data on a given user, which makes the resulting archive far more comprehensive than anything you’d get from an API. Along with emails and photos, you’ll find thornier data like location history and facial recognition profiles that many users don’t even realize are being collected. A few projects are trying to make use of that data, most notably Digi.me, which is building an entire app ecosystem around it, but for the most part it ends up sitting on users’ hard drives. Download tools are presented as proof that users really do own their data, but owning your data and actually using it have turned into completely different things.

The project was envisioned as an open-source standard, and many of the engineers involved say a broader shift in governance will be necessary if the standard is successful. “In the long term, we want there to be a consortium of industry leaders, consumer groups, government groups,” says Fair. “But until we have a reasonable critical mass, it’s not an interesting conversation.”

This is a delicate time for a data-sharing project. Facebook’s API was at the center of the Cambridge Analytica scandal, and the industry is still feeling out exactly how much users should be trusted with their own data. Google has struggled with its own API scandal, facing outcry over third-party email apps mishandling Gmail users’ data. In some ways, the proposed consortium would be a way to manage that risk, spreading the responsibility out among more groups.

Still, the specter of Cambridge Analytica puts a real limit on how much data companies are willing to share. When I asked about the data privacy implications of the new project, Facebook emphasized the importance of maintaining API-level controls.

“We always want to think about user data protection first,” says David Baser, who works on Facebook’s data download product. “One of the things that’s nice about an API is that, as the data provider, we have the ability to turn off the pipeline or impose conditions on how they can use it. With a data download tool, the data leaves our hands, and it’s truly out there in the wild. If someone wants to use that data for bad purposes, Facebook truly cannot do anything about it.”

At the same time, tech companies are facing more aggressive antitrust concerns than ever before, many of them centering on data access. The biggest tech companies have few competitors. And as they face new questions about federal regulation and monopoly power, sharing data could be one of the least painful ways to rein themselves in.

It’s an unlikely remedy for companies reeling from data privacy scandals, but outsiders like Open Technology Institute director Kevin Bankston have been pushing it as more important than ever, particularly for Facebook. “My primary goal has been to make sure that the value of openness doesn’t get forgotten,” Bankston says. “If you’re concerned about the power of these platforms, portability is a way to balance that out.”

Update 7/20/2018 12:00PM ET: This piece was updated to include reference to Microsoft’s announcement of the Data Transfer Project.