For nearly a year, Princeton researchers Steven Englehardt and Arvind Narayanan have been hard at work on a new project called the Princeton Web Census. Their goal is to log all the cookies, scripts, and trackers in the various corners of the internet, providing a running tab of all the tricks companies use to follow you from site to site.
Today, the team released its first report — and it shows Google and Facebook in a more dominant position than ever. Google Analytics was by far the most popular third party, present on 61 percent of sampled websites. DoubleClick and Google’s GStatic service took second and third place.
It's not practical to sample every site on the web, so the Census focuses on the 1 million most-visited sites, as determined by the analytics firm Alexa. By looking at the third-party domains that load content on a given site, researchers were able to get a sense of which trackers were most popular. Google owns seven of the 10 most loaded third-party domains. The remaining three are all owned by Facebook.
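The counting step described above can be sketched in a few lines of Python. This is a toy illustration, not the Census's actual pipeline: the function names, the `.example` sites, and the crawl data are all made up for the example, and the "last two host labels" rule is a deliberate simplification of how a real crawler would decide which domains count as third parties.

```python
from collections import Counter
from urllib.parse import urlsplit


def base_domain(url_or_host: str) -> str:
    """Naive registrable-domain guess: keep the last two host labels.

    Assumption: good enough for a toy example. A real crawl would consult
    the Public Suffix List so that hosts like "example.co.uk" are handled.
    """
    host = urlsplit(url_or_host).hostname or url_or_host
    return ".".join(host.lower().split(".")[-2:])


def third_party_prevalence(crawl: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of sites on which each third-party domain loads a resource."""
    sites_per_domain = Counter()
    for site, resource_urls in crawl.items():
        first_party = base_domain(site)
        # A set counts each third party once per site, however many requests it makes.
        third_parties = {base_domain(u) for u in resource_urls} - {first_party}
        sites_per_domain.update(third_parties)
    return {d: n / len(crawl) for d, n in sites_per_domain.items()}


# Hypothetical crawl results: site -> URLs of resources it loaded.
crawl = {
    "news.example": [
        "https://news.example/style.css",
        "https://www.google-analytics.com/analytics.js",
        "https://stats.g.doubleclick.net/dc.js",
    ],
    "shop.example": [
        "https://www.google-analytics.com/analytics.js",
        "https://connect.facebook.net/en_US/sdk.js",
    ],
    "blog.example": ["https://cdn.blog.example/app.js"],
    "forum.example": [
        "https://www.google-analytics.com/analytics.js",
        "https://www.google-analytics.com/collect",
    ],
}

prevalence = third_party_prevalence(crawl)
for domain, share in sorted(prevalence.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {share:.0%} of sites")
```

In this toy data, `google-analytics.com` appears on three of the four sites, while a site's own CDN subdomain is not counted against it.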
Those third-party domains are used for a wide variety of purposes. Google Analytics tells website owners how often each of their pages is visited, while the GStatic service stores content off-site for faster loading. Still, the vast majority of those trackers feed data into services like Google's DoubleClick advertising network. Those networks allow advertisers to target web-goers who display certain interests or view specific pages, sometimes following users with the same ad from site to site.
Trackers that aren’t affiliated with giants like Google or Facebook have limited reach. The survey encountered 81,000 different third-party domains, but only 123 of them were present on more than 1 percent of sampled sites, and many shared a common owner. "Our data suggest that there is a trend toward economic consolidation in the third-party ecosystem," the paper reads. "The number of third parties that a regular user will encounter on a daily basis is relatively small." Google, Facebook, and Twitter were the only third-party entities present on more than 10 percent of surveyed sites.
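To see why shared ownership matters for those numbers, here is a rough Python sketch that rolls domains up to their owners before applying a prevalence cutoff. The ownership map is a tiny illustrative sample, the `.example` sites and `tiny-tracker.example` are invented, and the function names are hypothetical; only the 1 percent default threshold comes from the figure quoted above.

```python
from collections import defaultdict

# Illustrative ownership map. A real study would need a much longer, curated list.
OWNER = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "gstatic.com": "Google",
    "facebook.net": "Facebook",
}


def sites_per_owner(site_domains: dict[str, set[str]]) -> dict[str, int]:
    """Count the sites each owner appears on, counting an owner once per site."""
    counts = defaultdict(int)
    for domains in site_domains.values():
        # Domains with no known owner are treated as their own owner.
        owners = {OWNER.get(d, d) for d in domains}
        for owner in owners:
            counts[owner] += 1
    return dict(counts)


def widespread(counts: dict[str, int], total_sites: int, threshold: float = 0.01) -> dict[str, int]:
    """Keep only entries seen on more than `threshold` of all sites."""
    return {k: v for k, v in counts.items() if v / total_sites > threshold}


# Hypothetical data: site -> third-party domains observed on it.
site_domains = {
    "a.example": {"google-analytics.com", "doubleclick.net"},
    "b.example": {"gstatic.com"},
    "c.example": {"facebook.net", "tiny-tracker.example"},
    "d.example": set(),
}

counts = sites_per_owner(site_domains)
# With only four sites, a 30 percent cutoff stands in for the 1 percent used at scale.
print(widespread(counts, total_sites=len(site_domains), threshold=0.3))
```

Three separate Google domains collapse into a single owner seen on two of the four sites, which is the only entry that clears the cutoff in this toy data.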
"We envision a future where measurement provides a key layer of oversight of online privacy."
News websites appear to be an exception to that rule. Researchers found news websites had more trackers than any other website category. "Since many of these sites provide articles for free and lack an external funding source," the researchers write, "they are pressured to monetize page views with significantly more advertising." Arts- and sports-themed websites also had significantly more trackers than average sites.
The report also looks at tracking methods that can’t be blocked with a simple browser plugin. Canvas fingerprinting, for example, records the unique way a given device renders a page and uses that data to identify users; the practice was found on 5 percent of the 1,000 most popular web pages. More exotic techniques look at the way a visiting device processes sound, or how full its battery is, and use that information to follow a single user from site to site. Other tracking systems exploit aspects of the WebRTC protocol to unmask a user's local IP address, although the practice is still relatively rare.
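The general idea behind fingerprinting can be shown with a short Python sketch: device quirks are boiled down to a single identifier that stays the same wherever the device goes, with no cookie involved. Everything here is an assumption made for illustration, including the signal names, their values, and the choice of SHA-256; it is not a description of how any particular tracker works.

```python
import hashlib
import json


def fingerprint(signals: dict) -> str:
    """Hash a set of device signals into a short, stable identifier."""
    # Sorting keys makes the result independent of the order signals were collected in.
    canonical = json.dumps(signals, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


# Hypothetical signals a script might collect in the browser.
device_a = {"canvas_render": "a3f1c2", "audio_sum": 35.7383, "font_count": 212}
device_b = {"canvas_render": "7be904", "audio_sum": 35.7383, "font_count": 212}

# The same device yields the same ID on any site that runs the script.
id_on_site_one = fingerprint(device_a)
id_on_site_two = fingerprint(dict(device_a))

print(id_on_site_one == id_on_site_two)   # same device, same ID
print(fingerprint(device_b) == id_on_site_one)  # different rendering, different ID
```

Because the identifier is recomputed from the device itself on every visit, clearing cookies does nothing to reset it.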
The Princeton team hopes that, by keeping a running tab of those activities, it can pressure tracking companies into better behavior. The group has also published the source code for its OpenWPM tool, making it easier for others to keep an eye on canvas fingerprinting and other techniques as they develop. "We envision a future where measurement provides a key layer of oversight of online privacy," the study reads. "We expect that measurement will be useful to developers of privacy tools, to regulators and policy makers, journalists, and many others."
- Source: Princeton Web Census