
Do Not Track: an uncertain future for the web's most ambitious privacy initiative


Microsoft's moves in IE10 have forced the issue, but what's next?


Following months of relative quiet on the subject of Do Not Track, an HTTP header that tells advertisers and other third parties not to follow you around the internet, the controversial browser signal is being thrust back into the limelight. After the W3C's recent face-to-face meeting in Amsterdam, the Digital Advertising Alliance plainly said that it "does not require companies to honor DNT," effectively saying it intends to stick to its own self-regulatory approach to user privacy. Much of the renewed interest stems from Microsoft's controversial decision to turn Do Not Track on by default in Windows 8's Internet Explorer 10, and Adobe engineer Roy Fielding's subsequent decision to take a sledgehammer to the Apache web server, patching it in a way that explicitly overwrites the DNT signal coming from Microsoft's newest browser.

With the fate of our beloved internet economy allegedly at stake, perhaps it's a good time to examine what Do Not Track is. How did the standard come to be, what does it do, and how does it stand to change online advertising? Is it as innocuous as privacy advocates make it sound, or does it stand to jeopardize the free, ad-supported internet we've all come to rely on?

How did we get here?

Back in 2007, a coalition of activists, academics, and lawyers approached the FTC to create an online equivalent of the successful nationwide Do Not Call list — a single list of opt-outs that all telemarketers had to respect. The document's authors were concerned about "marketers and advertising networks" that were able to "monitor and maintain data on an extensive array of behaviors that a consumer engages in online and in other digital mediums." By combining information like users' browsing histories, demographic information, and purchase histories, advertising networks were able to target potential customers with a high degree of precision. This kind of advertising came to be known as Online Behavioral Advertising, or OBA.


As privacy researcher Christopher Soghoian points out, Do Not Track fell off the radar until 2010, when the FTC called on industry to create a mechanism that would allow users to control the collection and use of their browsing data. The mechanism was meant to have a "privacy by design" approach that, like an HTTP cookie (more below), would allow users to communicate information every time they loaded a web page: in this case, their preference not to be tracked. The FTC's draft report made its way to the web's largest standards-making body, the World Wide Web Consortium (W3C), which formed the Tracking Protection Working Group. Its plan was to bring together stakeholders ranging from privacy advocates to firms specializing in OBA to hammer out the holy grail: a web standard, amenable to all, that would give consumers real privacy choices. The group eventually settled on an HTTP header, "DNT: 1", a tiny piece of text that would be sent every time a browser requested a web page, politely asking not to be tracked.
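To make the mechanism concrete, here is a minimal Python sketch of both ends of the exchange: a client attaching the header (name `DNT`, value `1`) to a request, and a server checking for it. The function names and the example URL are illustrative assumptions, not part of any specification, and the request is only built, never sent.

```python
from typing import Mapping
from urllib.request import Request


def build_request(url: str, do_not_track: bool) -> Request:
    """Build (but do not send) a request, adding DNT when the user opted out."""
    request = Request(url)
    if do_not_track:
        # The preference travels as an ordinary HTTP request header.
        request.add_header("DNT", "1")
    return request


def prefers_no_tracking(headers: Mapping[str, str]) -> bool:
    """Server-side check (hypothetical helper): did the browser send DNT: 1?"""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


request = build_request("http://example.com/article", do_not_track=True)
print(prefers_no_tracking(dict(request.header_items())))  # True
```

Note that the header only states a preference. Nothing in the code, or in HTTP itself, forces the receiving server to act on it.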

How does tracking work?

The debate surrounding behavioral targeting hinges on what's called "third-party tracking." The definition of the term is contentious, but it's generally intended to describe an advertising network or other entity using some kind of unique identifier to track individual users' behavior across multiple websites. "Third party" in this case means an organization that the user doesn't know (or couldn't reasonably be expected to know) he or she is interacting with.

"Third-party tracking" is a surprisingly contentious term

When you visit a site (say, The Verge), your browser loads content that is served directly by The Verge (the first party), like our articles and images. It also loads content served by third parties, like embedded videos from YouTube, the Facebook "Like" button, and advertising content. In the simplest case, third-party tracking occurs when a third party stores an HTTP cookie (a few bytes of text) on your computer, which your computer sends back to the ad network the next time the two cross paths. A single ad network might serve ads on thousands of websites, and the cookie's unique ID number acts like a radar blip, lighting up every time you load a page in which the network's ads are embedded. By logging these blips, the companies in question are able to rebuild large chunks of your browsing history. In absolute terms, ex-Googler Brian Kennish calculated that Google's DoubleClick ad network showed up on 18 percent of more than 200,000 pages on the web's top 1,000 sites. Facebook, which has users' real names, showed up as a third party on 33 percent of the pages.
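The "radar blip" idea can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: the `AdNetwork` and `Browser` classes, the network name, and the page URLs are invented, and real ad servers are far more elaborate. The point is only that a single stable ID, echoed back on every page that embeds the network's content, is enough to rebuild a browsing trail.

```python
import uuid
from collections import defaultdict


class AdNetwork:
    """Toy third-party ad server that recognizes browsers by a cookie ID."""

    def __init__(self):
        # cookie ID -> pages on which this network's ad was loaded
        self.log = defaultdict(list)

    def serve_ad(self, cookie, page_url):
        # First encounter: mint a unique ID for the browser to store.
        if cookie is None:
            cookie = uuid.uuid4().hex
        # Every later encounter is a "blip" logged against the same ID.
        self.log[cookie].append(page_url)
        return cookie


class Browser:
    """Toy browser that stores one cookie per third-party network."""

    def __init__(self):
        self.cookies = {}

    def visit(self, page_url, network_name, network):
        # The stored cookie (if any) is sent back to the ad network
        # whenever a page embeds that network's content.
        sent = self.cookies.get(network_name)
        self.cookies[network_name] = network.serve_ad(sent, page_url)


network = AdNetwork()
browser = Browser()
pages = ["news.example/story", "shop.example/cameras", "blog.example/travel"]
for page in pages:
    browser.visit(page, "adnet.example", network)

# The network never learned a name, yet it holds the full trail for this ID.
print(network.log[browser.cookies["adnet.example"]])
```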

The HTTP cookie isn't bulletproof, though. It's only text, not an application, and you can delete it, switch your browser or settings to disable third-party cookies, or use a browser extension like AdBlock Plus to stop ads from loading altogether. While the latter two solutions rely on blocking lists that need to be constantly updated, the problem is seemingly solved (ignoring, for the minute, "supercookies" like those that leverage the Adobe Flash Player).


More worrying to privacy activists is browser fingerprinting, which can identify you just from what your browser "looks" like to a server. The approach uses information your computer readily supplies, like which browser version you're using, and which plugins and fonts you have installed. You might not think such mundane data are very meaningful, but used together, the combination of markers can be enough to uniquely identify you. The EFF's Panopticlick project shows how trivial this is to do — in 2010, a sample of about half a million browsers found that 84 percent were uniquely identifiable (that went up to 94 percent for those with Flash or Java installed). This kind of "stateless" (i.e., not cookie-based) tracking is essentially invisible to the user, can't be easily blocked with a browser extension, and companies have been doing it for years.
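A rough Python sketch of the idea follows. It is not Panopticlick's actual method: the attribute names, the hashing scheme, and the four sample browsers are assumptions chosen to show how ordinary details combine into an identifier without anything being stored on the user's machine.

```python
import hashlib
from collections import Counter


def fingerprint(attributes: dict) -> str:
    """Combine mundane browser attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def unique_fraction(browsers: list) -> float:
    """Share of browsers in the sample whose fingerprint nobody else has."""
    prints = [fingerprint(browser) for browser in browsers]
    counts = Counter(prints)
    return sum(1 for p in prints if counts[p] == 1) / len(prints)


# Made-up sample: two identical configurations and two that each differ
# from the rest in a single attribute.
common = {
    "user_agent": "ExampleBrowser/10.0",
    "plugins": ("Flash",),
    "fonts": ("Arial", "Times"),
}
sample = [
    common,
    dict(common),
    {**common, "fonts": ("Arial", "Times", "Comic Sans")},
    {**common, "plugins": ("Flash", "Java")},
]
print(unique_fraction(sample))  # 0.5
```

Because the server computes the identifier from what the browser volunteers, clearing cookies does nothing to change it.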

Anonymous tracking data that isn't

But supposing we grant that user tracking is an unavoidable fact of life on the web, does it pose a real threat to privacy? Advertisers have repeatedly pointed to data anonymization, the removal or "scrubbing" of personally identifiable information (PII) like names and telephone numbers, as evidence of anonymized tracking's safety. But privacy activists claim the issue isn't so cut and dried. In a 2010 paper, University of Colorado Law School Professor Paul Ohm provides a detailed analysis of the threat posed by the de-anonymization of scrubbed data. First, citing a follow-up to a landmark study showing that 63 percent of Americans could be uniquely identified by a combination of ZIP code, birthdate, and sex, the author summarizes famous de-anonymizations from the recent past. Among others, they include Arvind Narayanan and Vitaly Shmatikov's well-publicized cross-referencing of Netflix ratings with IMDb's, and Carnegie Mellon computer science professor Latanya Sweeney's de-anonymization of publicly released Massachusetts health records, which included her mailing the Governor his "anonymized" record to prove a point.
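The general shape of such a linkage attack can be sketched in a few lines of Python. This is an illustration of the technique, not a reconstruction of any of the studies mentioned: the records, names, and field names below are entirely made up, and the only assumption carried over from the text is that ZIP code, birthdate, and sex together act as a near-unique key.

```python
from collections import defaultdict

# Fields that are not "personally identifiable" on their own.
QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")


def quasi_key(record):
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(scrubbed_records, public_records):
    """Attach names to scrubbed records whose quasi-identifiers match exactly one person."""
    names_by_key = defaultdict(list)
    for person in public_records:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for record in scrubbed_records:
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:  # only an unambiguous match unmasks someone
            matches[record["record_id"]] = names[0]
    return matches


# Invented "anonymized" dataset: names removed, other fields left intact.
scrubbed = [
    {"record_id": "r1", "zip": "02138", "birthdate": "1960-01-15", "sex": "F", "diagnosis": "asthma"},
    {"record_id": "r2", "zip": "02139", "birthdate": "1975-06-30", "sex": "M", "diagnosis": "flu"},
]
# Invented public list (think of a voter roll) that still carries names.
public = [
    {"name": "Alice Example", "zip": "02138", "birthdate": "1960-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birthdate": "1975-06-30", "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birthdate": "1975-06-30", "sex": "M"},
]
print(reidentify(scrubbed, public))  # {'r1': 'Alice Example'}
```

Record `r2` stays anonymous here only because two people happen to share its key. The more fields a dataset keeps, the rarer that kind of luck becomes.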


Ohm and others point out that conventional anonymization is easily broken, and likely to get more fragile as more and more data becomes public. When some seemingly mundane piece of data is used in conjunction with another seemingly unrelated database, it can provide just the key needed to unlock vast troves of information — like the name behind an individual user's browsing history. As Narayanan and others point out, "anonymization" of data is really "pseudonymization" — once a user's ad network ID has been tied to his or her name, that name can be tied to logs both arbitrarily far back in time and into the future. Once a user has been unmasked, there's no going back.

While it's true that big companies like Google and Facebook log your browsing activity any time you're logged in and visit a site with one of their social widgets, they at least have huge brands to protect, and the public relations nightmare of a data breach revealing people's sensitive information serves in some measure to encourage data security. But the same can't be said for OBA firms, many of which are small, newly established startups that may not have adequate data protection measures in place, says Stanford researcher Jonathan Mayer:

I think it's fair to say that the chance of [a data breach at Google or Facebook] is not too high, thankfully… But they're just a couple of companies, and there are a hundred companies that have tons and tons of information about users. It would only take a breach at one for users to really be in a tight spot. I guess the question becomes "do you trust all hundred to get it right?"

W3C

That brings us back to the W3C and Do Not Track, which aims to head the problem off by preventing third parties from logging users' browsing histories entirely. In the year since the Tracking Protection Working Group was formed, public records show that nearly 5,000 emails have been sent, weeks of face-to-face meetings have been attended, and hour after hour of telephone calls and IRC conversations have been exchanged between committee members. After (or, some might say, because of) all that, little progress has been made on the agreement's three major sticking points.

5,000 emails later, we still don't have a functional Do Not Track system

Mayer explains that the first issue is a user interface question: whether the default should be DNT-on or DNT-off, and how the choice is presented to the user. Viewed in this context, Fielding's decision to overwrite IE10's DNT header in the Apache web server takes on new meaning as a unilateral move jeopardizing one of the most closely watched areas of negotiation. The second problem concerns how service providers like analytics firms are treated under the specification: are they third parties if they're operating under the first party's domain? And the third issue deals with uses of tracking data for which third parties believe they ought to be exempt from DNT. Mayer explains that while privacy advocates and others have been pushing for a universal opt-out of third-party tracking, advertisers have been pushing for a use-based approach that would allow them to continue compiling tracking data so long as they're using it for permissible purposes, like "research" and "marketing."


The Digital Advertising Alliance, a "consortium of the leading national advertising and marketing trade groups," is proposing its own persistent, cookie-based opt-out as a substitute for the Do Not Track header supported by privacy advocates. The cookie doesn't prevent behavioral advertisers from tracking what you do online, but would stop you from seeing targeted ads, at least from those advertisers that choose to respect the cookie. "I get that some users don't like behavioral advertising," says Mayer. "But it's certainly not the magnitude of privacy issue that is some company you've never heard of and don't have a business relationship with… collecting your browsing history. The view I have, and I think the view of advocates in the group and some policy makers would be 'actually, you guys can do behavioral advertising. You just have to do it in a way where you don't collect the user's browsing history.'"
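The difference between the two proposals is easiest to see side by side. The Python sketch below is a deliberate simplification under stated assumptions: the `Outcome` type and both policy functions are invented for illustration, the cookie policy assumes an advertiser that chooses to respect the opt-out, and the DNT policy models the collection limit that advocates are asking for rather than any finished specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    logs_browsing: bool           # does the third party record this page view?
    shows_history_based_ad: bool  # is the ad chosen from a server-side history?


def opt_out_cookie_policy(has_opt_out_cookie: bool) -> Outcome:
    # Cookie-based opt-out as described above: collection continues either
    # way, and only the targeted ad is suppressed for users who opted out.
    return Outcome(logs_browsing=True, shows_history_based_ad=not has_opt_out_cookie)


def dnt_collection_limit_policy(dnt_header: Optional[str]) -> Outcome:
    # Advocates' reading of DNT (assumed here): an opted-out user's browsing
    # is not logged at all, so no server-side history exists to target from.
    opted_out = dnt_header == "1"
    return Outcome(logs_browsing=not opted_out, shows_history_based_ad=not opted_out)


print(opt_out_cookie_policy(has_opt_out_cookie=True))
print(dnt_collection_limit_policy("1"))
```

For an opted-out user, the only field that differs between the two results is `logs_browsing`, which is the crux of the dispute.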

Economic arguments

The reaction from companies who stand to lose money from DNT has been entirely predictable. Any initiative that could hurt online advertising, the fuel that keeps the free-content machine running, is going to encounter a lot of resistance. The one inexorable truth of the internet is that someone needs to pay for the articles we read, the music we watch, and the Facebooks we Facebook. If we clamp down on online tracking, where is all the money going to come from?

So far, the advantages of OBA have been obvious to many players in the online economy. An advertising firm specializing in OBA is able to sell more effective ads, and earn more money. Companies advertising their products likewise have incentives to choose OBA over other advertising media — more targeted ads providing, all else equal, a better return on investment. Finally, for ad-supported websites, allowing firms to serve ads that consumers are more likely to act on brings in more revenue and finances the production of new content. Seemingly, everybody wins.


But perhaps those assumptions ought to be questioned. First, OBA is only one slice of the online advertising pie, and the data indicate it's not a particularly big one: it is estimated to still be less than 10 percent of the total in 2014. Second, and perhaps more important, is the question of how advertisers would adjust their spending if something like Do Not Track were put into law.

This front of the activist-advertiser war essentially boils down to "how likely are companies to shift spending to other kinds of online advertising if OBA is no longer available?" The position of Mayer and others is that after the dust settles around DNT, if online ad budgets remain nearly the same, and online advertising remains nearly as profitable for content creators, then not much has really changed.

Given what we know about the distribution of ad spending, it's probably safe to assume that demand for other forms of online advertising is highly elastic; that a sizable portion of the money that used to be spent on OBA would be allocated to other kinds of online advertising, like contextual and demographic ads. The choice between user privacy and free, ad-supported content isn't binary, and researchers have shown that approaches like client-side storage make it possible to vastly improve privacy while still offering many of the same benefits provided by online tracking. In short, it's true that killing third-party tracking would hurt firms specializing in OBA, but it's hard to believe that's the same as killing ad-supported content on the internet.
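One way to picture the client-side approach is the Python sketch below. It is a hypothetical design for illustration, not a description of any particular research system: the `LocalProfile` class, the topic labels, and the ad IDs are all invented. The assumption is that the ad network sends the same bundle of candidate ads to every browser, and the choice among them is made locally from interests that never leave the machine.

```python
from collections import Counter


class LocalProfile:
    """Interest counts kept in the browser; in this sketch they are never uploaded."""

    def __init__(self):
        self.interests = Counter()

    def record_visit(self, page_topic: str) -> None:
        self.interests[page_topic] += 1

    def choose_ad(self, candidates):
        # candidates: (ad_id, topic) pairs, identical for every user.
        # The ad whose topic the user has visited most often wins;
        # ties go to the earliest candidate in the list.
        return max(candidates, key=lambda ad: self.interests[ad[1]])[0]


profile = LocalProfile()
for topic in ["cameras", "travel", "cameras"]:
    profile.record_visit(topic)

candidates = [("ad-travel", "travel"), ("ad-cameras", "cameras"), ("ad-cars", "cars")]
print(profile.choose_ad(candidates))  # ad-cameras
```

The advertiser still gets a targeted impression, but the only party holding the browsing history is the user's own browser.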

Could DNT get the legal backing it needs?

Supposing for a moment that some privacy-protecting version of Do Not Track is adopted by the W3C, that doesn't mean that industry is bound to adhere to it. In plain terms, there is minimal profit incentive for behavioral advertisers to implement a voluntary web standard that directly cuts into their bottom line. The hope is that the FTC can provide the legal teeth necessary to enforce the standard, but it's hard to say how things will unfold. "It's really hard to tell what the odds are, it depends quite a bit on how the elections turn out," said EFF Senior Staff Attorney Lee Tien by email. "Any predictions on either the legislative or regulatory front depend on that. The negotiations generally are difficult, both because advocates have insufficient information about industry practices and only dim insight into how different industry segments view DNT."

If the US doesn't act, Europe might

In an interview with The New York Times regarding prospects for the W3C's face-to-face meeting in Amsterdam, FTC Chairman Jon Leibowitz said "there is enormous and bipartisan momentum for do-not-track options for consumers if there is no agreement by the end of this year." And not all of the legal impetus for Do Not Track is found within America's borders. Notably, EU regulators have tried to keep that region's Data Protection Directive closely linked with the evolving Do Not Track specification, and EU enforcement could have an effect on how the tracking debate plays out in the rest of the world.

Where do we go from here?

With a general sense of hopelessness looming over the ongoing Do Not Track negotiations as they come to their scheduled close in January, we're left to wonder how the future of online advertising is likely to turn out. The most obvious risk users face now is the ever-present possibility of a data breach at an OBA firm storing detailed logs of their browsing information. If last year's hack on Sony's PlayStation Network was carried out for "the lulz," it doesn't take much imagination to picture the same thing happening to an advertiser, only with much bigger consequences — for the company in question, the industry at large, and most importantly, the users affected.

But even if Do Not Track negotiations fall through, and no new legislation gets passed, and ubiquitous tracking remains the status quo, the fight for user privacy might not be lost. Microsoft's decision to enable privacy protection by default, Apple's choice to block third-party cookies, and Mozilla's early support for DNT show that browser makers are more than willing to stand up to advertisers. If industry doesn't begin to implement Do Not Track in a meaningful way, says Mayer, it risks a technical arms race with browser vendors. "That's probably a far worse world for industry than anything Do Not Track will do," he adds. "To put a point on it, as one very senior American policymaker put it, these guys are not as scared as they should be."


What's next? Expect the complex and slow process of the W3C to be compounded by the even more bureaucratic processes of the US Congress and other governments around the world. The two co-chairs of the House's Privacy Caucus have "expressed disappointment" with the Digital Advertising Alliance's negative stance on Do Not Track. It's a start, but if you remember the arms race between pop-up ads and browser blocking we went through in the last decade, it's time to buckle up for an even bigger battle.