Since the first of many leaked documents showed that the NSA has been gathering phone records en masse as part of its anti-terrorism program, there's been an ongoing fight over just what these records reveal. To supporters, the metadata collection is a limited system that's rarely queried and doesn't contain enough information to be considered an invasive search: the NSA has said it doesn't collect either the content of calls or the names attached to phone numbers. Many technology and legal experts on the other side argue that metadata matters, though, and a Stanford Security Lab project demonstrates that stripping names from a database does little to protect anyone's identity.
Last month, Patrick Mutchler and Jonathan Mayer released an Android app called MetaPhone that allowed them to pull phone records — with permission — from users' phones. In an ongoing series, they're now showing what can be gleaned from that information: most recently, how easy it is to correlate numbers with names. First, they simply pulled 5,000 numbers from their MetaPhone dataset and checked them against Facebook, Yelp, and Google Places; these three services let them match 27.1 percent of the numbers with a name or business. From there, they looked at a smaller set of 100 numbers, approximating what might happen if a team of analysts manually searched through metadata. A Google search of each number pulled up an individual or business name for 60 of the 100 in under an hour; running the numbers through the Intelius public records database identified 74 of them. By combining the results of all searches, Mutchler and Mayer could identify 91 of the numbers — and, as they rightly point out, they have access to much less information than the NSA, though 100 numbers is a tiny, nonrepresentative fragment of the full database.
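The way the separate searches add up to a combined figure can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names, the source labels, and the ten phone numbers are all made up for the example. It shows why the per-source counts cannot simply be added together, since a number found by two sources still counts once.

```python
def combined_matches(matches_by_source):
    """Return the set of numbers identified by at least one source."""
    identified = set()
    for matched in matches_by_source.values():
        identified |= matched  # set union: overlaps are counted once
    return identified


def match_rate(identified, sample):
    """Fraction of the sample that was identified."""
    return len(identified & sample) / len(sample)


# Hypothetical toy sample of ten numbers (555-0100 through 555-0109).
sample = {f"555-01{i:02d}" for i in range(10)}

# Hypothetical results: which numbers each lookup source put a name to.
matches_by_source = {
    "web_search": {f"555-01{i:02d}" for i in range(0, 6)},      # 6 numbers
    "public_records": {f"555-01{i:02d}" for i in range(2, 9)},  # 7 numbers
}

identified = combined_matches(matches_by_source)
rate = match_rate(identified, sample)
print(f"{len(identified)} of {len(sample)} identified ({rate:.0%})")
```

In this toy data the two sources find six and seven numbers, but four of those overlap, so together they identify nine of the ten.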
With the details provided, it's hard to tell how many of these numbers were actually those of people, and how many belonged to businesses that post their numbers publicly. Even if finding the name of a customer service line isn't much of a coup, though, the fact that phone record databases contain well-known, high-traffic numbers poses its own problems. In a previous post, Mutchler and Mayer analyzed how many numbers could be reached within the three "hops" the NSA is allowed to follow from its original query. Civil liberties groups have estimated that these hops could include millions of people, and the MetaPhone dataset showed a hub-and-spoke network that could link numbers that had virtually no connection to each other. Anybody who dialed into T-Mobile's voicemail system, for example, could theoretically be connected to any other dialer. "Suppose, for example, that a suspicious number is phoned by a Skype user; a different Skype user has called FedEx; and you have phoned FedEx," writes Mayer. "You're fair game."
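Mayer's example can be made concrete with a small breadth-first search over a call graph. The sketch below is an assumption-laden toy, not the Stanford analysis: the node names are invented, calls are treated as undirected links, and every Skype caller is assumed to show up in the records as one shared number, which is what turns Skype into a hub.

```python
from collections import deque


def within_hops(call_pairs, start, max_hops=3):
    """Map each number reachable from `start` to its hop distance,
    following call records for at most `max_hops` steps."""
    neighbors = {}
    for a, b in call_pairs:
        # A call links both parties, whoever dialed.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        number = queue.popleft()
        if hops[number] == max_hops:
            continue  # do not expand past the hop limit
        for other in neighbors.get(number, ()):
            if other not in hops:
                hops[other] = hops[number] + 1
                queue.append(other)
    return hops


# Hypothetical call records; "skype_shared_number" and "fedex" are hubs.
call_pairs = [
    ("skype_shared_number", "suspicious_number"),  # one Skype user's call
    ("skype_shared_number", "fedex"),              # a different Skype user
    ("you", "fedex"),
    ("you", "your_friend"),
]

reach = within_hops(call_pairs, "suspicious_number", max_hops=3)
for number, distance in sorted(reach.items(), key=lambda item: item[1]):
    print(distance, number)
```

Under these assumptions "you" sit exactly three hops from the suspicious number, while "your_friend" falls outside the limit at four hops. Adding more callers to either hub would pull each of them inside the three-hop radius as well.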
If your number is queried and you're identified, your phone records can give away anything from medical conditions to who you're dating. That's one of the reasons Judge Richard Leon determined that comprehensive record collection wasn't the same as a non-invasive, temporary phone tap. Of course, none of this is really a revelation. Mutchler and Mayer are essentially demonstrating the things that metadata collection opponents have been arguing for months, stripping away one of the Obama administration's more common fig leaves. Their project is also making the problems with metadata collection a little more concrete for users: if you install MetaPhone, you can check your own records to see how closely you're connected to other users and who in your address book can be identified.
The NSA's counter has consistently been that even if these technological capabilities exist, they haven't been misused, making privacy fears mostly theoretical. Nonetheless, the agency is increasingly being pushed towards dismantling its in-house phone record database and relying instead on phone companies to provide individual pieces of data thought to be related to an investigation. That doesn't mean that the NSA couldn't get similar results, but it would take highly identifiable data out of the agency's servers and, potentially, raise the bar for requesting it.