Why it's so hard to stop online harassment

Algorithms like those used to detect copyright violations won't work for internet abuse

Justin Sullivan/Getty Images

In her column last week, Jessica Valenti wrote, "If Twitter, Facebook or Google wanted to stop their users from receiving online harassment, they could do it tomorrow." She’s technically right that they could stop harassment with 100% effectiveness, in that if you cease to receive all communications you cease to receive harassment. The real question is whether blunt algorithmic instruments can be effective without being too broad.

Valenti refers specifically to YouTube’s "sophisticated Content ID program":

For instance, YouTube already runs a sophisticated Content ID program dedicated to scanning uploaded videos for copyrighted material and taking them down quickly – just try to bootleg music videos or watch unofficial versions of Daily Show clips and see how quickly they get taken down.

Valenti assumes here that Content ID works. But Content ID and other blunt, algorithmic tools in the service of copyright enforcement are documented trainwrecks with questionable efficacy and serious free speech ramifications. In other words, Content ID and its ilk are simultaneously too weak and too strong. Their suitability in addressing copyright infringement is already deeply suspect; their suitability in potentially addressing harassment should be questioned all the more.

First of all, Content ID works because it’s able to match uploads against a vast database of copyrighted works uploaded by movie studios, record companies, and so forth. In other words, Content ID seeks out nearly exact duplicates. Harassment has predictable qualities, but it’s still often original content with wide variations from instance to instance. Could a machine-learning algorithm eventually come to predict harassment with acceptable accuracy? Maybe, but here’s the kicker: even when it comes to copyright enforcement, no one is satisfied with the Content ID system. Users whose videos are wrongfully flagged are often outraged, even as the content industry complains about continuing infringement while pushing for stronger protections that go well above and beyond what the law intended.
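
To make the difference concrete, here is a minimal sketch of database matching. It is not YouTube's actual algorithm, which fingerprints audio and video; the word-window comparison, the `REFERENCE_DB` contents, and the 0.6 threshold are all assumptions made for illustration. A lightly trimmed copy of a reference work scores high, while an original message matches nothing at all.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word windows in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def similarity(a, b):
    """Jaccard similarity between the word windows of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

# Hypothetical reference database supplied by rights-holders.
REFERENCE_DB = {
    "studio_clip": "the quick brown fox jumps over the lazy dog near the river bank",
}

def find_match(upload, threshold=0.6):
    """Return the name of the best-matching reference work, or None."""
    best_name, best_score = None, 0.0
    for name, reference in REFERENCE_DB.items():
        score = similarity(upload, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# A near-duplicate (the reference with its last word cut off) is caught.
reupload = find_match("the quick brown fox jumps over the lazy dog near the river")
# An original abusive message has nothing in the database to match.
novel = find_match("you should quit writing nobody wants you here")
```

The matcher only works when a reference copy already exists, and no such copy exists for a freshly written insult.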

There’s very little data on the actual efficacy of Content ID. It’s certainly smart, but it can still be circumvented. More importantly, Content ID is just another tool in what is still a game of whack-a-mole for rights-holders. Valenti challenges her readers to attempt uploading a Daily Show clip and to "see how quickly they get taken down." Yes, things go up, things go down. And then things go up again. It is certainly hard to keep up an unofficial version of a Daily Show clip. But search for "Daily Show" on YouTube and you’ll find them. How satisfactory is an anti-harassment mechanism where harassment gets through, is deleted later, then gets through again and again and again?

Why not automatically prevent certain content from being posted in the first place? In November, I wrote about a blunt automated censorship mechanism used by Twitter to squelch British MP Luciana Berger’s trolls. In that case, users were prevented from tweeting certain slurs at Berger. They ended up resorting to elaborate work-arounds, adding spaces or dashes between letters, or simply posting offensive pictures. The obvious problem with combating harassment using tools like BotMaker, the spam-fighting program likely used in Berger’s case, is not just that they aren’t 100% effective; it’s that they clearly aren’t scalable. Expanding a blacklist of words to everyone on Twitter is not a viable solution.
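
The work-arounds are easy to reproduce. The sketch below is a generic word blacklist, not BotMaker or anything Twitter is known to run; `BLOCKED_WORDS` and its placeholder entry are hypothetical. It blocks the exact word and waves through the same word spelled with spaces or dashes.

```python
import re

# Hypothetical blacklist; "badword" is a placeholder for a slur.
BLOCKED_WORDS = {"badword"}

def is_blocked(message):
    """Block a message if any whole word in it appears on the blacklist."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return any(token in BLOCKED_WORDS for token in tokens)

attempts = [
    "you badword",         # plain spelling: blocked
    "you b a d w o r d",   # spaces between letters: gets through
    "you b-a-d-w-o-r-d",   # dashes between letters: gets through
]
results = [is_blocked(text) for text in attempts]
```

An offensive picture never reaches this check at all, since there is no text to compare against the list.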

The more aggressive the tool, the greater the chance it will filter out communications that aren’t harassing — particularly, communications one wishes to receive. You can see this in the false positives flagged by systems like Content ID. For example, there’s the time that Content ID took down a video with birds chirping in the background, because it matched an avant-garde song that also had some birds chirping in the background. Or the time NASA’s official clips of a Mars landing got taken down by a news agency. Or the time a livestream was cut off because people began singing "Happy Birthday." Or when a live airing on UStream of the Hugo Awards was interrupted mid-broadcast as the awards ceremony aired clips from Doctor Who and other shows nominated for Hugo Awards.
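
The same trade-off shows up in a toy text filter. In the sketch below, the placeholder word, both filters, and the five sample messages are invented for illustration and describe no real system. The strict filter misses disguised abuse; the aggressive one catches it by ignoring spaces and punctuation, and in doing so blocks an innocent message.

```python
import re

BLOCKED = "badword"  # hypothetical placeholder for a slur

def strict(message):
    """Flag only messages containing the blocked word as a whole word."""
    return BLOCKED in re.findall(r"[a-z]+", message.lower())

def aggressive(message):
    """Drop everything except letters, then look for the blocked word anywhere."""
    return BLOCKED in re.sub(r"[^a-z]", "", message.lower())

# (message, is_harassment) pairs, made up for this example.
SAMPLE = [
    ("you badword", True),
    ("you b-a-d-w-o-r-d", True),
    ("you b a d w o r d", True),
    ("that was a bad word choice", False),
    ("see you at lunch", False),
]

def tally(flag):
    """Return (harassing messages caught, harmless messages wrongly blocked)."""
    caught = sum(1 for text, bad in SAMPLE if bad and flag(text))
    wrongly_blocked = sum(1 for text, bad in SAMPLE if not bad and flag(text))
    return caught, wrongly_blocked

strict_result = tally(strict)          # (1, 0)
aggressive_result = tally(aggressive)  # (3, 1)
```

On this sample, catching the two disguised messages costs one wrongly blocked sentence, and neither filter has any way to tell that "bad word choice" is harmless.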

In the last case, UStream used something similar to, but not quite the same as, Content ID: a system in which blind algorithms automatically censored copyrighted content without the more sophisticated appeals process that YouTube has in place. Robots are not smart; they cannot sense context and meaning. Yet YouTube’s appeals system wouldn’t translate well to anti-harassment tools. What good is a system where you must report each and every instance of harassment and then follow through in a back-and-forth appeals system?

Systems that rely on humans to report content, and even more humans to process those reports, are in theory going to be more sensitive to context than automatic algorithms, but even those are avenues for abuse by bad-faith actors from all across the ideological spectrum. See, for example, the lawsuit currently proceeding against Stefan Molyneux, a prominent men’s rights activist, who is being sued for fraudulent use of the DMCA in taking down a YouTube channel dedicated to criticizing his podcast for misogyny. And it’s not just copyright enforcement mechanisms that are used to stifle open discussion of misogyny. When one woman confronted her street harassers, secretly filmed their reactions, and uploaded the videos onto YouTube, her videos were taken down "as a violation of YouTube’s policy on nudity or sexual content." When BuzzFeed asked about the takedown, a YouTube spokesperson responded saying, "With the massive volume of videos on our site, sometimes we make the wrong call." Even with real people on the other end to look at the context of a complaint, bad decisions are made. But automated systems won’t even take context into consideration.

Holding up Content ID and similar systems as a way to end harassment is unrealistic tech-optimism. Valenti quotes Jaclyn Friedman as saying, "If Silicon Valley can invent a driverless car, they can address online harassment on their platforms." But the driverless car is still not on the streets, and for compelling reasons. Steven Shladover, a researcher at UC Berkeley’s Institute of Transportation Studies, told the MIT Technology Review, "[T]he public seems to think that all of the technology issues are solved. But that is simply not the case." Driverless cars are not good at detecting potholes, and even worse at dealing with garages and parking structures. These may seem like minor details in theory, but in practice, the failure to address minor details can hurt people.

Can technology mitigate harassment? Can a change in a product also change the community that uses it? Absolutely. But blunt instruments patterned after Content ID are reactive responses bound to generate more problems rather than mitigating the problems that already exist. The basic premise of Content ID, matching content to a database, isn’t one that can simply be turned against harassment. And the process that follows, which is designed to mitigate the bluntness of Content ID (that is, the DMCA takedown and the subsequent appeals process, specific to every individual instance of alleged infringement), isn’t one that will benefit victims of harassment.

The response of social media companies to the problem of harassment has been lackluster, and they are certainly capable of doing better, but doing better still doesn’t mean they can eliminate harassment tomorrow. It is tragic that they have prioritized intellectual property enforcement over user safety, but even their IP enforcement is considered unsatisfactory on many sides, whether for the content industry, for fair use advocates, or for users. A solution to harassment patterned after Content ID is likely to result in similar dissatisfaction for all.