Google’s Jigsaw unit is releasing the code for an open source anti-harassment tool called Harassment Manager. The tool, intended for journalists and other public figures, employs Jigsaw’s Perspective API to let users sort through potentially abusive comments on social media platforms, starting with Twitter. It’s debuting as source code for developers to build on, ahead of its launch as a functional application for Thomson Reuters Foundation journalists in June.
Harassment Manager can currently work with Twitter’s API to combine moderation options — like hiding tweet replies and muting or blocking accounts — with a bulk filtering and reporting system. Perspective checks messages’ language for levels of “toxicity” based on elements like threats, insults, and profanity. It sorts messages into queues on a dashboard, where users can address them in batches rather than individually through Twitter’s default moderation tools. They can choose to blur the text of the messages while they’re doing it, so they don’t need to read each one, and they can search for keywords in addition to using the automatically generated queues.
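The score-then-queue flow described above can be sketched roughly in Python. The request shape follows Perspective’s public Comment Analyzer API, but the threshold values and queue names below are illustrative assumptions, not Jigsaw’s actual configuration:

```python
# Sketch of a Perspective-style scoring and queueing flow. The payload
# mirrors Perspective's public comments:analyze request; the thresholds
# and queue names are illustrative assumptions, not Jigsaw's settings.

def build_analyze_request(text: str) -> dict:
    """Build the JSON body for a Perspective comments:analyze call."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def queue_for_score(toxicity: float) -> str:
    """Sort a message into a review queue by its toxicity score (0.0-1.0)."""
    if toxicity >= 0.9:
        return "likely_abusive"   # candidates for bulk blocking/reporting
    if toxicity >= 0.6:
        return "needs_review"     # blurred by default until opened
    return "low_priority"

# Bucket already-scored replies into queues for batch moderation actions.
scored_replies = [("you idiot", 0.95), ("great article", 0.02)]
queues: dict[str, list[str]] = {}
for text, score in scored_replies:
    queues.setdefault(queue_for_score(score), []).append(text)
```

In practice the dashboard would send each `build_analyze_request` body to Perspective, read back a toxicity score, and surface the resulting queues with the text blurred until the user opts to view it.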
Harassment Manager also lets users download a standalone report containing abusive messages; this creates a paper trail for their employer or, in the case of illegal content like direct threats, law enforcement. For now, however, there’s no standalone application that users can download. Instead, developers can freely build apps that incorporate its functionality, and services using it will be launched by partners like the Thomson Reuters Foundation.
Jigsaw announced Harassment Manager on International Women’s Day, and it framed the tool as particularly relevant to female journalists who face gender-based abuse, highlighting input from “journalists and activists with large Twitter presences” as well as nonprofits like the International Women’s Media Foundation and the Committee to Protect Journalists. In a Medium post, the team says it’s hoping developers can tailor it for other at-risk social media users. “Our hope is that this technology provides a resource for people who are facing harassment online, especially female journalists, activists, politicians and other public figures, who deal with disproportionately high toxicity online,” the post reads.
Google has harnessed Perspective for automated moderation before. In 2019 it released a browser extension called Tune that let social media users avoid seeing messages with a high chance of being toxic, and it’s been used by many commenting platforms (including Vox Media’s Coral) to supplement human moderation. But as we noted around the release of Perspective and Tune, the language analysis model has historically been far from perfect. It sometimes misclassifies satirical content or fails to detect abusive messages, and Jigsaw-style AI can inadvertently associate terms like “blind” or “deaf” — which aren’t necessarily negative — with toxicity. Jigsaw itself has also been criticized for a toxic workplace culture, although Google has disputed the claims.
Unlike AI-powered moderation on services like Twitter and Instagram, however, Harassment Manager isn’t a platform-side moderation feature. It’s apparently a sorting tool for helping manage the sometimes overwhelming scale of social media feedback, something that could be relevant for people far outside the realm of journalism — even if they can’t use it for now.