Drones taught to spot violent behavior in crowds using AI

The work has questionable accuracy rates, but it shows how AI is being used to automate surveillance

A Yuneec Typhoon drone, different to the one used by the researchers for their work.
Photo by Amelia Holowaty Krales / The Verge

Automated surveillance is going to become increasingly common as companies and researchers find new ways to use machine learning to analyze live video footage. A new project from scientists in the UK and India shows one possible use for this technology: identifying violent behavior in crowds with the help of camera-equipped drones.

In a paper titled “Eye in the Sky,” the researchers describe their system. It uses a simple Parrot AR quadcopter (which costs around $200) to transmit video footage over a mobile internet connection for real-time analysis. An algorithm trained using deep learning estimates the poses of humans in the video and matches them to postures the researchers have designated as “violent.” For the purposes of the project, just five poses are included: strangling, punching, kicking, shooting, and stabbing.
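To make the idea of matching an estimated pose to a designated posture concrete, here is a minimal sketch. It is not the researchers' method, which the article does not detail. It assumes a pose arrives as a list of (x, y) keypoints and compares it against one hypothetical template per label; the function names, the toy three-point templates, and the 0.3 threshold are all illustrative assumptions.

```python
import math

# The five labels named in the article. Everything else here is hypothetical.
VIOLENT_POSES = ("strangling", "punching", "kicking", "shooting", "stabbing")


def normalize(pose):
    """Center keypoints on their mean and scale them to unit size,
    so the comparison ignores where a person stands and how large they appear."""
    n = len(pose)
    cx = sum(x for x, _ in pose) / n
    cy = sum(y for _, y in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered)) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def pose_distance(a, b):
    """Euclidean distance between two poses with the same keypoint order."""
    return math.sqrt(
        sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    )


def classify_pose(pose, templates, threshold=0.3):
    """Return the label of the closest template, or "nonviolent" if
    no template is within the (assumed) distance threshold."""
    normalized = normalize(pose)
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = pose_distance(normalized, normalize(template))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else "nonviolent"


# Toy templates with three keypoints each; real poses would have many more.
toy_templates = {
    "punching": [(0, 0), (1, 0), (2, 0)],
    "kicking": [(0, 0), (0, 1), (0, 2)],
}
shifted_and_scaled = [(10, 10), (12, 10), (14, 10)]
diagonal = [(0, 0), (1, 1), (2, 2)]
print(classify_pose(shifted_and_scaled, toy_templates))
print(classify_pose(diagonal, toy_templates))
```

The threshold is where the trade-off sits: loosen it and more ordinary gestures get matched to a "violent" template, tighten it and more real incidents are missed.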

The scientists hope that systems like theirs will be used to detect crime in public spaces and at large events. Lead researcher Amarjot Singh of the University of Cambridge told The Verge that he was motivated by events like the Manchester Arena bombing in 2017. Singh said attacks like this could be prevented in future if surveillance cameras can automatically spot suspicious behavior, like someone leaving a bag unattended for a long period of time.

However, the research needs to be taken with a pinch of salt, particularly with regard to its claims of accuracy. Singh and his colleagues report that their system was 94 percent accurate at identifying “violent” poses, but they note that accuracy falls as more people appear in the frame. (It dropped to 79 percent when 10 individuals were in view.)

More importantly, though, these figures don’t represent real-world usage. In order to train and test their AI, the researchers recorded their own video clips of volunteers pretending to attack one another. The volunteers are generously spaced apart and attack one another with exaggerated movements. Singh acknowledged that this isn’t a perfect reflection of how the surveillance system might perform in the wild, but he said he has plans to test the drones during two upcoming festivals in India: Technozion and Spring Spree, both of which take place at NIT Warangal. “We have permission to fly over [Technozion], happening in a month and are seeking permission for the other,” said Singh.

Researchers working in this field often note there is a huge difference between staged tests and real-world use-cases. In the latter, the video footage is often blurry, the crowds are bigger, and people’s actions are more subtle and more prone to be misinterpreted. This last point is particularly telling, as Singh and his colleagues don’t produce any figures for the system’s false positive rate, i.e., the frequency with which it identifies nonviolent behavior as violent. Think about how giving someone a high-five might be misinterpreted as a violent gesture, for example. (Singh denied that the system would make this sort of mistake, but said he and his team had not yet produced stats to support this claim.)
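Why a missing false positive rate matters can be shown with a little arithmetic. The sketch below uses invented counts, not figures from the paper, chosen only so that both scenarios come out at 94 percent accuracy. The function name and the numbers are assumptions for illustration.

```python
def accuracy_and_false_positive_rate(tp, fp, tn, fn):
    """Compute accuracy and false positive rate from confusion-matrix counts.

    tp: violent poses correctly flagged      fn: violent poses missed
    tn: nonviolent poses correctly ignored   fp: nonviolent poses wrongly flagged
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    false_positive_rate = fp / (fp + tn)
    return accuracy, false_positive_rate


# Two hypothetical test sets of 1,000 poses each (invented numbers).
scenario_a = accuracy_and_false_positive_rate(tp=90, fp=50, tn=850, fn=10)
scenario_b = accuracy_and_false_positive_rate(tp=440, fp=0, tn=500, fn=60)

for name, (acc, fpr) in (("A", scenario_a), ("B", scenario_b)):
    print(f"Scenario {name}: accuracy={acc:.2%}, false positive rate={fpr:.2%}")
```

Both scenarios score the same accuracy, yet in the first roughly one in 18 nonviolent poses is flagged as violent and in the second none are. An accuracy figure on its own cannot tell a reader which of these situations a system is in.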

But even if this particular system has not yet proved itself in the wild, it’s a clear illustration of the direction contemporary research is going. Using AI to identify body poses is a common problem, with big tech companies like Facebook publishing significant research on the topic. And with the rise of cheap drones and fast mobile internet, it’s easier than ever to capture and transmit live video footage. Putting these pieces together to create sophisticated surveillance isn’t hard.

A figure from the paper showing how the software analyzes individuals’ poses and matches them to “violent” postures.

The question is: how will this technology be used, and who will use it? Speaking to The Verge earlier this year about automated surveillance, a number of experts said these systems were ripe for abuse by law enforcement and authoritarian governments.

Jay Stanley, a senior policy analyst at the ACLU, said this technology would have a chilling effect on civil society, as people rightly fear they’re being constantly monitored and analyzed. “We want people to not just be free, but to feel free. And that means that they don’t have to worry about how an unknown, unseen audience may be interpreting or misinterpreting their every movement and utterance,” said Stanley.

Meredith Whittaker, a researcher who examines the social implications of AI, tweeted that Singh and his colleagues’ paper showed there was a “mounting ethical crisis” in the field, with scientists failing to examine potential misuses of the technology they’re helping build. Whittaker told The Verge that the paper’s methodology was questionable and that the lack of time given over to the work’s ethical implications was damning.

Speaking to The Verge, Singh defended his work. He said that despite these criticisms, it could have a positive effect on the world: stopping crime and terrorism. He suggested that new regulations might be needed so such technology isn’t misused, but he admitted there was “no good answer” to the worrying scenarios experts like Whittaker and Stanley are predicting. “Anything can be used for good. Anything can be used for bad,” said Singh.

Update June 6th, 12:20PM ET: The story has been updated to include additional comment from the researchers and to specify which festivals the drones will be tested at.

Correction June 8th, 3:30PM ET: A photo caption incorrectly referred to the drone pictured at the top of this article as being made by DJI. It is actually from Yuneec.