Researchers at Carnegie Mellon have developed a system that lets a computer analyze what's going on in video footage. The system, which they refer to in a recent paper as "automatic action recognition in video surveillance," essentially breaks a video into chunks, then matches each piece to a verb like "pick up" or "bury." A concept-mapping system helps work through similar ideas ("dig" and "bury," for example). The result is a program that can watch surveillance footage and identify the actions taking place. Ultimately, it could alert human viewers if it sees something suspicious — like, if the example above is representative, a person dragging a body across a parking lot.
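The pipeline described above — chunking a video, labeling each chunk with an action verb, and collapsing near-synonymous verbs through a concept map — might be sketched roughly as follows. This is a toy illustration only; every function, the synonym table, and the "suspicious" list are hypothetical stand-ins, not details from the paper:

```python
# Toy sketch of the described pipeline: segment -> label -> concept-map -> flag.
# All names and data here are illustrative assumptions, not the paper's method.

CONCEPT_MAP = {
    "dig": "bury",       # near-synonymous verbs collapse to one concept
    "bury": "bury",
    "lift": "pick up",
    "pick up": "pick up",
    "drag": "drag",
}

SUSPICIOUS = {"bury", "drag"}  # hypothetical watch-list of actions

def segment(frames, chunk_size=2):
    """Split footage into fixed-length chunks (placeholder for real segmentation)."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]

def classify(chunk):
    """Match a chunk to its most likely verb. Placeholder: each 'frame'
    here is already a verb label, so we just take the most common one."""
    return max(set(chunk), key=chunk.count)

def analyze(frames):
    """Label each segment, normalize via the concept map, and flag alerts."""
    actions = [CONCEPT_MAP.get(classify(c), classify(c)) for c in segment(frames)]
    alerts = [v for v in actions if v in SUSPICIOUS]
    return actions, alerts

# A toy "video": a list of per-frame verb labels.
actions, alerts = analyze(["lift", "lift", "drag", "drag", "dig", "bury"])
```

Here `analyze` would return `(["pick up", "drag", "bury"], ["drag", "bury"])` — the second list being what gets surfaced to a human reviewer.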
A paper describing the system was presented at the International Conference on Semantic Technologies for Intelligence, Defense, and Security earlier this week, but the long-running project it's part of is much bigger. Known as Mind's Eye, it's funded by US military research wing DARPA, and the ultimate goal is to create machines that can mimic visual intelligence like that found in humans or other animals. Rather than just being able to identify the objects in a scene, an ideal Mind's Eye computer could figure out what's going on and even predict what might happen next (the subject of another recent paper).
There are plenty of non-surveillance uses for this kind of recognition, including integrating it with other artificial intelligence systems, but the system's core purpose is to check footage for specific actions without requiring a human to watch every minute. The military will also likely be interested in it as a way to analyze the endless drone footage it's been collecting for the past several years. With the barrier to simply capturing video dropping, the next challenge will be learning how to use it effectively — and as this research shows, that capability may not be far off.