In the last few years computers have made massive advances in image recognition. Neural networks especially — systems which can be trained over time — have become eerily good at describing even quite complex scenes. However, as this video from US artist and coder Kyle McDonald shows, they're far from 100 percent accurate. McDonald modified a neural network built by researchers from Stanford and Google to analyze footage from a live webcam feed on his laptop. He then went for a walk around Amsterdam and let the computer do the talking.
The results are mixed, of course, but it's fascinating to watch the neural network make mistakes (and sometimes correct itself) in real time. The open source program being used is called NeuralTalk. It was first unveiled last year, and the researchers have released updates on the network's capabilities since then. Other companies and institutions are working on similar technology. Last month, for example, Facebook unveiled a prototype neural network intended to help blind people by describing pictures.
Systems like this are sometimes referred to as artificial intelligence, but this is stretching the truth. While describing images and videos certainly looks like intelligence, the programs involved have no real understanding of what's in the images. They're only just becoming able to recognize the relationships between objects (the verbs you'd use to describe something) and they still make what appear to us to be blindingly obvious mistakes.
Despite these caveats, the rate of progress in this area is impressive, and it promises to open up a whole new world of information for computers to analyze. "I consider the pixel data in images and video to be the dark matter of the internet," Fei-Fei Li, the lead researcher behind NeuralTalk, told The New York Times last year. "We are now starting to illuminate it."