How did Google get Clips, its AI-powered camera, to learn to automatically take the best shots of users and their families? Well, as the company explains in a new blog post, its engineers went to the professionals — hiring “a documentary filmmaker, a photojournalist, and a fine arts photographer” to produce visual data to train the neural network powering the camera.
The blog post explains this process in a little more detail, but it’s basically what you’d expect for this sort of AI. In order for the software to recognize what makes a good or a bad photo, it had to be fed lots of examples. The engineers considered not only obvious markers (e.g., it’s a bad photo if the image is blurry or if something’s covering the lens) but also more abstract criteria, such as “time” — training Clips with the rule, “Don’t go too long without capturing something.”
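To make the idea concrete, here is a minimal, purely illustrative sketch of how such capture heuristics might be combined. This is not Google’s actual model (Clips uses a trained neural network, not hand-written rules); the `Frame` fields, threshold values, and the relaxation logic are all hypothetical, chosen only to mirror the markers described above.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    sharpness: float                  # 0.0 (very blurry) .. 1.0 (sharp) -- hypothetical score
    lens_coverage: float              # fraction of the lens that appears obscured
    seconds_since_last_capture: float # time since the camera last saved a shot

# Illustrative thresholds -- not from Google's blog post.
MIN_SHARPNESS = 0.4
MAX_LENS_COVERAGE = 0.2
MAX_GAP_SECONDS = 90.0

def should_capture(frame: Frame) -> bool:
    """Combine obvious quality markers with the 'don't go too long
    without capturing something' time heuristic."""
    # Obvious marker: reject frames where the lens is covered.
    if frame.lens_coverage > MAX_LENS_COVERAGE:
        return False
    # Time heuristic: if nothing was captured recently, lower the quality bar.
    threshold = MIN_SHARPNESS
    if frame.seconds_since_last_capture > MAX_GAP_SECONDS:
        threshold *= 0.5
    return frame.sharpness >= threshold
```

A real system would learn these judgments from labeled examples rather than fixed thresholds, which is exactly why Google hired professional photographers to produce the training data.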
By teaching Clips to recognize good photos and making the user interface as intuitive as possible, Google says it is practicing what it calls “human-centered design” — that is, trying to make AI products that work for users without creating extra stress. The Clips camera isn’t actually on general sale yet, but we look forward to testing out the device to see if it lives up to these ambitious goals.
What’s also notable, though, is that Google admits in the blog post that training AI programs like these can be an imprecise process, and that no matter how much data you give a device like Clips, it’s never going to know exactly which photos you value most. It may be able to recognize a well-framed, in-focus, brightly lit image, but how will it know that the blurry shot of your son riding his bike without stabilizers for the first time is also priceless?
“In the context of subjectivity and personalization, perfection simply isn’t possible, and it really shouldn’t even be a goal,” write the blog post’s authors. “Unlike traditional software development, ML systems will never be ‘bug-free’ because prediction is an innately fuzzy science.”