This backflipping noodle has a lot to teach us about AI safety

AI isn’t going to be a threat to humanity because it’s evil or cruel; AI will be a threat to humanity because we haven’t properly explained what it is we want it to do. Consider the classic “paperclip maximizer” thought experiment, in which an all-powerful AI is told, simply, “make paperclips.” The AI, not constrained by any human morality or reason, does so, eventually transforming all resources on Earth into paperclips, and wiping out our species in the process. As with any relationship, when talking to our computers, communication is key.

That’s why a new piece of research published yesterday by Google’s DeepMind and the Elon Musk-funded OpenAI institute is so interesting. It offers a simple way for humans to give feedback to AI systems — crucially, without the instructor needing to know anything about programming or artificial intelligence.

The method is a variation of what’s known as “reinforcement learning” or RL. With RL systems, a computer learns by trial and error, repeating the same task over and over, while programmers direct its actions by setting certain reward criteria. For example, if you want a computer to learn how to play Atari games (something DeepMind has done in the past) you might make the game’s point system the reward criterion. Over time, the algorithm will learn to play in a way that best accrues points, often leading to super-human performance.
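To make the trial-and-error loop concrete, here is a minimal sketch in Python. It is a deliberately simplified, single-step version of reinforcement learning (often called a multi-armed bandit), not the deep-learning system DeepMind used for Atari. The `toy_game` function, its three actions, and their point values are all invented for illustration.

```python
import random

def train(points_for_action, n_actions, episodes=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the most points."""
    rng = random.Random(seed)
    value = [0.0] * n_actions  # running average of points seen per action
    count = [0] * n_actions
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Occasionally try a random action (exploration)
            action = rng.randrange(n_actions)
        else:
            # Otherwise pick the action that has scored best so far
            action = max(range(n_actions), key=lambda a: value[a])
        reward = points_for_action(action, rng)  # the game's score is the reward
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value

def toy_game(action, rng):
    # Hypothetical game: action 2 scores best on average (assumed values)
    mean_points = [1.0, 2.0, 5.0]
    return mean_points[action] + rng.gauss(0, 1)

values = train(toy_game, n_actions=3)
best_action = max(range(3), key=lambda a: values[a])
```

Nobody tells the program that action 2 is the right one. It only ever sees points, and its estimates drift toward whichever action pays best. That is why the choice of reward matters so much: the program optimizes exactly what it is scored on.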

What DeepMind and OpenAI’s researchers have done is replace these predefined reward criteria with a much simpler feedback system. Humans are shown an AI performing two versions of the same task and simply tell it which is better. This happens again and again, and eventually the system learns what is expected of it. Think of it like getting an eye test, when you’re looking through different lenses, and being asked over and over: better... or worse? Here’s what that looks like when teaching a computer to play the classic Atari game Q*bert:

This method of feedback is surprisingly effective, and researchers were able to use it to train an AI to play a number of Atari video games, as well as to perform simulated robot tasks (like telling an arm to pick up a ball). This better / worse reward function could even be used to program trickier behavior, like teaching a very basic virtual robot how to backflip. That’s how we get to the GIF at the top of the page. The behavior you see has been created by watching the “Hopper” bot jump up and down, and telling it “well done” when it gets a bit closer to doing a backflip. Over time, it learns how.
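How can a string of better / worse answers become a reward a computer can chase? One common approach, sketched below, is to fit a scoring function so that whichever clip the human preferred gets the higher score. This is an illustration under stated assumptions, not the researchers’ code: the two clip features (rotation and height), the simulated judge who always prefers more rotation, and the simple linear scoring model are all hypothetical stand-ins.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_reward(comparisons, n_features, lr=0.1, epochs=50):
    """Fit linear reward weights from (clip_a, clip_b, a_is_better) judgments."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for a, b, a_is_better in comparisons:
            diff = [x - y for x, y in zip(a, b)]
            # Modelled probability that the judge prefers clip a over clip b
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            target = 1.0 if a_is_better else 0.0
            # Nudge the weights so the observed judgment becomes more likely
            for i in range(n_features):
                w[i] += lr * (target - p) * diff[i]
    return w

def reward(w, clip):
    """Score a clip with the learned weights."""
    return sum(wi * xi for wi, xi in zip(w, clip))

# Simulated judge (an assumption): each clip is [rotation, height], and the
# judge always says the clip with more rotation is better.
rng = random.Random(0)
comparisons = []
for _ in range(900):
    a = [rng.random(), rng.random()]
    b = [rng.random(), rng.random()]
    comparisons.append((a, b, a[0] > b[0]))

weights = fit_reward(comparisons, n_features=2)
```

The judge never says “I care about rotation.” The program works that out from the pattern of answers, ending up with a large weight on rotation and a small one on height. The learned `reward` function can then stand in for a game’s point system in an ordinary reinforcement learning loop.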

Of course, no one is suggesting this method is a cure-all for teaching AI. There are a number of big downsides and limitations in using this sort of feedback. The first is that although it doesn’t take much skill on the part of the human operator, it does take time. For example, in teaching the “Hopper” bot to backflip, a human was asked to judge its behavior some 900 times, a process that took about an hour. The bot itself had to work through 70 hours of simulated training time, which was sped up artificially.

For some simple tasks, says Oxford Robotics researcher Markus Wulfmeier (who was not involved in this research), it would be quicker for a programmer to simply define what it is they wanted. But, says Wulfmeier, it’s “increasingly important to render human supervision more effective” for AI systems, and this paper “represents a small step in the right direction.”

DeepMind and OpenAI say pretty much the same — it’s a small step, but a promising one, and in the future, they’re looking to apply it to more and more complex scenarios. Speaking to The Verge over email, DeepMind researcher Jan Leike said: “The setup described in [our paper] already scales from robotic simulations to more complex Atari games, which suggests that the system will scale further.” Leike suggests the next step is to test it in more varied 3D environments. You can read the full paper describing the work here.