Google’s AI subsidiary DeepMind has unveiled the latest version of its Go-playing software, AlphaGo Zero. The new program is a significantly better player than the version that beat the game’s world champion earlier this year, but, more importantly, it’s also entirely self-taught. DeepMind says this means the company is one step closer to creating general purpose algorithms that can intelligently tackle some of the hardest problems in science, from designing new drugs to more accurately modeling the effects of climate change.
The original AlphaGo demonstrated superhuman Go-playing ability, but needed the expertise of human players to get there. Namely, it used a dataset of more than 100,000 Go games as a starting point for its own knowledge. AlphaGo Zero, by comparison, has only been programmed with the basic rules of Go. Everything else it learned from scratch. As described in a paper published in Nature today, Zero developed its Go skills by competing against itself. It started with random moves on the board, but every time it won, Zero updated its own system, and played itself again. And again. Millions of times over.
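To make the self-play loop concrete, here is a minimal sketch in Python. It is not DeepMind's method: AlphaGo Zero pairs a deep neural network with tree search, while this toy uses a plain lookup table on a much simpler game (a pile of stones, each player takes one to three, and whoever takes the last stone wins). The game, the function names, and the settings (`epsilon`, `lr`, the number of games) are all assumptions chosen for illustration. What it shares with the description above is the shape of the loop: start with nothing but the rules, play against yourself, update after every game, and repeat.

```python
import random
from collections import defaultdict


def legal_moves(pile):
    """Rules of the toy game: take 1, 2 or 3 stones, never more than remain."""
    return [m for m in (1, 2, 3) if m <= pile]


def self_play_train(start_pile=10, games=20000, epsilon=0.2, lr=0.1, seed=0):
    """Learn move values purely from games the program plays against itself.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # (pile, move) -> estimated result for the player making that move,
    # from -1 (loss) to +1 (win). Everything starts at 0: no prior knowledge.
    q = defaultdict(float)
    for _ in range(games):
        pile = start_pile
        history = []
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # occasional random exploration
            else:
                move = max(moves, key=lambda m: q[(pile, m)])  # current best guess
            history.append((pile, move))
            pile -= move
        # Whoever moved last took the final stone and won. Walk backwards
        # through the game, flipping the result for each alternating player.
        outcome = 1.0
        for state_move in reversed(history):
            q[state_move] += lr * (outcome - q[state_move])
            outcome = -outcome
    return q


def best_move(q, pile):
    """Greedy choice according to the learned table."""
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])
```

After `q = self_play_train()`, a call such as `best_move(q, 3)` should return 3, since taking all three stones wins on the spot. With enough games the table also tends to favor leaving the opponent a multiple of four stones, which is the known winning strategy for this game, though nothing in the code says so.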
After three days of self-play, Zero was strong enough to defeat the version of itself that beat 18-time world champion Lee Se-dol, winning handily — 100 games to nil. After 40 days, it had a 90 percent win rate against the most advanced version of the original AlphaGo software. DeepMind says this makes it arguably the strongest Go player in history.
“By not using human data — by not using human expertise in any fashion — we’ve actually removed the constraints of human knowledge,” said AlphaGo Zero’s lead programmer, David Silver, at a press conference. “It’s therefore able to create knowledge itself from first principles; from a blank slate [...] This enables it to be much more powerful than previous versions.”
Silver explained that as Zero played itself, it rediscovered Go strategies developed by humans over millennia. “It started off playing very naively like a human beginner, [but] over time it played games which were hard to differentiate from human professionals,” he said. The program hit upon a number of well-known patterns and variations during self-play, before developing never-before-seen stratagems. “It found these human moves, it tried them, then ultimately it found something it prefers,” he said. As with earlier versions of AlphaGo, DeepMind hopes Zero will act as an inspiration to professional human players, suggesting new moves and stratagems for them to incorporate into their game.
As well as being a better player, Zero has other important advantages compared to earlier versions. First, it needs much less computing power, running on just four TPUs (specialized AI processors built by Google), while earlier versions used 48. This, says Silver, allows for a more flexible system that can be improved with less hassle, “which, at the end of the day, is what really matters if we want to make progress.” And second, because Zero is self-taught, it shows that we can develop cutting-edge algorithms without depending on stacks of data.
For experts in the field, these developments are a big part of what makes this new research exciting. That’s because they answer a persistent criticism of contemporary AI: that its recent gains come mostly from cheap computing power and massive datasets. Skeptics in the field like pioneer Geoffrey Hinton suggest that machine learning is a bit of a one-trick pony: piling on data and compute is helping deliver new functions, but the current pace of advances is unsustainable. DeepMind’s latest research offers something of a rebuttal by demonstrating that there are major improvements to be made simply by focusing on algorithms.
“This work shows that a combination of existing techniques can go somewhat further than most people in the field have thought, even though the techniques themselves are not fundamentally new,” Ilya Sutskever, a research director at the Elon Musk-backed OpenAI institute, told The Verge. “But ultimately, what matters is that researchers keep advancing the field, and it's less important if this goal is achieved by developing radically new techniques, or by applying existing techniques in clever and unexpected ways.”
In the case of AlphaGo Zero, what is particularly clever is the removal of any need for human expertise in the system. Satinder Singh, a computer science professor who wrote an accompanying article on DeepMind’s research in Nature, praises the company’s work as “elegant,” and singles out these aspects.
Singh tells The Verge that it’s a significant win for the field of reinforcement learning — a branch of AI in which programs learn by obtaining rewards for reaching certain goals, but are offered no guidance on how to get there. This is a less mature field of work than supervised learning (where programs are fed labeled data and learn from that), but it has potentially greater rewards. After all, the more a machine can teach itself without human guidance, the better, says Singh.
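The distinction Singh draws can be illustrated with a toy problem. In the sketch below (Python, with every name and number invented for illustration), a program chooses repeatedly among three options with hidden payoffs. Nobody labels the right answer, as would happen in supervised learning; the program only sees the reward that follows each choice and has to work out the best option by trial and error. This is a standard textbook setup (an epsilon-greedy multi-armed bandit), far simpler than anything in AlphaGo Zero.

```python
import random


def learn_from_rewards(hidden_means=(0.1, 0.9, 0.4), steps=2000, epsilon=0.1, seed=0):
    """Estimate which option pays best using only the rewards received.

    hidden_means are the true average payoffs; the learner never reads them
    directly. All values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(hidden_means)
    estimates = [0.0] * n  # learner's running guess of each option's payoff
    counts = [0] * n       # how often each option has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try something at random
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit best guess
        # The environment returns a noisy reward and nothing else.
        reward = rng.gauss(hidden_means[arm], 0.1)
        counts[arm] += 1
        # Incremental update of the sample average for the chosen option.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

Running `learn_from_rewards()` returns the learner's payoff estimates and how many times it tried each option. The second option, the one with the highest hidden payoff, should end up with both the highest estimate and the most pulls.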
“Over the past five, six years, reinforcement learning has emerged from academia to have much more broader impact in the wider world, and DeepMind can take some of the credit for that,” says Singh. “The fact that they were able to build a better Go player here with an order of magnitude less data, computation, and time, using just straight reinforcement learning — it’s a pretty big achievement. And because reinforcement learning is such a big slice of AI, it’s a big step forward in general.”
What are the applications for these sorts of algorithms? According to DeepMind co-founder Demis Hassabis, they can provide society with something akin to a thinking engine for scientific research. “A lot of the AlphaGo team are now moving onto other projects to try and apply this technology to other domains,” said Hassabis at a press conference.
Hassabis explains that you can think of AlphaGo as essentially a very good machine for searching through complicated data. In the case of Zero, that data consists of possible moves in a game of Go. But because Zero was not programmed to understand Go specifically, it could be reprogrammed to discover information in other fields: drug discovery, protein folding, quantum chemistry, particle physics, and material design.
Hassabis suggests that a descendant of AlphaGo Zero could be used to search for a room temperature superconductor — a hypothetical substance that allows electrical current to flow with zero lost energy, allowing for incredibly efficient power systems. (Superconductors exist, but they only currently work at extremely cold temperatures.) As it did with Go, the algorithm would start by combining different inputs (in this case, the atomic composition of various materials and their associated qualities) until it discovered something humans had missed.
“Maybe there is a room temperature superconductor out and about. I used to dream about that when I was a kid, looking through my physics books,” says Hassabis. “But there’s just so many combinations of materials, it’s hard to know whether [such a thing exists].”
Of course, this would be much more complicated than simply pointing AlphaGo Zero at the Wikipedia pages for chemistry and physics and saying “have at it.” Despite its complexity, Go, like all board games, is relatively easy for computers to understand. The rules are finite, there’s no element of luck, no hidden information, and, most importantly, researchers have access to a perfect simulation of the game. This means an AI can run millions of tests and be sure it’s not missing anything. Few other fields meet these criteria, and that limits the applicability of Zero’s intelligence. DeepMind hasn’t created a magical thinking machine.
These caveats aside, the research published today does get DeepMind just a little bit closer to solving the first half of its tongue-in-cheek, two-part mission statement. Part one: solve intelligence; part two: use it to make the world a better place. “We’re trying to build general purpose algorithms and this is just one step towards that, but it’s an exciting step,” says Hassabis.