Beating humans at board games is passé in the AI world. Now, top academics and tech companies want to challenge us at video games instead. Today, OpenAI, a research lab founded by Elon Musk and Sam Altman, announced its latest milestone: a team of AI agents that can beat the top 1 percent of amateurs at popular battle arena game Dota 2.
You may remember that OpenAI first strode into the world of Dota 2 last August, unveiling a system that could beat the top players at 1v1 matches. However, this game type greatly reduces the challenge of Dota 2. OpenAI has now upgraded its bots to play humans in 5v5 match-ups, which require more coordination and long-term planning. And while OpenAI has yet to challenge the game’s very best players, it will do so later this year at The International, a Dota 2 tournament that’s the biggest annual event on the e-sports calendar.
The motivation for research like this is simple: if we can teach AI systems the skills they need to play video games, we can use them to solve complex real-world challenges that, in some ways, resemble video games — like, for example, managing a city’s transport infrastructure.
“This is an exciting milestone, and it’s really because it’s about transitioning to real-life applications,” OpenAI’s co-founder and CTO Greg Brockman told The Verge. “If you’ve got a simulation [of a problem] and you can run it [at a] large enough scale, there’s no barrier to what you can do with this.”
Fundamentally, video games offer challenges that board games like chess or Go just don’t. They hide information from players, meaning an AI can’t perceive the whole playing field and calculate the best-possible next move. There’s also more information to process and a huge number of possible moves. OpenAI says that at any one time its Dota 2 bots have to choose between 1,000 different actions while processing 20,000 data points that represent what’s happening in the game.
Reinforcement learning is trial and error at a vast scale
To create their bots, the lab turned to a method of machine learning known as reinforcement learning. This is a deceptively simple technique that can produce complex behavior. AI agents are thrown into a virtual environment where they teach themselves how to achieve their goals through trial and error. Programmers set what are called reward functions (awarding bots points for things like killing an enemy), and then they leave the AI agents to play themselves over and over again.
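The trial-and-error loop described above can be illustrated with a toy example. This is a minimal sketch, not OpenAI's training method: the action names, the point values in REWARDS, and the epsilon-greedy exploration rule are all assumptions chosen for illustration. The agent knows nothing at the start, tries actions, and gradually favors whichever one the reward function scores highest.

```python
import random

# Hypothetical reward function: points awarded per action (illustrative values only).
REWARDS = {"wander": 0.0, "last_hit_creep": 1.0, "kill_enemy": 5.0}


def reward_function(action):
    """Return the points the programmers chose to award for an action."""
    return REWARDS[action]


def train(steps=200, epsilon=0.1, seed=0):
    """Learn action values by trial and error (epsilon-greedy, an assumed rule)."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}   # running estimate of each action's reward
    count = {a: 0 for a in actions}     # how often each action has been tried

    for _ in range(steps):
        untried = [a for a in actions if count[a] == 0]
        if untried:
            action = untried[0]                    # try everything at least once
        elif rng.random() < epsilon:
            action = rng.choice(actions)           # explore: pick at random
        else:
            action = max(actions, key=value.get)   # exploit: best known action

        reward = reward_function(action)
        count[action] += 1
        # Incremental average of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]

    return value


learned = train()
best_action = max(learned, key=learned.get)
print(best_action)
```

A real Dota 2 agent faces delayed rewards, hidden information, and thousands of possible actions, so this sketch only conveys the basic shape of the idea: act, receive points, and adjust.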
For this new batch of Dota bots, the amount of self-play is staggering. Every day, the bots played 180 years of game time at an accelerated rate. They trained at this pace over a period of months. “It starts out totally random, wandering around the map. Then, after a couple of hours, it begins to pick up basic skills,” says Brockman. He says that if it takes a human between 12,000 and 20,000 hours of play to learn to become a professional, that means OpenAI’s agents “play 100 human lifetimes of experience every single day.”
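Brockman's comparison can be sanity-checked with some quick arithmetic. The sketch below uses only the figures quoted above (180 years of game time per day and 12,000 to 20,000 hours to turn professional) and assumes a 365-day year of continuous play; the function and variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap years


def careers_per_day(game_years_per_day, hours_to_go_pro):
    """How many 'time to go pro' spans fit into one day of bot self-play."""
    return game_years_per_day * HOURS_PER_YEAR / hours_to_go_pro


game_hours_per_day = 180 * HOURS_PER_YEAR        # 1,576,800 hours of play per day
low_estimate = careers_per_day(180, 20_000)      # if going pro takes 20,000 hours
high_estimate = careers_per_day(180, 12_000)     # if going pro takes 12,000 hours

print(f"{game_hours_per_day:,} hours of play per day")
print(f"between {low_estimate:.0f} and {high_estimate:.0f} professional careers per day")
```

Under these assumptions, a day of training covers roughly 79 to 131 times the play needed to turn professional, a range that brackets Brockman's figure of 100.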
On one hand, this is a testament to the power of contemporary machine learning methods and the latest computer chips to process vast amounts of data. On the other, it’s a reminder of how fundamentally unintelligent AI agents are. If humans took thousands of years to learn how to play a single video game, we wouldn’t be very far as a species.
Although OpenAI’s bots are now playing 5v5 matches, they’re still not exposed to the full complexity of Dota 2. A number of limitations are in place. They only play using five of the 115 heroes available, each of which has its own playing style. (Their choice: Necrophos, Sniper, Viper, Crystal Maiden, and Lich.) Certain elements of their decision-making processes are hard-coded, like which items they buy from vendors and which skills they level up using in-game experience points. Other tricky parts of the game have been disabled altogether, including invisibility, summons, and the placement of wards, which are items that act as remote cameras and are essential in high-level play. (As one game guide warns, “If there’s any topic that confuses newcomers more than anything else, it’s warding.”)
OpenAI’s agents also have all the advantages you’d expect of a computer. Their reaction times are faster than humans’, they never miss a click, and they have instant and precise access to data like item inventories, the health of heroes, and the distance between objects on the map, which are crucial for the correct use of certain spells. This is all information that human players have to check manually or judge by instinct.
The bots have advantages humans don’t, but they still have to plan how to play
All this may seem like an indictment of the bots’ capabilities, but Brockman argues that it’s a distraction. The ability to play entire games in Dota 2 that last 45 minutes on average is what really sets OpenAI’s agents apart, he says. This sort of long-term planning was thought to be difficult or even impossible to teach through reinforcement learning, but OpenAI’s work suggests otherwise. Brockman says the main reason for their success is simply that they brought more computer power to bear on the problem. “It is really about the scale,” he says.
Andreas Theodorou, an AI researcher at the University of Bath who uses computer games to study collaboration, says the latest research on 5v5 games is a big step forward, although he notes that perhaps the most “significant achievement” is OpenAI’s use of visualizations to debug their agents. (OpenAI has made these interactive visualizations available online.) “These techniques show how even reinforcement learning and machine learning systems, in general, can be transparent,” Theodorou told The Verge. These add-ons “increase the value of the system,” he says, especially for educational purposes.
The researchers’ use of a separate reward function to encourage the bots to work together was also notable, says Theodorou. This reward function was labeled “team spirit,” and it was increased over the course of each match. The bots start each game pursuing individual goals, like racking up kills, but as time goes on, they focus more on shared objectives.
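One way to picture a “team spirit” setting is as a dial between each bot's own reward and the team's average. The sketch below is an assumption about how such a blend could work, not OpenAI's published formula: the linear ramp, the blending rule, and the function names are all hypothetical, and the 2,700-second match length simply reflects the 45-minute average game.

```python
def team_spirit_weight(elapsed_seconds, match_seconds):
    """Assumed schedule: ramp linearly from 0 (selfish) to 1 (fully shared)."""
    if match_seconds <= 0:
        raise ValueError("match_seconds must be positive")
    return min(1.0, max(0.0, elapsed_seconds / match_seconds))


def blend_rewards(individual_rewards, team_spirit):
    """Mix each bot's own reward with the team average, weighted by team_spirit."""
    team_average = sum(individual_rewards) / len(individual_rewards)
    return [
        (1 - team_spirit) * reward + team_spirit * team_average
        for reward in individual_rewards
    ]


# One bot scores a kill worth 10 points; its four teammates score nothing.
rewards = [10, 0, 0, 0, 0]
early = blend_rewards(rewards, team_spirit_weight(0, 2700))      # start of match
late = blend_rewards(rewards, team_spirit_weight(2700, 2700))    # end of match
print(early)
print(late)
```

With the dial at zero, only the bot that got the kill is rewarded. With the dial at one, all five bots share the credit equally, so no individual bot gains from chasing kills at the team's expense.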
Brockman says that means that, unlike with human players, there’s absolutely “no ego” involved. “The bots are totally willing to sacrifice a lane or abandon a hero for the greater good,” he tells The Verge. “For fun, we had a human drop in to replace one of the bots. We hadn’t trained them to do anything special, but he said he just felt so well-supported. Anything he wanted, the bots got him.”
OpenAI’s team of bots has so far played five multigame matches against amateur and semipro teams, winning four and drawing one. But their greatest challenge will come later this year at The International. Can machines with perfect timing and no ego match the fluid and intuitive play of human professionals? At this point, it’s anyone’s game.