DeepMind’s AI agents conquer human pros at StarCraft II

The games were streamed from DeepMind’s London headquarters (pictured).
Image: DeepMind

AI agents developed by Google’s DeepMind subsidiary have beaten human pros at StarCraft II — a first in the world of artificial intelligence. In a series of matches streamed on YouTube and Twitch, AI players beat the humans 10 games in a row. In the final match, pro player Grzegorz “MaNa” Komincz was able to snatch a single victory for humanity.

“The history of AI has been marked by a number of significant benchmark victories in different games,” David Silver, DeepMind’s research co-lead, said after the matches. “And I hope — though there’s clearly work to do — that people in the future may look back at [today] and perhaps consider this as another step forward for what AI systems can do.”

Beating humans at video games might seem like a sideshow in AI development, but it’s a significant research challenge. Games like StarCraft II are harder for computers to play than board games like chess or Go. In video games, AI agents can’t watch the movement of every piece to calculate their next move, and they have to react in real time.

These factors didn’t seem like much of an impediment to DeepMind’s AI system, dubbed AlphaStar. First, it beat pro player Dario “TLO” Wünsch, before moving to take on MaNa. The games were originally played in December last year at DeepMind’s London HQ, but a final match against MaNa was streamed live today, providing humans with their single victory.

Professional StarCraft commentators described AlphaStar’s play as “phenomenal” and “superhuman.” In StarCraft II, players start on different sides of the same map before building up a base, training an army, and invading the enemy’s territory. AlphaStar was particularly good at what’s called “micro,” short for micromanagement, referring to the ability to control troops quickly and decisively on the battlefield.

Even though the human players sometimes managed to train more powerful units, AlphaStar was able to outmaneuver them in close quarters. In one game, AlphaStar swarmed MaNa with a fast-moving unit called the Stalker. Commentator Kevin “RotterdaM” van der Kooi described it as “phenomenal unit control, just not something we see very often.” MaNa noted after the match: “If I play any human player they’re not going to be microing their Stalkers this nicely.”

This echoes behavior we’ve seen from other high-level game-playing AI. When OpenAI’s agents played human pros at Dota 2 last year, they were ultimately defeated. But experts noted that the agents again played with a “clarity and precision” that was “hypnotic.” Making quick decisions without any errors is, unsurprisingly, a machine’s home turf.

Experts have already begun to dissect the games and argue over whether AlphaStar had any unfair advantages. The AI agent was hobbled in some ways. For example, it was restricted from performing more clicks per minute than a human. But unlike human players, it was able to view the whole map at once, rather than navigating it manually.

DeepMind’s researchers said this provided no real advantage as the agent only focuses on a single part of the map at any one time. But, as the games showed, this didn’t stop AlphaStar from expertly controlling units in three different areas of the map simultaneously — something that the commentators said would be impossible for humans. Notably, when MaNa beat AlphaStar in the live match, the AI was playing with a restricted camera view.

Another potential sore point was that the human players, while professionals, were not of world-champion standard. TLO, in particular, also had to play as one of StarCraft II’s three races with which he was not familiar.

A graphical representation of AlphaStar’s processing. The system sees the whole map from the top down and predicts what behavior will lead to victory.
Image: DeepMind

This discussion aside, experts say the matches were a significant step forward. Dave Churchill, an AI researcher who’s long been involved in the StarCraft AI scene, told The Verge: “I think that the strength of the agent is a significant accomplishment, and came at least a year ahead of the most optimistic guesses that I’ve heard among AI researchers.”

However, Churchill added that as DeepMind had yet to release any research papers about the work, it was difficult to say whether or not it showed any technological leap forward. “I have not read the blog article yet or had access to any papers or technical details to make that call,” said Churchill.

Mark Riedl, an associate AI professor at Georgia Tech, said he was less surprised by the results, and that this victory had only been “a matter of time.” Riedl added that he didn’t think the games showed that StarCraft II had been definitively beaten. “In the last, live game, restricting AlphaStar to the window did remove some of its artificial advantage,” said Riedl. “But the bigger issue that we have seen... is that the policy learned [by the AI] is brittle, and when a human can push the AI out of its comfort zone, the AI falls apart.”

A screenshot from the games in December, showing AlphaStar facing off against TLO.
Image: DeepMind

Ultimately, the end goal of work like this is not to beat humans at video games but to sharpen AI training methods, particularly in order to create systems that can operate in complex virtual environments like StarCraft.

In order to train AlphaStar, DeepMind’s researchers used a method known as reinforcement learning. Agents play the game essentially by trial and error while trying to reach certain goals like winning or simply staying alive. They learn first by copying human players and then play one another in a coliseum-like competition. The strongest agents survive, and the weakest are discarded. DeepMind estimated that its AlphaStar agents each racked up about 200 years of game time in this way, played at an accelerated rate.
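In very rough terms, that league stage works something like the toy Python sketch below. This is only an illustration of the idea, not DeepMind’s actual code; the single “skill” number stands in for each agent’s learned policy, and the match and training steps are placeholders.

```
import random

def play_match(skill_a, skill_b):
    # Placeholder for a full StarCraft II game: the higher skill usually
    # wins, with a little noise so upsets are possible.
    return 0 if skill_a + random.gauss(0, 0.1) > skill_b + random.gauss(0, 0.1) else 1

# Agents start by imitating human replays (here, just a random starting skill),
# then repeatedly play one another in a league.
population = [random.random() for _ in range(8)]

for generation in range(100):
    wins = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            winner = i if play_match(population[i], population[j]) == 0 else j
            wins[winner] += 1
    # The strongest agents survive and are duplicated; the weakest are discarded.
    ranked = sorted(range(len(population)), key=lambda k: wins[k], reverse=True)
    survivors = [population[k] for k in ranked[: len(population) // 2]]
    population = survivors + survivors[:]
    # Placeholder for the reinforcement-learning update each agent gets
    # from its games (here, a small random improvement).
    population = [skill + random.uniform(0.0, 0.05) for skill in population]
```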

DeepMind was clear about its goal in conducting this work. “First and foremost the mission at DeepMind is to build an artificial general intelligence,” said Oriol Vinyals, co-lead of the AlphaStar project, referring to the quest to build an AI agent that can perform any mental task a human being can. “To do so, it’s important to benchmark how our agents perform on a wide variety of tasks.”

Comments

You know DeepMind just Zerg Rushed. Bot!!

A bit surprised that no links exist to this.

Here’s a link to the almost 3 hour stream.
https://youtu.be/cUTMhmVh1qs

Thanks, there are links and an embedded video in the article now. It’s possible (though doubtful) they were there before and my mobile browser didn’t load them.

"The strongest agents survive, and the weakest are discarded"
How can one AI agent be better/smarter/faster than another? Sounds to me like this would be what people would be worried about.

The decisions they learned to make, based on the information they have gathered, are weaker than those of the agents that beat them.

I’m not sure how Google is doing it here, but when AI agents compete against each other, it’s often done in one of two ways. One is some sort of randomization of neural node weighting; this is closer to a genetic algorithm, where you take a base set, produce a variety of randomly influenced offspring sets, and then see which performs better. The other relies on the fact that a node weight represents a probability: different executions will give different results based on that probability, and whatever feedback mechanism is used to update the node weights is driven by those probabilistic decisions.

Google’s probably using something more akin to the latter description. Think about a complex decision tree where the decisions that execute are based on dice rolls: the direction you traverse the tree from any given node depends on a value in that node and whether the dice roll falls inside or outside it. With something as complex as a neural network, there are many, many “dice rolls” to consider, so when an AI plays against itself, each side behaves differently. The side that wins then takes the collection of all those dice rolls and outcomes and updates the network’s weightings/probabilities so that it’s more likely to make those moves in the future.
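Here’s a rough toy sketch of that second approach, with the game itself replaced by a rock-paper-scissors-style stand-in. It’s purely illustrative (nothing like AlphaStar’s real setup): each agent keeps a weight per action, turns the weights into probabilities, samples a “dice roll” action, and nudges the weights toward whatever it did when it wins.

```
import math
import random

ACTIONS = ["expand", "rush", "defend"]

def softmax(weights):
    # Turn raw weights into a probability distribution over actions.
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def choose(weights):
    # Sample an action index according to the current probabilities (the "dice roll").
    return random.choices(range(len(ACTIONS)), weights=softmax(weights))[0]

def update(weights, action, won, lr=0.1):
    # Reinforce the sampled action after a win, discourage it after a loss.
    weights[action] += lr if won else -lr

agent_a = [0.0, 0.0, 0.0]
agent_b = [0.0, 0.0, 0.0]

for game in range(1000):
    move_a, move_b = choose(agent_a), choose(agent_b)
    # Stand-in for an actual game: "rush" beats "expand", "expand" beats
    # "defend", "defend" beats "rush"; mirror matches are a coin flip.
    beats = {"rush": "expand", "expand": "defend", "defend": "rush"}
    a_name, b_name = ACTIONS[move_a], ACTIONS[move_b]
    a_wins = beats[a_name] == b_name or (a_name == b_name and random.random() < 0.5)
    update(agent_a, move_a, a_wins)
    update(agent_b, move_b, not a_wins)
```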

Well, one thing humans have over AI: we don’t take 200 years to learn how to play a game well. (Even if they are able to compress that down.)

That’s because humans suck at programming AI learning methods. Maybe we should make an AI to figure out how to do that better.

I don’t know where to begin… let’s get an AI that tells us that!

What do you mean though? They can play millions of rounds in a day to improve themselves, and they certainly don’t take 200 years or even decades to perfect it. Also once they do, that’s it, we are never beating them, even if we train for 200 years ourselves.

Don’t really see the point here.

DeepMind estimated that its AlphaStar agents each racked up about 200 years of game time in this way, played at an accelerated rate.

This is what they are referring to. It’d take a human significantly less time to do so, although we can’t play at an accelerated rate. So either way it still doesn’t take them 200 years in real time, but imagine if it only took them as long as a human to learn something like that (maybe 20–60 hours depending on game difficulty) and they did this at an accelerated rate. They could play millions or billions more games in simulation.

That’s pretty much my point – the number of games it takes them to get it "right" is irrelevant, since they only have to do that once and will never be beaten afterwards.

An average human won’t be anywhere near e-sports level even if he plays one game for a year.

That’s a good point; I’m looking forward to the progress in AI within the next five years. It won’t be slowly introduced into the workforce; it’ll be a tidal wave of automation and disruption. Judging by how quickly it’s picking up video games, there are a lot of jobs that could be automated using AI, Google Duplex, and robots.

That’s a pretty flawed way to reason about it, though. Humans aren’t learning a game in a vacuum like these AIs are. Humans leverage their real-world training and the knowledge we gain from education in order to learn new games. We also have the advantage of neural superiority with respect to raw numbers: we have far more neurons working in parallel in ways that AI doesn’t, and we have regioned neural networks designed for navigating our bodies in 3D space that we can also use to make projections about expected behavior for other things. On top of that, when you’re playing another person, you’re taking in information from another individual with the same insanely complex neural system, which has its own way of doing things. Meta strategies and getting good at a game like SC involve experience with those other people and reacting to and understanding what they’re doing, which means leveraging all the experience they’re putting into the game as well. This is why the first strategy for teaching AIs is to use human games.

Take a newborn and sit them in front of SC playing bots and see how long it takes them to learn the game. And then do you think after all the years it takes them to learn it (since they’ll be wholly useless at first because babies are useless), that just playing bots will let them compete with the big boys? Not at all.

Arguably, we sort of do. Older games like Chess have gone through strategy developments over centuries, and we see how AlphaZero developed some of those same strategies and discarded them in a much, much shorter time frame.

With new games like StarCraft, you can’t consider how we play that game in a void of human knowledge, since we don’t start from scratch when learning a new game. We use relations from other things we’ve learned, so you’re in essence relying on other knowledge, much of which comes from centuries of human development. Time travel and take some poor pleb from 500 BC, sit him down in front of a computer, and ask him to learn SC. See how long it takes, if he doesn’t freak out and lose his mind over what he’s seeing. You take for granted things that have been passed down through humanity that affect how you understand the world around you.

People always ask, "why do we need to learn math?" Because that’s thousands of years’ worth of passed-down knowledge that we can use in a relational manner to help us better understand how to tackle new information and solve new problems.

It’s also one of the reasons it’s annoying when people don’t realize why AI needs so much training. Go observe how useless babies are and how many years it takes humans to become functional. That’s training too, but it involves a number of neurons and connections that dwarfs AI so much it’s astounding. People don’t seem to realize that all the neural connections we use for everyday things are constantly being used to assess new information we receive. A person can easily estimate what will happen when trying to jump in a video game because we’ve had years and years of experience with what jumping is. We leverage all the knowledge we don’t even realize we’re acquiring when confronting new things, and judging AIs at this stage against humans learning something is really flawed.

TLO… TheLittleOne. I used to play Supreme Commander with him back in the day; top guy.

I’ve often wondered, with AI playing video games, how they interface with the AI. Is the AI given a keyboard and some kind of robotic hand to control the game, or is there a direct line or something? Perhaps the AI machine just has a USB wire that connects like a wired USB keyboard? Or is it Bluetooth?

I’ve been toying with the idea of developing AI to play PS4 games, and I’ve been stuck on how the controls would work. For this AI with StarCraft, does the AI see the screen through a camera?

No, it has API access to the game.

But in this case they follow the in-game rules, right? Like not seeing through fog of war?

Otherwise it would be pretty easy to beat any human, I’d assume; even ancient AI could do that when it didn’t abide by any human rules and could instantly react to any of your commands even if it didn’t see you issue them.

Yes, it had fog of war, but the AI could see the whole (visible) map during its 10 wins… for the one loss, they adjusted it so that it was limited in where it could look.
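For what it’s worth, DeepMind has open-sourced pysc2, a Python interface to StarCraft II that hands an agent feature-layer observations (fog of war included) and takes actions back through the API. A minimal agent looks roughly like this; the details depend on the pysc2 version, and AlphaStar’s own internal setup is presumably far more elaborate.

```
from absl import app
from pysc2.agents import base_agent
from pysc2.env import run_loop, sc2_env
from pysc2.lib import actions, features

class DoNothingAgent(base_agent.BaseAgent):
    # Receives an observation every step and returns an action; here it just idles.
    def step(self, obs):
        super().step(obs)
        return actions.FUNCTIONS.no_op()

def main(unused_argv):
    with sc2_env.SC2Env(
            map_name="Simple64",
            players=[sc2_env.Agent(sc2_env.Race.protoss),
                     sc2_env.Bot(sc2_env.Race.terran, sc2_env.Difficulty.easy)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=8) as env:  # step_mul controls how often the agent gets to act
        run_loop.run_loop([DoNothingAgent()], env, max_episodes=1)

if __name__ == "__main__":
    app.run(main)
```

So the “whole map” advantage discussed in the article is about not having to move a camera around, not about seeing through the fog of war.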

The way you described it would be more difficult. If they could pull that off, a lot of us would be out of a job, lol. It’ll happen eventually, though.

The thing with RTS games, especially Starcraft 2 which is pretty fast-paced, is that a player that plays fast is advantaged. A pro player will have a much higher APM (actions per minute) than the average player, and that’ll help him micro regardless of his strategic skills.

So couldn’t an AI like this have basically infinite APM? It’s interacting with an API rather than a keyboard and mouse, so the APM could get crazy high. Are they setting a limit to keep it fair or something?

The AI agent was hobbled in some ways. For example, it was restricted from performing more clicks per minute than a human. But unlike human players, it was able to view the whole map at once, rather than navigating it manually.

I think limiting the clicks is how they limit the actions per minute. Otherwise, without restrictions, we’d be obliterated before we knew what happened.
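One simple way to enforce a cap like that is a sliding-window rate limiter. A toy version (just to illustrate the idea, not how DeepMind actually implemented its limit) could look like this:

```
from collections import deque

class ActionRateLimiter:
    # Rejects actions once max_actions have already been issued
    # within the last window_seconds of game time.
    def __init__(self, max_actions=300, window_seconds=60.0):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Forget actions that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_actions:
            self.timestamps.append(now)
            return True
        return False

# In a game loop, only forward the agent's action when
# limiter.allow(current_game_time) returns True.
limiter = ActionRateLimiter(max_actions=300, window_seconds=60.0)
```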

Still, I think there is a latency involved in human actions that the AI won’t be troubled by since it interfaces directly via API.

I’ve played a lot of RTS games, including StarCraft II, and while I’m not even close to playing at a professional competitive level, the problem I always have is the speed of scrolling, selecting units, and issuing actions, especially in large multi-unit engagements.

Even though pro players would be much better than me, as humans they’ll always have some latency. This gives the AI’s direct API access the advantage where a split second can make all the difference.

To be fair, they have to factor in not just actions (clicks) per minute but the latency involved as well. Not to mention reducing the AI’s view scope to the same map window the human players have.
