Two-player limit Texas hold’em poker has finally been solved, according to a study published in Science today. Scientists have designed a computer program, named Cepheus, with a strategy for the game that is so close to perfect that statistical analysis shows it can’t be defeated by a human poker player, even if that player competed against the computer for an entire lifetime. This means that no matter how the game starts out, the computer will win or break even in the long run — making it essentially unbeatable.
It's essentially unbeatable, in the long run
"We’re not saying that it’s guaranteed to win money on every single hand," says Michael Bowling, a computer scientist at The University of Alberta and a co-author of the study. "What we’re saying is that, in the long run, if you looked at all the hands that could happen and you averaged all of those, then the computer can’t be losing, at a losing rate — it has to be either breaking even or winning."
Solving a game like poker is a huge computational achievement. (Solving a game essentially means designing a program that can't lose over long periods of time.) Simpler games, like tic-tac-toe, are easy to solve. Most humans eventually grasp how to win or draw a game of tic-tac-toe simply by playing a couple of times. But other games are far harder. Chess and checkers are complex and offer thousands of possible scenarios to which computers have to react. Yet, even those games aren’t as hard to solve poker, because in poker, you don’t have all the information — you don’t know what cards your opponent is holding. "Perfect information games like chess or checkers, are games where all of the information that you need to make your decision is stored on the board," Bowling says. But poker isn’t like that; it’s an "imperfect information game," and that makes developing a strategy far more difficult.
The training phase involved 200 computers
The way the program works is actually pretty simple: all it has to do during a game is search its database of pre-computed game situations to find the most optimal move at any given moment. Building that database, however, was far from easy. "We had this training phase where the program started off playing uniform random against itself," meaning that "it had no idea what it was doing other than following the rules of the game," explains Michael Johanson, a computer scientist also at the University of Alberta, and a co-author of the study. But as the computer played itself, it got better and updated its strategy.
"It does that by thinking of all possible decision points, and every possible action [that could take place from those points]," explains Bowling. For example, the program might think: "what if I raise here, instead of playing randomly, how much more money or less money would I win?" If it decides to play randomly, and loses money, then it goes back and computes how much money it would have won if it had raised instead. That amount then gets stored as a regret value, he says. "And so it computes that regret number for every single action, of every single place that it gets to make that decision." So, every time it plays a hand, the program shifts its strategy so that it starts to do what it regretted not doing in past games more often. And, as it updated itself, Cepheus eventually approached what Bowling calls "perfect play."
"we stopped at this point because we can’t tell it apart from being perfect."
The training phase took 70 days, and a cluster of 200 computers, each armed with 32 GB of ram and 24 central processing units, Johanson says. At the end of those 70 days, Cepheus’ game was nearly perfect. "We could continue to train it, and it would continue to get better," Bowling says. "But we stopped at this point because we can’t tell it apart from being perfect." Even if the program spent a lifetime in training, he says, getting it any closer to perfect really wouldn’t have much value — "other than the academic novelty." In short, training Cepheus further wouldn’t change how successful the program is in any noticeable way.
Cepheus was also able to demonstrate that the player who deals the cards, and therefore goes second has a tiny advantage over the other player, before any of the cards are dealt. "We actually can now prove that the dealer has an advantage of what we call '88 millablinds' per game," Johanson says. "That's .088 of a big blind per game."
"This is, to my knowledge, the largest imperfect-information game essentially solved to date," says Tuomas Sandholm, a computer scientist at Carnegie Mellon University who didn’t participate in the study, in a news story about the study also in Science today. It’s also "the first one competitively played by humans that has now been essentially solved."
11 years of poker
Bowling is part of a team of researchers that first started tackling the idea of a poker program in 2003. Back then, he says, creating a program that could solve a game of two-player limit Texas hold’em was the furthest thing from their minds. "I don’t think anyone was dreaming that we were going to solve this game." Instead, they worked on developing a program that could defeat top players at heads-up limit poker, "the simplest game of poker humans play," Bowling says. In 2008, they were successful. That program, called Polaris, lost to professional players Phil "The Unabomber" Laak and Ali Eslami a year prior, but when the researchers improved it, it won three out of six games and tied another.
"We actually built in some adaptation qualities into it so it could try to take advantages of human weaknesses," Bowling says — something that Cepheus doesn’t do because it tries to play perfectly and therefore avoids having to adapt to its opponents.
"the largest imperfect-information game essentially solved to date."
After Polaris beat the top players, Bowling and his team had to make a decision about what to do next. Were they going to try to solve two-player limit Texas hold’em, a more complicated game? "Someone did a back of the envelope calculation and it came out that we would need four petabytes of disk space [1,000,000 gigabytes], just to write down the solution after we solved the game," Bowling says. "At the time, I said, 'well I guess we can’t solve it then, so let’s move on.'" But other researchers on his team insisted; buying petabyte disks was possible, after all.
In the end, solving the game didn’t take four petabytes of disk space. "We learned some things along the way," Bowling says, "like you can take all the hearts and spades and swap as suits," he says, which dropped them down to 520 terabytes. They also figured out how to compress the data, so that the program could access the strategy quickly. "There’s a lot of these technical balancing acts," Bowling says. "If the game had been a little bit bigger and things had been just a little bit slower, we might not have able to pull this off."
Now that the researchers have solved two-player limit Texas hold’em, they want to work on other forms of poker, like heads-up no limit poker. The challenges of the game means that they probably won’t be able to solve it, but they might be able to produce a program that can beat the world’s best human players. The same goes for a three-player game of limit Texas hold’em. "There’s no strategy in a three player game that can guarantee that it doesn’t lose because it’s actually possible that the other two players in the game might gang up on it." Collusion is illegal in a competitive game, but it’s hard to quantify what that actually means, Bowling says. Some people might do it without even realizing it. Still, he says, when they have tested Cepheus against two other computers, and it appears to produce good strategies. "We just can’t say as much about whether it produces optimal strategies," Bowling says.
It could help governments or companies optimize their security strategies
Cepheus could end up doing a lot more than playing poker. The researchers are already thinking of ways it could help governments or companies optimize their security strategies and make then "unexploitable," Bowling says. Cepheus could, for instance, schedule patrols or checkpoints in such a ways as to foil an adversary that might try to exploit a defense strategy. The program could also be used to help doctors tweak treatments for diabetic patients. If their diet or their activity level changes, the program could compute the optimal response all while taking into account any number of uncertainties.
"I’m intrigued by all this," Bowling says. "Maybe it’s because we’ve hit this milestone that I really want to see them have applications outside of the poker space." Of course, Bowling's desire to move away from the game might have something to do with his opinion of poker. "I don’t have the patience to play poker," he says. "I actually kind of find it boring." The computer scientist has only played poker once over the last year. "The only time I’ve played poker in the last twelve months was when I was testing our current program’s interface," Bowling says. "I played about a hundred hands."
To check out Cepheus’ strategy or to play against it, click here.
- Source: Science