Earlier this year, in a 20-day competition involving 120,000 hands at Rivers Casino in Pittsburgh, the Libratus AI was able to defeat four of the best professional poker players. Libratus beat players individually and was able to amass over $1.8 million in chips.
According to Tuomas Sandholm, professor of computer science, and Noam Brown, a PhD student in the Computer Science Department at Carnegie Mellon, the AI "used a three-pronged approach" to master the game with "more decision points than atoms in the universe."
The research was published in the journal Science.
The problem with poker, in comparison to chess or checkers, is that bluffing is involved. Players cannot see each other's cards, so decisions cannot rest purely on predicting future moves from a fully visible board. Becoming a master poker player also involves recognizing and using tactics such as bluffing.
According to the team, it was possible for Libratus to go beyond other games by breaking poker into "computationally manageable parts and, based on its opponents' gameplay, fix potential weaknesses in its strategy during the competition."
The AI includes three modules, the first of which creates an abstract version of the game which is smaller and "easier to solve" than the full game. The full game has 10^161 (the number one followed by 161 zeros) decision points, so Libratus uses this easier version to create a strategy for the early rounds.
This "blueprint strategy" then serves as a platform for later stages of the game. One example of the abstraction is grouping similar hands together and treating them in the same way.
"There is little difference between a King-high flush and a Queen-high flush," Brown said. "Treating those hands as identical reduces the complexity of the game and thus makes it computationally easier [...] similar bet sizes also can be grouped together."
When the poker game proceeds to the final rounds, a second module comes into play, creating a more detailed plan of action based on the state of the hand. A strategy is also developed in real time which, while using the blueprint for guidance, is able to switch the AI's tactics depending on hands and bluffs.
If the opponent makes a move which has not been considered in the abstraction, the module computes a solution to the subgame which adds this move to the mix.
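In code terms, the idea is to keep the opponent's real move in the set of actions being solved rather than rounding it to the nearest size already in the abstraction. The following Python sketch shows only that bookkeeping step; the function name is hypothetical, and the actual subgame solving, which is the hard part, is not shown.

```python
def subgame_bet_sizes(abstract_sizes, observed_bet):
    """Return the bet sizes (fractions of the pot) a subgame solver would consider.

    Hypothetical helper: the opponent's observed bet is kept as its own
    action instead of being rounded to the closest abstract size.
    """
    sizes = set(abstract_sizes)
    sizes.add(observed_bet)  # no effect if the bet is already in the abstraction
    return sorted(sizes)


# A 75%-of-pot bet is not in the abstraction, so it is added for this subgame.
print(subgame_bet_sizes([0.5, 1.0, 2.0], 0.75))
```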
The third module focuses on improving the blueprint strategy as the competition proceeds. According to Sandholm, AIs typically use machine learning to detect mistakes in the opponent's strategy in order to exploit them, but this could also "open the AI to exploitation if the opponent shifts strategy."
"Instead, Libratus' self-improver module analyzes opponents' bet sizes to detect potential holes in Libratus' blueprint strategy," the team says. "Libratus then adds these missing decision branches, computes strategies for them, and adds them to the blueprint."
The technology has been licensed to Sandholm's company Strategic Machine, which "applies strategic reasoning technologies to many different applications."
"The techniques that we developed are largely domain independent and can thus be applied to other strategic imperfect-information interactions, including non-recreational applications," Sandholm and Brown said. "Due to the ubiquity of hidden information in real-world strategic interactions, we believe the paradigm introduced in Libratus will be critical to the future growth and widespread application of AI."