Remember Libratus, the AI poker bot developed at Carnegie Mellon University that has spent the past couple of years humbling professionals at Texas hold’em? Well, say hello to Pluribus, an upgraded bot that has now beaten top Texas hold’em professionals again, but this time in six-player games rather than one-on-one matches.
“While going from two to six players might seem incremental, it’s actually a big deal,” said Julian Togelius at New York University, who studies games and AI and was quoted in a recent Nature report (No limit: AI poker bot is first to beat professionals at multiplayer game). “The multiplayer aspect is something that is not present at all in other games that are currently studied.” A paper (Superhuman AI for multiplayer poker) by the software’s primary developers was also published in Science.
According to the Nature report written by Douglas Heaven, in a 12-day session spanning more than 10,000 hands, Pluribus beat 15 top human players. “A lot of AI researchers didn’t think it was possible to do this using [our] techniques,” said Noam Brown of CMU and Facebook AI Research, who developed Pluribus with his Carnegie Mellon colleague Tuomas Sandholm.
Brown and Sandholm radically overhauled Libratus’s search algorithm, according to the Nature article. “Most game-playing AIs search forwards through decision trees for the best move to make in a given situation. Libratus searched to the end of a game before choosing an action.”
Here’s a brief excerpt:
“The key breakthrough was developing a method that allowed Pluribus to make good choices after looking ahead only a few moves rather than to the end of the game.
“Pluribus teaches itself from scratch using a form of reinforcement learning similar to that used by DeepMind’s Go AI, AlphaZero. It starts off playing poker randomly and improves as it works out which actions win more money. After each hand, it looks back at how it played and checks whether it would have made more money with different actions, such as raising rather than sticking to a bet. If the alternatives lead to better outcomes, it will be more likely to choose them in future.
“By playing trillions of hands of poker against itself, Pluribus created a basic strategy that it draws on in matches. At each decision point, it compares the state of the game with its blueprint and searches a few moves ahead to see how the action played out. It then decides whether it can improve on it. And because it taught itself to play without human input, the AI settled on a few strategies that human players tend not to use.”
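The look-back step the excerpt describes, checking whether a different action would have made more money and shifting toward the alternatives that would have, is the core idea of regret matching, a building block of the counterfactual-regret methods Pluribus's training draws on. Here is a minimal sketch on rock-paper-scissors, where regret-driven self-play pushes the average strategy toward the uniform equilibrium; the toy game and every function name below are illustrative, not taken from the papers:

```python
ACTIONS = ["rock", "paper", "scissors"]

def payoff(a, b):
    # +1 if action index a beats b, -1 if it loses, 0 on a tie
    # (rock=0 beats scissors=2, paper=1 beats rock=0, scissors=2 beats paper=1).
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from(regrets):
    # Play each action in proportion to its accumulated positive regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / 3] * 3

def expected(i, opp):
    # Expected payoff of action i against a mixed opponent strategy.
    return sum(q * payoff(i, j) for j, q in enumerate(opp))

def train(iterations=50000):
    # Start one player slightly off-equilibrium so the dynamics move.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    avg = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strats = [strategy_from(r) for r in regrets]
        for i in range(3):
            avg[i] += strats[0][i]
        for p in (0, 1):
            me, opp = strats[p], strats[1 - p]
            ev = sum(me[i] * expected(i, opp) for i in range(3))
            for i in range(3):
                # The look-back step: would action i have made more
                # money than what was actually played, on average?
                regrets[p][i] += expected(i, opp) - ev
    total = sum(avg)
    return [a / total for a in avg]

print(train())  # each probability drifts toward 1/3, the equilibrium for this game
```

The current play cycles wildly (all-rock, then all-paper, and so on), but the *average* of those strategies converges, which is why the sketch normalizes `avg` at the end rather than returning the final iterate.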
The revised strategy used by Pluribus significantly reduces the computational workload. Here’s a comparison of the computing resources used by various well-known game-playing systems, taken from the Science paper:
“When playing, Pluribus runs on two Intel Haswell E5-2695 v3 CPUs and uses less than 128 GB of memory. For comparison, AlphaGo used 1,920 CPUs and 280 GPUs for real-time search in its 2016 matches against top Go professional Lee Sedol, Deep Blue used 480 custom-designed chips in its 1997 matches against top chess professional Garry Kasparov, and Libratus used 100 CPUs in its 2017 matches against top professionals in two-player poker. The amount of time Pluribus takes to conduct search on a single subgame varies between 1s and 33s depending on the particular situation. On average, Pluribus plays at a rate of 20s per hand when playing against copies of itself in six-player poker. This is roughly twice as fast as professional humans tend to play.”
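The earlier point about looking ahead only a few moves rather than to the end of the game can be illustrated on a toy game: search a fixed number of plies, then score the frontier positions with a cheap precomputed "blueprint" estimate instead of playing the game out. The Nim-style game and blueprint function below are illustrative stand-ins, not Pluribus's actual machinery:

```python
def moves(n):
    # Legal moves in a Nim variant: take 1-3 sticks; taking the last stick wins.
    return [m for m in (1, 2, 3) if m <= n]

def blueprint_value(n):
    # Stand-in for a precomputed blueprint: a cheap estimate of the
    # position's value for the player to move (multiples of 4 are losses).
    return -1.0 if n % 4 == 0 else 1.0

def search(n, depth):
    # Value for the player to move, looking only `depth` plies ahead.
    if n == 0:
        return -1.0  # opponent took the last stick; we lost
    if depth == 0:
        return blueprint_value(n)  # stop early and trust the blueprint
    return max(-search(n - m, depth - 1) for m in moves(n))

def best_move(n, depth=3):
    # Pick the move whose resulting position is worst for the opponent.
    return max(moves(n), key=lambda m: -search(n - m, depth - 1))

print(best_move(5))  # takes 1, leaving 4 sticks: a losing position for the opponent
```

Even from positions too deep to solve outright within the depth limit, the blueprint evaluation at the frontier lets the shallow search pick the correct move, which is the trade-off the excerpt credits for Pluribus's modest hardware needs.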
Link to Nature report: https://www.nature.com/articles/d41586-019-02156-9
Link to Science paper: https://science.sciencemag.org/content/early/2019/07/10/science.aay2400
Link to Carnegie Mellon article: https://www.cmu.edu/news/stories/archives/2019/july/cmu-facebook-ai-beats-poker-pros.html