September 9, 2019 — Artificial intelligence (AI) research took a great leap forward when a Carnegie Mellon University computer program overcame the world’s best professional players in a series of six-player poker games. Experimenting with multi-player, “incomplete information” games offers more useful lessons for real-world problems such as security, business negotiations and cancer therapy than one-on-one, “complete information” games like chess or go. Running on the XSEDE-allocated Bridges system at the Pittsburgh Supercomputing Center, the Pluribus AI was the first to surpass humanity’s best at such a game.
Why It’s Important
It’s obvious, but it bears repeating: Life is not chess. In real-world problems, the pieces are not lined up neatly for all to see. Terrorists have secret plans. Businesspeople have undisclosed deal-breakers and hidden needs that can torpedo negotiations. Cancer cells evade the body and drug treatments by mutating their genes.
Poker may not be a perfect representation of these problems, but it’s a lot closer. Players keep their hands secret, and try to bluff and shift their strategies to keep opponents off-balance. In 2017, Carnegie Mellon’s School of Computer Science grad student Noam Brown and his faculty advisor Tuomas Sandholm broke through the barrier of such imperfect-information games. That’s when their AI program, called Libratus, beat four of the world’s best human players in heads-up (two-player), no-limit Texas Hold’em poker, running on the XSEDE-allocated Bridges system at the Pittsburgh Supercomputing Center. That victory was the first in which an AI overcame top players in an incomplete-information game.
“Being unpredictable is a huge part of playing poker … you have to be unpredictable; you have to bluff. If you don’t have a strong hand you have to check; if you do have a strong hand you can’t tip off the other players. Humans are good at that; Pluribus is very good at that.”—Noam Brown, Facebook AI Research and Carnegie Mellon University.
One limitation of the earlier work was that the AI had only faced humans one-on-one. This is a far simpler game than the usual multi-player poker game, in which a player has to vary how a given combination of cards is played from hand to hand in response to the shifting plays of multiple opponents. At the time of Libratus’s victory, many experts felt that the multi-player game problem might not be winnable in the foreseeable future. Still, Brown (now at Facebook AI Research) and Sandholm felt it was worth a try. They essentially started over with their new project, but still used the power of Bridges to develop and run the new AI.
How XSEDE Helped
The transition from head-to-head to multi-player poker required a stronger AI approach than the researchers had used with Libratus. Like the earlier AI, Pluribus taught itself to play Texas Hold’em poker before facing the pros. Like Libratus, Pluribus also discovered strategies that humans do not normally employ. But Pluribus played and learned in a fundamentally different way than its predecessor.
Libratus had been designed to think through the entire remaining game when deciding each move. The Carnegie Mellon team realized that such a strategy would never work in multi-player poker because the game size would grow exponentially as the number of players increases. This was one reason why some experts thought the problem might not be solvable.
“You have to understand that opponents can adapt. If you only employ one strategy, you might be exploitable. In rock, paper, scissors, if we assume the other player is responding randomly, if you always throw rock you always break even. But when the other player adapts to always throwing paper, that strategy fails. Understanding that players can switch strategies is a big part of the game.”—Noam Brown, Facebook AI Research and Carnegie Mellon University.
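The rock, paper, scissors point in the quote above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not code from the Pluribus project; the function and variable names are made up for this example. It scores a win as +1, a loss as -1 and a tie as 0, and computes the average result of one strategy against another.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 if my move wins, -1 if it loses, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(my_strategy, their_strategy):
    """Average payoff when both players pick moves with the given probabilities."""
    return sum(
        my_strategy[m] * their_strategy[t] * payoff(m, t)
        for m in MOVES
        for t in MOVES
    )


# Hypothetical strategies, written as probabilities for each move.
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
always_paper = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}
uniform = {m: Fraction(1, 3) for m in MOVES}

print(expected_payoff(always_rock, uniform))       # 0: breaks even on average
print(expected_payoff(always_rock, always_paper))  # -1: loses every round
print(expected_payoff(uniform, always_paper))      # 0: mixing cannot be exploited
```

Always throwing rock breaks even on average against a random opponent but loses every round once the opponent adapts, while the evenly mixed strategy still breaks even. That is the sense in which a single fixed strategy is exploitable.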
The researchers took the good-enough strategy one step further. Would it be possible to stay ahead of multiple opponents if the AI only planned a few steps ahead, rather than to the end of the game? Such a “limited look-ahead” approach would save computing power to react to and overcome each opponent’s moves.
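To give a feel for what planning only a few steps ahead means, here is a toy sketch of depth-limited search. It is not Pluribus’s algorithm: it assumes a much simpler two-player game with nothing hidden, in which players alternately take one, two or three stones from a pile and whoever takes the last stone wins. The names `lookahead_value` and `choose_move` are invented for this example. The search stops after a fixed number of moves and falls back on a neutral estimate rather than playing every line out to the end.

```python
def lookahead_value(stones, depth):
    """Value for the player about to move: +1 win, -1 loss, 0 unknown at the cutoff."""
    if stones == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    if depth <= 0:
        return 0   # cutoff reached: stop searching and use a neutral estimate
    # The mover picks the option that is worst for the opponent.
    return max(
        -lookahead_value(stones - take, depth - 1)
        for take in (1, 2, 3)
        if take <= stones
    )


def choose_move(stones, depth):
    """Pick how many stones to take, looking only `depth` moves ahead."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: -lookahead_value(stones - take, depth - 1))


# With five stones, looking just two moves ahead is enough to find the
# safe play: take one stone and leave the opponent with four.
print(choose_move(5, 2))
```

The trade-off is the one the researchers describe: a shallow search is far cheaper, but its cutoff estimates have to be good enough to choose well without seeing the end of the game.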
Pluribus compiled the data and trained itself on one of Bridges’ large-memory nodes, each of which features 3 terabytes of RAM, about 100 times that in a high-end laptop and 20 times what is considered large memory on most supercomputers. Play took place on one of Bridges’ regular-memory nodes. Bridges also helped the Carnegie Mellon team by offering massive data storage.
“Many thought the multi-player game was not possible to win [by an AI]; others thought it would be too computationally expensive. I don’t think anybody thought it would be that cheap.”—Noam Brown, Facebook AI Research and Carnegie Mellon University.
While Pluribus used more computing power than is available on commodity personal computers, its performance represented a huge saving in computing time over Libratus. The earlier AI used around 15 million core hours over two months to develop its strategies and 50 of Bridges’ powerful compute nodes to play. By comparison, Pluribus trained itself in eight days using 12,400 core hours and used just one node during live play. This suggests that such AIs may be able to run on commodity computers in the not-too-distant future.
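The size of that saving can be worked out from the figures quoted above. The short calculation below uses only those numbers; the variable names are illustrative, and the average core count assumes training ran around the clock for the full eight days, which the article does not state.

```python
# Figures quoted in the article.
libratus_core_hours = 15_000_000  # "around 15 million", so treat as approximate
pluribus_core_hours = 12_400
training_days = 8

# How many times more compute Libratus used for training.
ratio = libratus_core_hours / pluribus_core_hours
print(f"Libratus used about {ratio:,.0f} times the core hours")

# Assumption: training ran continuously for all eight days.
average_cores = pluribus_core_hours / (training_days * 24)
print(f"Average cores in use during training: {average_cores:.1f}")
```

On these numbers, Pluribus needed roughly one twelve-hundredth of the training compute, in addition to dropping from 50 nodes to one during live play.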
Pluribus used its limited-lookahead strategy in an online tournament from June 1 to 12, 2019, against a total of 13 poker champions, each of whom had won over $1 million in his poker career. The culmination of the Facebook-funded tournament was a series of 10,000 hands against five of the pros at once. Pluribus racked up a literally super-human win rate. The human players reported that the AI’s strategy was impossible to predict and that it often made plays that experienced humans never make, probably because doing so successfully is too complicated for the human brain.
Source: Ken Chiacchia, Pittsburgh Supercomputing Center and the Extreme Science and Engineering Discovery Environment (XSEDE)