A group of researchers from Facebook AI Research has created a more general AI algorithm, dubbed ReBel, that can play poker better than at least some humans. That’s according to recent reports stemming from a research paper released on the topic. The team consists of researchers Noam Brown, Anton Bakhtin, Adam Lerer, and Qucheng Gong.
More specifically, the team claims the new AI can play heads-up no-limit Texas hold’em better than any prior poker-specific AI. That’s a bold claim but, the team says, it’s been backed up by experimentation. The researchers pitted ReBel, which learned poker with less domain knowledge than previous AI, against Dong Kim and three other top human players. For reference, Mr. Kim is considered among the very best players in the world when it comes to heads-up poker.
ReBel took less than two seconds per hand and never needed more than five seconds across 7,500 hands. The results are more impressive still. Scored in thousandths of a big blind per hand, the earlier poker AI Libratus, which was developed at Carnegie Mellon University rather than at Facebook, posted an aggregate score of 147. Libratus beat Mr. Kim by 29, with a standard deviation of 78. By comparison, ReBel scored 165 with a standard deviation of 69.
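To make the scoring unit concrete, here is a minimal sketch of how raw chip winnings convert to thousandths of a big blind per hand. The function name, the 100-chip big blind, and the chip total are hypothetical values chosen for illustration; they do not come from the paper.

```python
def mbb_per_hand(chips_won: float, big_blind: float, hands: int) -> float:
    """Return average winnings per hand in thousandths of a big blind."""
    if big_blind <= 0 or hands <= 0:
        raise ValueError("big_blind and hands must be positive")
    # Divide total winnings by the number of hands and by the big blind size,
    # then scale by 1000 to express the result in milli-big-blinds.
    return chips_won * 1000 / (big_blind * hands)


# Hypothetical session: 7,500 hands with a 100-chip big blind (assumed values).
# A player who finishes 123,750 chips ahead has won 1,237.5 big blinds in total.
rate = mbb_per_hand(chips_won=123_750, big_blind=100, hands=7_500)
print(f"{rate:.1f} thousandths of a big blind per hand")
```

Read this way, a score of 165 means winning roughly a sixth of a big blind on every hand played, averaged over the whole match.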
How does Facebook’s ReBel AI work?
ReBel effectively works by expanding on concepts associated with the “game state,” incorporating common knowledge of the game and policies. More succinctly, it operates by training two AI models, one for value and another for policy, through self-play reinforcement learning. Both models are used during gameplay to generate a public belief state (PBS).
That means that it effectively creates probabilities over a defined and limited sequence of possible actions and game states. In poker, the public belief state consists of an assortment of decisions a participating player could make. The potential outcomes of a given hand are considered too, as are the overall pot and the chips.
ReBel uses all of that information to create a ‘subgame’ rooted at the initial public belief state (PBS). Reinforcement learning is used throughout play to find new values and add examples to the value AI model. And that repeats until the AI hits a designated threshold for accuracy.
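The idea of a belief over hidden game states can be illustrated with a toy Bayesian update. This is only a sketch under stated assumptions, not the researchers’ implementation: the three-card deck, the `update_belief` helper, and the betting probabilities are all hypothetical. It shows how, when a player’s policy is common knowledge, a public action shifts the probabilities everyone assigns to that player’s private card.

```python
from typing import Dict

Belief = Dict[str, float]             # private card -> probability
Policy = Dict[str, Dict[str, float]]  # private card -> {action: probability}


def update_belief(prior: Belief, policy: Policy, action: str) -> Belief:
    """Apply Bayes' rule: P(card | action) is proportional to P(card) * P(action | card)."""
    unnormalized = {
        card: prior[card] * policy[card].get(action, 0.0) for card in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError(f"action {action!r} has zero probability under the policy")
    return {card: weight / total for card, weight in unnormalized.items()}


# Hypothetical toy game: the opponent holds a Jack, Queen, or King with equal odds.
prior = {"J": 1 / 3, "Q": 1 / 3, "K": 1 / 3}

# Assumed policy, known to both players: bluff sometimes with a Jack,
# never bet a Queen, usually bet a King.
policy = {
    "J": {"bet": 0.2, "check": 0.8},
    "Q": {"bet": 0.0, "check": 1.0},
    "K": {"bet": 0.8, "check": 0.2},
}

# After a publicly observed bet, the Queen is ruled out and the King becomes likely.
posterior = update_belief(prior, policy, "bet")
for card, probability in posterior.items():
    print(f"{card}: {probability:.2f}")
```

In a real poker game the belief would range over every possible pair of hole cards for each player, and the policy would come from the trained model rather than a hand-written table.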
How could this AI be used?
As noted above, compared to other AI that has been built to play games, ReBel doesn’t rely quite so heavily on domain knowledge. That’s to say, it’s a more general algorithm rather than one taught every rule of a single game. And that comes back to the uncertainties and unknown information present in a game of poker.
In short, this AI is something very different from the more specialized AI created by Google back in 2017.
Instead, the researchers indicate that ReBel pushes AI algorithms toward more universal use. Namely, toward use cases involving environments with fewer predetermined factors. Specifically, the researchers point to ‘imperfect-information multi-agent interactions’. And they list use cases such as auctions, negotiations, cybersecurity, and autonomous vehicles.
None of that is to say that this AI will be present and accounted for in the real world any time soon. And Facebook is certainly not going to release the ReBel codebase for poker, the researchers indicate. That would only pave the way for users wanting to cheat the system when it comes to real, high-stakes games. However, this algorithm does stand, the researchers assert, as a suitable basis for further research in pursuit of technologies such as those listed above.