Computers and the poker face
(appeared in Mar 2017)

(link to main website)

In ‘The Honeymoon Machine’, a 1961 Hollywood farce, the computer aboard a US naval ship off the coast of Venice is used to predict outcomes at roulette. The National Bank was heading for bankruptcy when the role of the US ship was detected and the US Government stepped in to avert a diplomatic crisis., says S.Ananthanarayanan.

From an artificial intelligence (AI) viewpoint, predicting results at roulette is still outside the computers’ capacity. With the progress in the mathematics of how strategies are planned and methods of data analyses, however, computers are now doing better in other areas which were thought to be outside computers’ reach. Matej Moravcík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson and Michael Bowling, from the University of Alberta at Edmonton and the Charles University and the Czech Technical University at Prague report in the journal, Science, that yet another conceptual milestone has been passed on the route of computers doing the things we thought only people can.

One area where computers have proved themselves is in playing games of skill against human adversaries. A typical example is in chess, where each player has a vast choice of permissible moves, but only a few can lead to victory or avoid defeat. A method that many beginning human players try to use is to imagine the possible moves and counter-moves and then the answers, and the next lot of counter-moves, and so on. This, however, does not lead the player very far and, with practice, she switches to generic strategies. The computer, in contrast, has the capacity to analyse a huge number of possible exchanges. This ability alone, however, could not help computers, which still lost out to the experience of a good human player.

Computers then used a different tack, more like what a human player does – to look for patterns in board positions and select strategies from experience, making use of libraries and learning and getting better as the game progressed. Given the parameters of a situation, there is mathematical theory that can work out the strategy that is most likely to succeed, whatever the moves of the opponent, especially when the opponent uses her best possible strategy. Given the ability to analyse trends of data and to evaluate strategies, computers have got very good at chess, and a fine example is the hard time that IBM’s Deep Blue computer gave grand master Gary Kasporov. Computers have similarly done very well in different kinds of board games, like Checkers, Backgammon and recently, the ancient Chinese game of GO.

Perfect information

What these games have in common is that, with different degrees of complexity, all the data for generating moves that strengthen players’ positions is there for both or all players to see. It is in these circumstances, of ‘perfect information’, that the algorithms that help computers play these games have worked. Another kind of game, which may be more akin to real-life, is where all the data is not available to all players – a typical case is card games, where some of the cards known to one player are not known to others. How to programme computers to play this category of ‘imperfect information’ games been a challenge for over sixty years, say the authors of the paper in the journal. The authors then describe ‘DeepStack’ a computer procedure that can deal with this category games, specifically a variation of Poker, called “Heads up, no limit Texas hold’em

Machine learning

One method that computers use is to classify trends in data, which may represent images, for recognising patterns, or abstractions like the value of chessboard positions. This method seeks to fit the data into a formula, called a ‘hypothesis’ and then to test the formula with untried situations. For example, a value, like the price of a house, may depend on a combination of factors like the area in square feet, the number of bathrooms, the number of floors, the area of the garden, the distance to the supermarket, the Metro station and so on. The computer tries out different weightages, or multiplying factors that attribute importance to the factors, to create an algebraic expression, based on the value of the factors, that works out the value of the house.

The method that is used to go from one set of multiplying factors to a better set is to change the factors in such a way that it minimises the sum of the differences between the values output by the hypothesis and the actual values. This difference is known as the cost function, and the values used are called the ‘training values’. Using these methods, which have grown in complexity and sophistication, a powerful computer can take in the data of the pixels of the screen of a digital camera and make out if the image is of house, a car, a dog or man or a woman. While it is routine for these programmes to compare fingerprints, they can now even recognise faces.

These programmes are now run the way the human brain works, which is, not by maintaining an exhaustive library with which to compare new data, but by strengthening the paths that lead to specific responses. In this way, networks of nerve cells in the brain go through a process of learning every time differently strengthened, random responses lead to correct or incorrect results. This is the way a human baby, for instance, learns the use of language long before she hears about the rules of grammar. Computers now mimic this process by creating layers of software ‘nodes’ that receive inputs from the nodes in an earlier layer and then pass signals to the nodes in a subsequent layer. Whether the answer that the final layer outputs is correct or not then controls how multipliers along the path are increased or decreased, till paths that lead to correct answers become more and more likely.

As we just said, these methods have been successful in automating complex, adversarial problems, exemplified by games of strategy, but of the kind where the information available is there for all players to use. The other kind, ‘imperfect information’ games, has additional features of each player not only using the extra information she has but also of trying to mislead other players into forming incorrect impressions of the assets the first player holds. And the game of Poker, where players make bets on cards that only they know, against cards that only the other players know, consists of “bluffing, of little tactics of deception, of asking your-self what is the other man going to think I mean to do...” in the words of Von Neuman, pioneer of strategy theory and computing, that the journal paper quotes.

In games like Poker, the deception is practiced by both sides and “how our opponents’ actions reveal their (private) information depends upon their knowledge of our private information and how our actions reveal it,” the authors say in the paper. Solving imperfect information games then involves this kind of ‘recursive’ reasoning, where one thing depends on another which itself depends on the first thing, and so on. The probability of a line of play being chosen by a player depends on the history of the game, the private information of the player, which could be revealed by the game history, and again the player’s suspicions about others’ private information revealed by their own past play history. The factor to be minimised, for improving strategy, then moves from the ‘cost function’, to a value called the ‘counterfactual regret,’ which measures the difference between the utility of a method of play chosen and the value that was possible.

The DeepStack algorithm was then tried out in actual practice, playing 3,000 games each against eleven professional Poker players. The result was that DeepStack could beat ten of these eleven pros by what the paper describes as a 'a statistically significant margin’. While this is an achievement, the authors emphasize that the DeepStack algorithm represents a paradigm shift in the approach to large, sequential, imperfect information games. “With many real-world problems involving information asymmetry, DeepStack also has implications for seeing powerful Artificial Intelligence applied more in settings that do not fit the perfect information assumption. The abstraction paradigm for handling imperfect information has shown promise in applications like defending strategic resources and robust decision making as needed for medical treatment recommendations. DeepStack’s continual resolving paradigm will hopefully open up many more possibilities,” the authors say.

------------------------------------------------------------------------------------------

Do respond to : [email protected]