Bandit-based algorithm to play Go

You know that computers can beat humans at many games. But so far, humans still outplay even the most powerful systems at the Chinese strategy game of Go. The reason is simple: brute-force search does not work, because the number of possible move sequences is far too large to explore exhaustively. Now, in two separate efforts, European researchers are refining new algorithms to help computers beat humans at Go. In Hungary, computer scientists are working on algorithms inspired by how one-armed bandits in casinos pay out: some machines pay more than others, and their method exploits this observation. And in France, while recognizing that Go remains a challenge for artificial intelligence, other researchers are developing winning techniques based on Monte-Carlo algorithms.


In case you're not familiar with the game, let's refresh our memories with Wikipedia. Below is a picture of "a traditional Go board, which is wooden, with black painted lines. The stones are flat and fit closely together when placed on adjacent intersections" (Credit: Wikipedia). Here is a link to a larger version.

The Go board

And here are the rules (Credit: Wikipedia).

Go is played by two players alternately placing black and white stones on the vacant intersections of a 19×19 rectilinear grid. A stone or a group of stones is captured and removed if it is tightly surrounded by stones of the opposing color. The objective is to control a larger territory than the opponent by placing one's stones so they cannot be captured. The game ends and the score is counted when both players consecutively pass on a turn, indicating that neither side can increase its territory or reduce its opponent's; the game can also end by resignation.
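Since this is a tech blog, here is what the capture rule above looks like in code: a group of stones is captured once it has no liberties, that is, no empty points adjacent to it. This is just a sketch with a board representation and helper names of my own invention, not a complete Go engine (it ignores ko, suicide and scoring).

```python
# Board: dict mapping (row, col) -> 'B' or 'W'; empty points are absent.

def group_and_liberties(board, start, size=19):
    """Flood-fill the group containing `start`; return (group, liberties)."""
    color = board[start]
    group, liberties, frontier = {start}, set(), [start]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue  # off the board
            neighbor = board.get((nr, nc))
            if neighbor is None:
                liberties.add((nr, nc))  # an empty adjacent point
            elif neighbor == color and (nr, nc) not in group:
                group.add((nr, nc))
                frontier.append((nr, nc))
    return group, liberties

# A lone white stone hemmed in on all four sides has no liberties left,
# so it is captured and removed:
board = {(1, 1): 'W', (0, 1): 'B', (2, 1): 'B', (1, 0): 'B', (1, 2): 'B'}
group, libs = group_and_liberties(board, (1, 1))
captured = len(libs) == 0  # True
```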

Now, what have the Hungarian researchers found? Here is the Reuters answer.

In Go, all stones have an identical value, and scenarios are more complex, so the computer has to think through all potential moves to the end of the game and simulate the outcome of each alternative move. Even the most powerful computers have failed at that task, but Levente Kocsis and colleague Csaba Szepesvari have found a way to help computers focus on the most promising moves, using an analogy with slot machines in a casino.
Punters will find that certain one-armed bandits in a casino appear to pay more on average than others, but an intelligent player should also try machines that have so far paid less, in case they are hiding a jackpot, Kocsis said. The key is to find the balance between the two sorts of machine.
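The balance Kocsis describes is exactly what the classic UCB1 bandit rule formalizes: keep pulling the arm with the best average payout, plus an exploration bonus that shrinks as an arm accumulates pulls. Here is a minimal Python sketch on three hypothetical slot machines; the payout probabilities are invented purely for illustration.

```python
import math
import random

def ucb1_play(payout_probs, rounds=10000, c=math.sqrt(2)):
    """Pull `rounds` levers across arms with the given (hidden) payout odds."""
    arms = len(payout_probs)
    n = [0] * arms          # how many times each arm was tried
    total = [0.0] * arms    # total reward collected from each arm
    for t in range(1, rounds + 1):
        if t <= arms:
            arm = t - 1     # try every machine once to get started
        else:
            # UCB1 index: average payout plus an exploration bonus
            arm = max(range(arms),
                      key=lambda i: total[i] / n[i]
                      + c * math.sqrt(math.log(t) / n[i]))
        reward = 1.0 if random.random() < payout_probs[arm] else 0.0
        n[arm] += 1
        total[arm] += reward
    return n

random.seed(0)
pulls = ucb1_play([0.2, 0.5, 0.8])  # the third machine secretly pays best
```

The best machine ends up with the bulk of the pulls, yet no machine is ever completely abandoned, in case it is "hiding a jackpot."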

Their research was published by Springer Berlin in 2006 as a book chapter titled "Bandit based Monte-Carlo Planning." Here are two links to the abstract and to the full paper (PDF format, 12 pages, 262 KB).

This morning, when I decided to write about this research on the Go game, I was surprised to find in my -- physical -- mailbox the latest issue of a monthly newsletter from INRIA (Institut National de Recherche en Informatique et en Automatique). In the March 2007 issue of INédit, there was an article about the same subject. Here are some excerpts from the online version, "Go game, a challenge for artificial intelligence."

[The researchers] worked on randomized Monte-Carlo methods. Monte-Carlo algorithms are commonly used in optimisation when chance is a dominant factor and the problem is large-scale: how best to assess each state of a system -- be it a position in the game, data from a sensor or the control of a robot's movement -- and how to generalize a successful rule? The Monte-Carlo method makes it possible to find a strong placement of the Go stones by sampling only a limited number of candidate solutions.
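In code, the idea looks like this: instead of searching every continuation, estimate each candidate move by averaging the results of a limited number of random games played to the end, and keep the move with the best average outcome. The sketch below is hypothetical: `simulate_random_game` just flips a biased coin with invented probabilities, standing in for a real Go playout.

```python
import random

def simulate_random_game(move, rng):
    """Stand-in for a random play-out: 1 on a win, 0 on a loss."""
    win_prob = {'A': 0.45, 'B': 0.60, 'C': 0.50}[move]  # invented odds
    return 1 if rng.random() < win_prob else 0

def monte_carlo_best_move(moves, playouts=2000, seed=1):
    """Score each move by its average result over many random games."""
    rng = random.Random(seed)
    win_rates = {
        m: sum(simulate_random_game(m, rng) for _ in range(playouts)) / playouts
        for m in moves
    }
    return max(win_rates, key=win_rates.get), win_rates

best, rates = monte_carlo_best_move(['A', 'B', 'C'])
# with enough playouts, the sampled win rates approach 0.45, 0.60 and 0.50
```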

One of the researchers, Rémi Coulom, wrote a program named Crazy Stone "based on the Monte-Carlo evaluation technique, combined with tree search." This program won the gold medal at the 11th Computer Olympiad in June 2006. Coulom described how the program works in "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search." Since then, another program written at INRIA, MoGo, has outclassed Crazy Stone (page in French).
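To see how Monte-Carlo evaluation "combined with tree search" fits together, here is a compact sketch of the general scheme (selection, expansion, random playout, backpropagation) on a toy take-away game rather than Go, which would need far more machinery. The game and every name below are my own illustration, not Crazy Stone's actual code.

```python
import math
import random

# Toy game: a pile of stones, players alternate taking 1 or 2,
# and whoever takes the last stone wins.

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children = []
        self.wins = 0      # wins for the player who moved into this node
        self.visits = 0

def uct_child(node, c=math.sqrt(2)):
    """Selection step: the bandit trade-off, applied inside the tree."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def playout(pile, rng):
    """Random game to the end; True if the player to move wins."""
    turn = 0
    while True:
        pile -= rng.choice((1, 2)) if pile >= 2 else 1
        if pile == 0:
            return turn == 0
        turn ^= 1

def mcts_best_move(pile, iterations=3000, seed=2):
    rng = random.Random(seed)
    root = Node(pile)
    for _ in range(iterations):
        node = root
        # 1. selection: walk down while every legal move has a child
        while node.pile > 0 and len(node.children) == min(2, node.pile):
            node = uct_child(node)
        # 2. expansion: add one untried move, unless the game is over
        if node.pile > 0:
            tried = {ch.move for ch in node.children}
            move = next(m for m in (1, 2) if m <= node.pile and m not in tried)
            child = Node(node.pile - move, node, move)
            node.children.append(child)
            node = child
        # 3. simulation: score 1 if the player who moved into `node` wins
        if node.pile == 0:
            mover_wins = 1  # that move took the last stone
        else:
            mover_wins = 0 if playout(node.pile, rng) else 1
        # 4. backpropagation, flipping the perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += mover_wins
            mover_wins = 1 - mover_wins
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

From a pile of 4 stones, for instance, the winning reply is to take 1 (leaving 3, from which every opponent reply loses), and the search homes in on exactly that move.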

Sources: Reuters, via CNET News.com, February 21, 2007; INédit number 58, INRIA, March 2007; and various other websites

