Global Evaluation Strategies in Go

(this is a very rough, preliminary draft)

In Go programs, since the branching factor is so high, there is a lot of emphas on the problem of trimming the game tree down to a small set of possibilities as rapidly as possible.

Also, it's pretty common to conceive of a global search strategy, based on selecting the most likely n moves from the whole board, and doing an n-ply tree search based on these starting points.

Finally, it's common to conceive of using multiple simple evaluators, based on easily measurable features of the game, as the basis for evaluation.

This paper presents some preliminary, and not encouraging, results of building evaluators along these lines.

Evaluating the evaluators

The first problem is how to evaluate the evaluators. In this paper, the evaluation is based on the master game concept. We select a professional game, which is assumed to be perfect; the moves actually played in the game are the "gold standard" we wish the program to achieve. For each position in the game, we score each legal move, sort, and note the position of the actual professional move in the list. The score of the evaluator is the distance from the top of the list.

We define the predictive value of a scoring function as the average position of the professional move in the ranking of all possible moves. By this measure, a a perfect evaluator would always rank the actual professional move highest, and would have a predictive value of 1.0. A completely random function would achieve a predictive value of 0.5.

As applied to the "global search" strategy, the idea is that if the evaluator can place the "correct" move in the top n possibilities, keeping it in consideration, then the search can move it the rest of the way to the top.

Learning from Professional games.

As a correllary to the master game model for evaluating score functions, it should be possible to extract information from professional games based on reasonable concepts of strategic and tactical considerations that should be a factor in the professional's evaluation.

Suppose we define a function influnece which measures the preponderance of influence of white or black over a particular square. We suppose that correct play tends to move to intersections which are neither too influenced by "friendly" or "unfriendly" pieces.

Play through a large number of professional games, noting the influence characteristics of the moves actually played.

Use the result as a predictor for where future moves would be played.

This basic method is extremely flexible; anything you can measure can be distilled into recommendations for play, without your manually determining exactly what values for the measurement are good or bad.

Some simple evaluators

These are some of the evaluators, which are used in linear combinations, to scores in later examples. The exact details and definitions are unimportant. Except for "proxemity", the actual weights for individual positions are derived from automatic evaluation of professional games. Depending on the exact technique, the amount of data for each measurement varies from very modest to very large.

Proxemity
A simple calculations based on the assertion that a good move will be close to a previous move. It's interesting that this evaluator alone can achieve a predictive value of 0.8.
Equal Influence
based on a function influence which measures the preponderance of influence of white or black over a particular square.
2D thickness
Based on a measurement of "thickness" and on the number of stones in the immediate neighborhood.
Neighborhood
Characterize the 3x3 neighborhood surrounding the played position, note how often this neighborhood occurred in professional play. This measurement, used alone, can achieve a predictive value of .63.
Distant Neighborhood
Characterize the overall position, including the number of stones adjacent, the distances and directions to other nearby stones and edges. This measurement, used alone, can achieve a predictive value in the vicinity of 0.56.

Combined evaluators

The obvious next step is to combine the individual predictors, using a weighted linear combination, to improve on the individual performance. Here are two random examples, after tuning to maximize the predictive value (but not for these particular games).

This is game 1990-meijin-league-1 in my collection, 
signature-a="mmeqnb", signature-b="jcqnhk"
Link to this game in Jansteen's database

This is game 1980-meijin-league-7 in my collection, 
signature-a="nbhqil" signature-b="pfoqkk"
Link to this game in Jansteen's database

Observations about the evaluators

As you can see from the graphs, the overal predictive value achieved is in the vicinity of 0.85, but the standard deviation is very wide, almost 0.2. Many of the moves are "right on", but there are obvious areas where the evaluator had not a clue.

My interpretation of this result is that in go, lots of moves are "automatic", and in these situations the predictors were fairly good at picking the "obvious" moves. In situations where the correct move is not "obvious", the predictors are very weak because they take no acount of specific knowlege about the position.

My conclusion is that much more specific knowlege about the positions has to be derived to be useful in generating high quality candidate moves.

comments/suggestions to: ddyer@real-me.net

my home page