Planning in Games

Jussi Tella
Department of Computer Science and Engineering
Helsinki University of Technology
P.O. Box 1100, FIN-02015 HUT, Espoo, Finland
Jussi.Tella@hut.fi, http://www.cs.hut.fi/~jte/

In Seminar on Knowledge Engineering, Fall 1997, Helsinki University of Technology.


Abstract

Since early stages of computer game-playing, game-tree search and some very detailed extensions to it have been the main approach to game-playing. This approach differs decisively from human game playing, of which intuition and use of high-level game-specific knowledge is very typical. Some attempts to make computers play like humans have been made in chess, and actually with quite good results. However, this is true only in some particular parts of chess game, for example endgames with only few pieces left on the board. Now there are some new results of using this approach to bridge. A hierarchical task-network planner with multi-agent methods has been implemented to represent bridge knowledge as schemes, that are a very human-like way of representing different strategies. This paper aims to describe the way of human game-playing compared to computer approach, and this paper studies some techniques that have been tried to make computers play intelligently. Finally, this paper tries to study, why planning has worked well in bridge, what planning techniques have made this success possible, and could this planning approach be copied in some other games.


Contents


Human game-playing

Human strength: knowledge

Human thinking and game-playing has been studied in depth by psychologists, and there are some general articles comparing human approach to computer game-playing (George and Schaeffer, 1990 and Simon and Schaeffer, 1992). It has been established, that typical of master play is very modest search in computer terms: even best humans search only few moves per position, and human brain does this search quite slowly. The main strength of human is his capability to learn knowledge about games and use it efficiently. This knowledge contains some special patterns, or strategies as humans usually call them. These strategies form a multi-level hierarchy of things to seek in a certain position: some usual positional features (open lines in chess, some card combinations in bridge), or some usual strategic or tactical things to do in certain positions (placing one's rook to an open file or trying to force opponents to drop his or her high card's against our smaller ones). Some very common terms in human game-playing are pattern recognition, use of memory to save learned patterns, and use of intuition and experience.

Human planning

It is quite common to talk about "plan" and "planning" in human game-playing. Therefore, it is maybe useful to make a short explanation about those terms in human sense. As an example I choose the game of chess.

Plan

The word "plan" is quite often used very carelessly by chessplayers. Plan might be a short-term tactical action, lasting only few moves, or it might be a general strategic long-term idea of making use of some particular pattern on the chessboard. In chess these short-term plans quite often deal with chess pieces, and long-term strategies handle some lasting patterns in position, for example pawn formations or weakened pawn positions.

Planning

The word "planning" means almost the same as recognition of patterns available, and choice of the main strategy according to these patterns. Typical of human planning is very elastic evaluation and re-planning loop, because usually the number of possibly applicable plans is high, and these plans have quite strict relations to each other. So, in order to be able to cope with the situation well enough, master must have not only some general high-level plan, but also many short-term tactical plans, and he or she must after every move check the validity of these plans, and possibly change these plans to cope with the current situation on the chessboard.

Example

I present here some examples of human planning ideas. It might be useful to make some points clear, to use these human planning concepts clearly.





Figure 1: local pawn pattern "fianchettoed bishop"





Figure 2: tactical plan "exchange bishops to weaken king's position"





Figure 3: strategic plan "exchange pawns to open file for attack"

In Figures 1, 2 and 3 there are some common chess schemes. First one is an example of local pattern: black king has castled, and black bishop is guarding it. This is so called bishop fianchetto, a very common scheme even in some chess openings. In Figure 2 there is an example of tactical scheme, that tries to disturb black's king position. It is often quite useful to exchange the fianchettoed bishop, to weaken the black squares that are only covered by that bishop (squares f6 and h6, for example). This bishop exchange is a short-term plan, it usually requires only couple of moves. In Figure 3 there is an example of long term plan: opening a line for attack. Often, when the centre of the board is closed, the best way to attack black king behind fianchettoed bishop is to open the h-file for attack. The only way to do it is to push white h-pawn to direct contact with black pawns, which means to square h5. This is often quite long plan to carry out: white must have some support for the pawn available, for example rook that covers the whole h-file. This plan must be also considered quite exactly before doing it: if the global position is otherwise closed, this kind of plan may have some long-term consequences, that are not in favour of white.

Intelligent computer game-playing

Chess endgames: AL3

In the beginning of 1980's many different approaches to use knowledge in chess were tried. Typical of this research was the fact, that the resulting programs never played an entire game chess - they only dealt with some specific chess problems. The reason for this is explained later in chapter 4.

There are quite many papers describing slightly different chess programs using knowledge to play simplified endgame positions. I have chosen Bratko's paper (Bratko, 1983) as an example here, because his AL3 rule-based problem-solving system seems to be somehow related to this new and successful bridge program presented in section 3.3.

Advice language 3

Advice Language 3, or AL3, is a problem-solving system, whose idea is to represent knowledge for a chosen problem domain in so-called "pieces of advice". These pieces of advice can be regarded as plans of certain type, or preferably patterns and advice about what to do when recognising a certain pattern in position. These pieces of advice fit to planning theme as sub-goals. Because they are ordered according to priority, they implement a structure to achieve low priority sub-goals first, and after some sub-goals have been achieved, they are always preserved while some other higher priority goals are searched for.

Advice language concepts

The main concepts in AL3 are pieces-of-advice, plans, or-combinations of plans and modification of a plan by some other plans (actually expansion rules for the system).

The pieces-of-advice are tuples containing information about type of advice (attacking or defending side of that advice), predicates on position (certain features of position), and constraints on moves (some suggestions to good moves in that position, some constraints on move-order). These pieces-of-advice are stored in advice tables to maintain the priority order in them.

Plans describe the things to do in endgame positions. The format definition of plan contains information of goals to achieve, features to maintain in position, and some move constraints. Or-combinations of plans represent the co-ordination of plans: it is very common in chess, that plans have a strict influence on each other. The last concept, modification of plan by other plans, implements the expansion rules for plans.

Example

A pure planning example is the king and rook versus king endgame. There the following pieces of advice are needed (in descending order of priority):

This planning can be done without many tactical details. However, some things must be checked all the time: rook may not be exposed to defending king (either is must keep distance to opponent king, or it must be covered by it's own king), and stalemate must not be forced (while reducing the defenders space, the attacker must check, that there is always at least one possible move for the defender).

With this advice table, and these tactical checks, the planning problem is quite pure. There are not many disturbances, and the computer's play even resembles human play in the same ending.

The program was done in Prolog, and it is quite small in program code. In Prolog the structure for the advice table was easy to represent: the clauses were written in priority order, and when the features of the position were studied, and the variables containing the information of the position were instantiated, the first applicable piece of advice was chosen. It is also worth noticing, that some problems with the calculation times needed for a move were noticed already with these simple examples. It proves, that the problem when trying to play a whole game of chess, is very hard to solve with this approach.

Conclusions

Bratko implemented this system for king and one pawn versus king and one pawn endgame in chess. Later he has also implemented some more basic endgames with the advice language. Generally the methods, that implement the strategies for these endings, are quite logical, but there are some limitations. The set of endings programmable with advice language remains quite small: some pieces inserted to the defending side disturb the planning ideology a lot. That underlines the importance of initiative. The attacking player is the only player to do planning in an ending like this, thus the planning problem transforms to basic planning problem without any opponent.

Chess combinations: PARADISE

There are also many papers describing the use of knowledge in chess tactics or chess combinations. In this paper I present Wilkins' work (Wilkins, 1983), but also the other paper (Pitrat, 1977) is quite interesting, though a little older.

Chess combinations are also an interesting field for knowledge-oriented playing programs, because human grandmaster also relies mainly on learned high-level patterns while trying to solve complicated tactical middle-game positions. Wilkins tried to model this way of playing in PARADISE-program by production rules, that were designed to match patterns in chess position and produce moves or plans.

Ideas of PARADISE

The heart of PARADISE-system is knowledge base consisting of about 200 production rules. Every rule searches for instances of its pattern features, and when it finds an instance, it posts some higher-level concepts (actually features or suggested strategies that might work in that particular position) in the database used in reasoning process. These concepts are grouped according to abstract higher level knowledge sources, that correspond the human patterns quite closely.

The knowledge base is used in many ways in PARADISE. Some low level primitives generate some overall concepts about the features of the position. Then a consuming static analysis is done, to realise what attacking motifs are available. Then PARADISE produces plans to guide the search, and it executes these plans to focus the search on most important aspects of the position. Some special features, for example quiescence search, are used to realise the real value of a plan. Also analysis of problem upon failure is implemented, which means that knowledge of plans that fail are maintained also in lower levels of search tree. This means, that a human-like concept of trying to refute an attacking plan's refutation is possible.

Plans

In PARADISE plans are expressed in Plan-Language. Plan's try to focus program's attention on the critical part of a new position, and they reduce the branching factor by directing the search. A plan is either a particular move, or a goal. Short one-move plans are executed immediately, and goal plans are used when expanding and elaborating plans. A goal in a plan is an instance of a knowledge source, an its main purpose is to post new concepts to reasoning database.

Some important things about plans are their accuracy and the practical use of them. It is important to get the correct level of detail in plan, which means that the branching inside plans and instantiating method for defensive moves must be accurate. And when using plans to guide search, some effort must be invested in determining their value. Some human-like concepts, for example "progressively deepening search" are used in this process.

Implementation and conclusions

Wilkins tested PARADISE-system with a test set of chess combinations from a chess book containing tactical chess problems. The results he got were very good, but they demanded some detailed modification to production rules. But this modification proved, that the number of rules has an effect on performance: the more rules, the better the program plays. Wilkins states, that his ideas are most probably applicable to endgames, and possibly to strategic middle-game positions, too. But the biggest problem, the amount of work needed to code the knowledge to some rule formalism, has probably prevented the use of these ideas in any real chess programs.

This work by Wilkins has since been a starting point for knowledge-oriented game-playing programs, moreover the successful bridge program presented in next chapter rely on ideas tested here. As conclusion some general points about game-playing programs can made according to Wilkins' work. The importance of a plan becomes clear: it is better to have even a dubious plan guiding the search than to have no plan at all. Also it is possible to insert goal-orientation to game-playing with this approach: it is very useful for a program to know, what it is expecting to gain and why. And information gathered from failure and used later in child nodes of the game-tree is a very important improvement on traditional search approach.

Bridge: the success

The main motivation for this paper is the recent success of a bridge program using planning methods (Smith, Nau and Throop 1996). The new version of Bridge Baron program became the world champion of computer programs in July 1997 by winning a tournament hosted by the American Contract Bridge League. This program has been developed through a joint effort by University of Maryland professor Dana Nau, graduate student Stephen Smith, and head of a game-programming company Tom Throop. The work has been closely related to earlier planning research on task-network planning made by Nau's research group in University of Maryland.

The work presented in the next sections deals mainly with hierarchical task-network planning, where task decomposition is used in game-tree generation, and total-order task-networks represent the precise ordering constraints needed in game-playing domain. Some other planning concepts, for example planning under uncertainty, multi-agent planning and contingency planning are used. And as to game-playing field, this work is so far the most successful application of knowledge in any game-playing program, and also this work is a success in manipulating game-trees with uncertainty of imperfect information games.

Bridge characteristics

Bridge is the most popular card game, played all over the world. It is also very well studied game like chess, so there is much bridge literature describing the methods and strategies used by best human players. In a bridge game two two-person teams meet each other, and the players are named North and South, that play against East and West. A bridge game consists of not only play, but also an auction phase and scoring. In auction players make bids according to their cards, and highest bid gets the contract, that shapes the play. The player getting the contract is called declarer (in a sense "attacker"), and his partner becomes dummy, because his or her cards are shown to all other players. The opposite team ("defenders") tries to prevent the declarer to make the contract. And as to play, there are two types of games, that are trump and no-trump deals, where the trump deals are regarded as slightly more complicated than no-trump deals (Korpela 1997).

There are some bridge characteristics that affect planning. Bridge is imperfect-information game, so the whole world state is not known by any of the players. But there are some things that all players know: it is always only one player to move, it is known who is to move next, all players know what moves they can make, and all players know a finite set of possible move some other player might make.

Principles

The approach to represent bridge knowledge is done by schemes. Schemes are either short-term tactical motifs like finessing or ruffing, or long-term strategic motifs like crossruffing. These schemes resemble very much patterns used in knowledge-oriented approaches to chess. Schemes are compiled from bridge literature, and they are commonly used by human players.

The game-tree is represented by task-networks. Only parts that fit into schemes are represented. The task decomposition is equivalent to game-tree generation, and it is done by two kinds of multi-agent methods: operator methods and decomposable methods. The schemes are represented by these methods. Total-order task-networks are used, because game-playing domain requires more detailed instantiation than usual planning domain.

The actions, that different methods do, are represented in STRIPS-style, which means that they contain certain precondition, deleting and adding lists.

Planning procedure

The planning procedure contains decision-tree expansion, production of plans according to evaluation of that tree, and general execution and evaluation, possibly re-planning loop. The expansion of decision-tree is done by instantiated methods, that produce some new nodes to task networks according to their type. The decision-tree is then evaluated by a utility function called weighted-average-of-utilities, that is especially defined for imperfect-information games. It uses belief function as its source, and these beliefs are mainly based on knowledge gained from bidding phase or from prior play. The plan, that is followed, is actually the maximum utility branch in decision tree. The plan is in some way a contingency plan, because it tries to take care of all possibilities, all possible moves made by opponent. This plan is followed as long as possible. Of course it might happen, that an opponent move pushes us out of plan, or plan ends before the deal is played through to end, and then some re-planning is needed.

Implementation and results

The implementation of this planning approach to bridge is used in Bridge Baron -program, and it is already commercially available. The number of schemes used was about 100, and even with that number of schemes the number of moves searched in one deal was dropped to 500 000 from the high theoretical maximum, that is 10 to 44th power. The program was first tested with old version of the Bridge Baron, and it managed to win a duplicate bridge match clearly (in duplicate bridge all competitors play the same deals, so the performance can be easily compared). But since this test the program has won the computer world championship tournament, as mentioned earlier.

The main result of this work is the first success in applying knowledge-oriented approach to game playing. The use of schemes makes the computer play like the best human players, and it reduces the branching factor in search-tree decisively. Also, as a result of this paper, some questions are raised: why is this approach possible in bridge, can this planning approach be generalised to some other games, and are there some elements of planning in bridge?

An interesting point about this planning approach to bridge is worth noticing: the planning technique used here has been used to build a process-planning system for the manufacture of complex electro-mechanical devices. So again the result in one area can be generalised to some other applications, and again the time spent in game-playing seems to be useful in some real-life application.

Example

I think it might be useful to have an example of schemes and their use from bridge, too. The scheme "finesse" presented here can be generally translated to "trapping the enemy high card by the location of it". Consider the following deal (A, K, Q, J, T mean ace, king, queen, jack and ten respectively):

East is declarer with a bid of 2 no-trump and opening lead was made by south with Q of diamond. The realisation of finesse-scheme might happen as follows (# indicates the card that led the trick, and * indicates the card that won the trick):

West      North	    East       South	  Comments
diamond T diamond 4 diamond K* diamond Q# Getting control
heart 2	  heart T   heart Q#   heart A*   Setting up hearts
club 3	  diamond 3 diamond A* diamond J# Getting control
heart J*  spade 2   heart 8#   heart 9	  Setting up hearts, failure
spade A*# spade 5   spade 4    diamond 7  Setting up spades
spade T#  spade K   spade 6    diamond 5  Taking marked spade finesse

As can be seen, this lead of spade T leaves North in difficult situation: if he wants to take the trick, he must throw his spade K, which leaves East's spade Q and 9 clear winners. So, East and West have used a scheme called finesse to trap North high card, spade K.

One important question is how to choose the right plan? The planning system has information of the world in some pattern variables, that contain information of for example cards in the hand, cards played, cards that some player can not have, and so on. The plan is created according to these variables, for example: it is typical of the plan called finesse, that the attacking side has some card combination like A and Q, and that card combination is used to "trap" the enemy king. The finesse can of course be applied with many other card combinations as well, for example K and J against Q. The schemes and their pre-condition lists contain information of the instances for that scheme to be feasible.

Other games

The traditional game-tree search works well for "small" games like othello, checkers and tic-tac-toe, as quite many of games of this type have already been solved. Game is solved, when it can be known, which is the best strategy to follow from the initial position, and to which result this optimal strategy leads. For example, everyone knows that tic-tac-toe is a draw when both players play their best moves every time.

Chess has for a long time been regarded as a test field for Artificial Intelligence techniques, but it is not the most complicated of these classic strategy games. The game of Go has much larger branching factor, and mainly because of that the level of Go programs is quite low compared to even a novice play by human. The paper by Lehner (Lehner, 1983) presents a planning approach to Go, and it is presented in next section.

It should be mentioned, that it is quite surprising, that the first success for planning approach happens to come in imperfect-information game, bridge. Usually the uncertainty that must be dealt with while examining the possible moves is quite hard problem to solve, at least compared to these perfect information games. But perhaps these results with the bridge program raise the question, is there real difference in between these types of games. The computational limits seem to somehow reduce the gap between these types in practical programming.

Strategic planning in Go

Lehner's idea is to examine not the move-by-move game-trees, but to use a strategic look-ahead method called representative search. This method tries to examine sequences of play, and it does the search only when the sub-goal of certain plan has been reached. The main motivation for this approach is the fact, that in Go a plan can be formulated usually by perceptual characteristics of the position.

The concept of contingency planning, or contingency game-tree, appears also here. In this work the sequence of moves with plans they depend on is represented as contingency plan goal trees. With this tree both the strategic high-level knowledge and tactical short-term look-ahead can be implemented.

This paper proved, that it is possible to guide the search process also in Go. But as in chess, the strategic knowledge is not enough: you can only use this knowledge to choose the most critical part of the game-tree, in which to concentrate most search efforts. This program was tested with some commented Go-examples, and it got some good results. But it also proved, that in Go as well in chess the influence of plans on each other is very large, and it is a problem that must be solved somehow.

Conclusions

Planning techniques used

Following planning concepts deal with game-playing to some extent:

What are the reasons for success in bridge?

One reason for success of planning approach to bridge has been the ability to reduce the branching factor of an imperfect-information game radically. For a long time games of this type have been considered hard to solve with computers, but the way that knowledge has been used here, chances the picture. It seems that in bridge the number of schemes is quite low, the knowledge of different schemes is easily accessible, and it can be programmed in reasonable time and with a reasonable effort. It is also worth noticing, that in bridge it seems to be enough to plan sequences of moves, so there are no "tactical" elements to confuse these long-term plans. Especially useful this planning seems to be always, when a strategic scheme contains many forced moves.

The domain for bridge also seems to be quite appropriate for planning - human bridge play has natural element of planning. The domain is quite small: besides the low number of schemes, the number of possible opponent moves and the possible counter-measures for a plan are quite few. And also the influence of different plans to each other is small: it is possible to serialise plans, and instantiate the cards in feasible order according to these plans.

When will planning work?

A little comparison of chess and bridge

As the results of knowledge-oriented approaches to chess have not been the same as in bridge, we must consider what is the difference between these games. Of course, the branching factor in chess is higher, and so the game-trees become larger, and the reducing effect that the scheme-approach has is not so important. But that is not the most important difference.

More important seems to be the number of schemes. In chess also huge number of schemes, or patterns or strategies, have been described in literature. Some estimates of the number of schemes in chess have been made: there are claims (George and Schaeffer, 1990), that the number of schemes a master-player must know is at least 50000, or maybe even more.

These schemes also have some differences compared to bridge schemes: there seems to be schemes on many levels, from low-level "tactics" (like an attacking theme called "back rank mate") to higher-level strategies (like "king-side initiative"), and from local patterns (like "bishop fianchetto") to whole board patterns ("closed pawn centre"). And these schemes influence each other closely: schemes can transform to some other schemes ("opening a closed pawn centre"), schemes can refute other schemes without any direct connection to each other ("attack on different wings"), and player is often forced to use co-ordination of schemes explicitly ("principle of two weaknesses"). And very often a long-term strategic scheme is refuted by short-term tactics, that is in a way quite irrational compared to over-all positional features. Also one feature of schemes is interesting: in chess schemes are quite rarely serialisable. While it is possible to instantiate the playing order of cards early in bridge, the same does not work in chess: player usually has many plans at the same time, and the actual plan is chosen when the opponent commits himself to some defensive actions.

The classification of the plans and their interactions would be interesting. At least it would be interesting to know, if such classification can be made at all, or is the structure of the plans too complicated and fuzzy. This is possibly a problem for some psychological studies, but it might be interesting in planning sense, too. Possibly in many planning domains the plan classes and their interactions might provide some useful information for the planning algorithm as whole.

If that is not enough, there is one point more. The grandmaster chess is nowadays very complex: the plans, and counter-plans, and their refutations must be considered in every move exactly. You can follow some general plans, but you must all the time check what are your opponent's counter-plans to them. And if your opponent's plans seem to be too dangerous, you must counter them before they gain any advantage on the board. Of course, one important thing to study would be the importance of the initiative: when the attacker has the initiative completely, and the defender only reacts to direct threats, then even some more complicated positions may be transformed to pure planning problems. This relates to the basic human thinking "if I play that, he plays that, and so on". Only the right order of plans, or the attacking moves, must be found, to make it impossible for the defender to defend against all the threats.

When will planning work

It seems to be clearly so, that in order the planning approach to work, the domain must be quite "narrow". This means, that the number of schemes must be reasonable, or that the schemes must be easily programmable. This holds in bridge, but not in chess.

The interactions in playing domain must also be controllable: in bridge the number of schemes available in one deal is quite low, but in chess you always have tens of plans available - with uncontrollable consequences.

Literature