11.7.1 Information States in Game Trees

Recall from Section 10.5.1 that an important part of formulating a sequential game in a game tree is specifying the information model. This was described in Step 4 of Formulation 10.3. Three information models were considered in Section 10.5.1: alternating play, stage-by-stage, and open loop. These and many other information models can be described using I-spaces.

From Section 11.1, it should be clear that an I-space is always defined with respect to a state space. Even though Section 10.5.1 did not formally introduce a state space, it is not difficult to define one. Let the state space be , the set of all vertices in the game tree. Assume that two players are engaged in a sequential zero-sum game. Using notation from Section 10.5.1, and are the decision vertices of and , respectively. Consider the nondeterministic I-space over . Let denote a nondeterministic I-state; thus, each is a subset of .

There are now many possible ways in which the players can be confused while making their decisions. For example, if some contains vertices from both and , the player does not know whether it is even its turn to make a decision. If additionally contains some leaf vertices, the game may be finished without a player even being aware of it. Most game tree formulations avoid these strange situations. It is usually assumed that the players at least know when it is their turn to make a decision. It is also usually assumed that they know the stage of the game. This eliminates many sets from .

While playing the game, each player has its own nondeterministic I-state because the players may hide their decisions from each other. Let and denote the nondeterministic I-states for and , respectively. For each player, many sets in are eliminated. Some are removed to avoid the confusions mentioned above. We also impose the constraint that for and . We only care about the I-state of a player when it is that player's turn to make a decision. Thus, the nondeterministic I-state should tell us which decision vertices in are possible as faces a decision. Let and represent the nondeterministic I-spaces for and , respectively, with all impossible I-states eliminated.

The I-spaces and are usually defined directly on the game tree by circling vertices that belong to the same I-state. They form a partition of the vertices in each level of the tree (except the leaves). In fact, even forms a partition of for each player. Figure 11.30 shows four information models specified in this way for the example in Figure 10.13. The first three correspond directly to the models allowed in Section 10.5.1. In the alternating-play model, each player always knows the decision vertex. This corresponds to a case of perfect state information. In the stage-by-stage model, always knows the decision vertex; knows the decision vertex from which made its last decision, but it does not know which branch was chosen. The open-loop model represents the case that has the poorest information. Only knows its decision vertex at the beginning of the game. After that, there is no information about the actions chosen. In fact, the players cannot even remember their own previous actions. Figure 11.30d shows an information model that does not fit into any of the three previous ones. In this model, very strange behavior results. If and initially choose right branches, then the resulting decision vertex is known; however, if instead chooses the left branch, then will forget which action it applied (as if the action of caused to have amnesia!). Here is a single-stage example:

Plans are now defined directly as functions on the I-spaces. A *(deterministic) plan for
* is defined
as a function on
that yields an action
for each
, and
is the set of
actions that can be inferred from the I-state ; assume that
this set is the same for all decision vertices in .
Similarly, a *(deterministic) plan for
* is defined as a
function on
that yields an action
for each
.

There are generally two alternative ways to define a randomized plan
in terms of I-spaces. The first choice is to define a *globally
randomized plan*, which is a probability distribution over the set of
all deterministic plans. During execution, this means that an entire
deterministic plan will be sampled in advance according to the
probability distribution. An alternative is to sample actions as they
are needed at each I-state. This is defined as follows. For the
randomized case, let
and
denote the sets of
all probability distributions over
and
,
respectively. A *locally randomized plan for
* is defined as
a function that yields some
for each
. Likewise, a *locally
randomized plan for
* is a function that maps from
into
. Locally randomized plans expressed as functions of
I-states are often called *behavioral strategies* in game theory
literature.

A randomized saddle point on the space of locally randomized plans does not exist for all sequential games [59]. This is unfortunate because this form of randomization seems most natural for the way decisions are made during execution. At least for the stage-by-stage model, a randomized saddle point always exists on the space of locally randomized plans. For the open-loop model, randomized saddle points are only guaranteed to exist using a globally randomized plan (this was actually done in Section 10.5.1). To help understand the problem, suppose that the game tree is a balanced, binary tree with stages (hence, levels). For each player, there are possible deterministic plans. This means that probability values may be assigned independently (the last one is constrained to force them to sum to ) to define a globally randomized plan over the space of deterministic plans. Defining a locally randomized plan, there are I-states for each player, one for each search stage. At each stage, a probability distribution is defined over the action set, which contains only two elements. Thus, each of these distributions has only one independent parameter. A randomized plan is specified in this way using independent parameters. Since is much less than , there are many globally randomized plans that cannot be expressed as a locally randomized plan. Unfortunately, in some games the locally randomized representation removes the randomized saddle point.

This strange result arises mainly because players can forget
information over time. A player with *perfect recall* remembers
its own actions and also never forgets any information that it
previously knew. It was shown by Kuhn that the space of all
globally randomized plans is equivalent to the space of all locally
randomized plans if and only if the players have perfect memory
[562]. Thus, by sticking to games in which all players have
perfect recall, a randomized saddle point always exists in the space
locally randomized plans. The result of Kuhn even holds for the more
general case of the existence of randomized Nash equilibria on the space of locally randomized plans.

The nondeterministic I-states can be used in game trees that involve
more players. Accordingly, deterministic, globally randomized, and
locally randomized plans can be defined. The result of Kuhn applies
to any number of players, which ensures the existence of a randomized
Nash equilibrium on the space of locally randomized strategies if (and
only if) the players have perfect recall. It is generally preferable
to exploit this fact and decompose the game tree into smaller matrix
games, as described in Section 10.5.1. It turns out that
the precise condition that allows this is that it must be *ladder-nested* [59]. This means
that there are decision vertices, other than the root, at which 1)
the player that must make a decision knows it is at that vertex (the
nondeterministic I-state is a singleton set), and 2) the
nondeterministic I-state will not leave the subtree rooted at that
vertex (vertices outside of the subtree cannot be circled when
drawing the game tree). In this case, the game tree can be decomposed
at these special decision vertices and replaced with the game
value(s). Unfortunately, there is still the nuisance of multiple Nash
equilibria.

It may seem odd that nondeterministic I-states were defined without being derived from a history I-space. Without much difficulty, it is possible to define a sensing model that leads to the nondeterministic I-states used in this section. In many cases, the I-state can be expressed using only a subset of the action histories. Let and denote the action histories of and , respectively. The history I-state for the alternating-play model at stage is for and for . The history I-state for the stage-by-stage model is for both players. The nondeterministic I-states used in this section can be derived from these histories. For other models, such as the one in Figure 11.31, a sensing model is additionally needed because only partial information regarding some actions appears. This leads into the formulation covered in the next section, which involves both sensing models and a state space.

Steven M LaValle 2012-04-20