11.7.2 Information Spaces for Games on State Spaces

I-space concepts can also be incorporated into sequential games that are played over state spaces. The resulting formulation naturally extends Formulation 11.1 of Section 11.1 to multiple players. Rather than starting with two players and generalizing later, the full generality of having $ n$ players is assumed up front. The focus in this section is primarily on characterizing I-spaces for such games, rather than solving them. Solution approaches depend heavily on the particular information models; therefore, they will not be covered here.

As in Section 11.7.1, each player has its own frame of reference and therefore its own I-space. The I-state for each player indicates its information regarding a common game state. This is the same state as introduced in Section 10.5; however, each player may have different observations and may not know the actions of others. Therefore, the I-state is different for each decision maker. In the case of perfect state sensing, these I-spaces all collapse to $ X$.

Suppose that there are $ n$ players. As presented in Section 10.5, each player has its own action space, $ U^i$; however, here it is not allowed to depend on $ x$, because the state may generally be unknown. It can depend, however, on the I-state. If nature actions may interfere with the state transition equation, then (10.120) is used (if there are two players); otherwise, (10.121) is used, which leads to predictable future states if the actions of all of the players are given. A single nature action, $ \theta \in \Theta(x,u^1,u^2,\ldots,u^n)$, is used to model the effect of nature across all players when uncertainty in prediction exists.

Any of the sensor models from Section 11.1.1 may be defined in the case of multiple players. Each player has its own observation space $ Y^i$ and sensor mapping $ h^i$. For each player, nature may interfere with observations through nature sensing actions, $ \Psi^i(x)$. A state-nature sensor mapping appears as $ y^i = h^i(x,\psi^i)$; state sensor mappings and history-based sensor mappings may also be defined.

Consider how the game appears to a single player at stage $ k$. What information might be available for making a decision? In the most general case, each player has the following: 1) an initial condition, $ {\eta^i_0}$; 2) an action history, $ {\tilde{u}}^i_{k-1}$; and 3) an observation history, $ {\tilde{y}}^i_k$. It must be specified whether one player knows the previous actions that have been applied by other players. It might even be possible for one player to receive the observations of other players. If $ {{\rm P}_i}$ receives all of this information, its history I-state at stage $ k$ is

$\displaystyle {\eta}^i_k = ({\eta^i_0},{\tilde{u}}^1_{k-1},{\tilde{u}}^2_{k-1},\ldots,{\tilde{u}}^n_{k-1}, {\tilde{y}}^1_k,{\tilde{y}}^2_k,\ldots,{\tilde{y}}^n_k) .$ (11.84)

In most situations, however, $ {\eta}^i_k$ only includes a subset of the histories from (11.84). A typical situation is

$\displaystyle {\eta}^i_k = ({\eta^i_0},{\tilde{u}}^i_{k-1},{\tilde{y}}^i_k) ,$ (11.85)

which means that $ {{\rm P}_i}$ knows only its own actions and observations. Another possibility is that all players know all actions that have been applied, but they do not receive the observations of other players. This results in

$\displaystyle {\eta}^i_k = ({\eta^i_0},{\tilde{u}}^1_{k-1},{\tilde{u}}^2_{k-1},\ldots,{\tilde{u}}^n_{k-1},{\tilde{y}}^i_k) .$ (11.86)

Of course, many special cases may be defined by adapting the examples given throughout this chapter. For example, an intriguing sensorless game may be defined in which the history I-state consists only of actions. This could yield

$\displaystyle {\eta}^i_k = ({\eta^i_0},{\tilde{u}}^1_{k-1},{\tilde{u}}^2_{k-1},\ldots,{\tilde{u}}^n_{k-1}) ,$ (11.87)

or even a more secretive game in which the actions of other players are not known:

$\displaystyle {\eta}^i_k = ({\eta^i_0},{\tilde{u}}^i_{k-1}) .$ (11.88)

Once the I-state has been decided upon, a history I-space $ {\cal I}^i_{hist}$ for each player is defined as the set of all history I-states. In general, I-maps and derived I-spaces can be defined to yield alternative simplifications of each history I-space.
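To make the history I-state concrete, the following Python sketch (not from the text; the class and field names are hypothetical) represents the typical model of (11.85), in which $ {{\rm P}_i}$ records only its own actions and observations. The variants (11.84), (11.86), (11.87), and (11.88) correspond to adding or dropping fields from this structure.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class HistoryIState:
    """History I-state of player i under the model of (11.85):
    eta^i_k = (eta^i_0, own action history, own observation history)."""
    eta0: Any                      # initial condition eta^i_0
    actions: Tuple[Any, ...]       # own action history, length k-1
    observations: Tuple[Any, ...]  # own observation history, length k

    def extend(self, u_i: Any, y_i: Any) -> "HistoryIState":
        """Advance from stage k to stage k+1 after applying action u_i
        and receiving observation y_i."""
        return HistoryIState(self.eta0,
                             self.actions + (u_i,),
                             self.observations + (y_i,))
```

The history I-space $ {\cal I}^i_{hist}$ is then simply the set of all values this structure can take; an I-map would be any function applied to it that collapses histories into a derived I-state.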

Assuming all spaces are finite, the concepts given so far can be organized into a sequential game formulation that is the imperfect state information counterpart of Formulation 10.4:

Formulation 11.4 (Sequential Game with I-Spaces)  
  1. A set of $ n$ players, $ {{\rm P}_1}$, $ {{\rm P}_2}$, $ \ldots $, $ {{\rm P}_n}$.
  2. A nonempty, finite state space $ X$.
  3. For each $ {{\rm P}_i}$, a finite action space $ U^i$. We also allow a more general definition, in which the set of available choices depends on the history I-state; this can be written as $ U^i({\eta}^i)$.
  4. A finite nature action space $ \Theta(x,u^1,\ldots,u^n)$ for each $ x \in X$ and $ u^i \in U^i$ for each $ i$ such that $ 1 \leq i \leq n$.
  5. A state transition function $ f$ that produces a state, $ f(x,u^1,\ldots,u^n,\theta)$, for every $ x \in X$, $ \theta \in \Theta(x,u^1,\ldots,u^n)$, and $ u^i \in U^i$ for each $ i$ such that $ 1 \leq i \leq n$.
  6. For each $ {{\rm P}_i}$, a finite observation space $ Y^i$.
  7. For each $ {{\rm P}_i}$, a finite nature sensing action space $ \Psi^i(x)$ for each $ x \in X$.
  8. For each $ {{\rm P}_i}$, a sensor mapping $ h^i$, which produces an observation, $ y^i = h^i(x,\psi^i)$, for each $ x \in X$ and $ \psi^i \in \Psi^i(x)$. This definition assumes a state-nature sensor mapping. A state sensor mapping or history-based sensor mapping, as defined in Section 11.1.1, may alternatively be used.
  9. A set of $ K$ stages, each denoted by $ k$, which begins at $ k=1$ and ends at $ k = K$. Let $ F = K+1$.
  10. For each $ {{\rm P}_i}$, an initial condition $ {\eta^i_0}$, which is an element of an initial condition space $ {{\cal I}^i_0}$.
  11. For each $ {{\rm P}_i}$, a history I-space $ {\cal I}^i_{hist}$ which is the set of all history I-states, formed from action and observation histories, and may include the histories of other players.
  12. For each $ {{\rm P}_i}$, let $ L^i$ denote a stage-additive cost functional,

    $\displaystyle L^i({\tilde{x}}_F,{\tilde{u}}^1_K,\ldots,{\tilde{u}}^n_K) = \sum_{k=1}^K l^i(x_k,u^1_k,\ldots,u^n_k) + l^i_F(x_F) .$ (11.89)

Extensions exist for cases in which one or more of the spaces are continuous; see [59]. It is also not difficult to add goal sets and termination conditions and allow the stages to run indefinitely.
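A single stage of a game in the style of Formulation 11.4 can be sketched in code. The following is a minimal Python simulation, not taken from the text; the names `plans`, `f`, `Theta`, `h`, and `Psi` are hypothetical stand-ins for the components of the formulation, nature is modeled as choosing uniformly at random, and each player's history I-state follows the typical model of (11.85).

```python
import random

def simulate_stage(x, etas, plans, f, Theta, h, Psi):
    """Execute one stage of an n-player sequential game with I-spaces.
    etas[i] is player i's history I-state (a tuple of tagged entries);
    plans[i] maps an I-state to an action.  The components f, Theta,
    h, and Psi are problem-specific and assumed given."""
    n = len(etas)
    # 1. Nature sensing actions interfere; each player receives
    #    y^i_k = h^i(x_k, psi^i) with psi^i drawn from Psi^i(x_k).
    ys = [h[i](x, random.choice(Psi[i](x))) for i in range(n)]
    # 2. Each player appends its new observation to its own I-state...
    etas = [etas[i] + (("y", ys[i]),) for i in range(n)]
    # 3. ...and chooses u^i_k based only on that I-state.
    us = [plans[i](etas[i]) for i in range(n)]
    # 4. Nature chooses theta; the state transitions via f.
    theta = random.choice(Theta(x, us))
    x_next = f(x, us, theta)
    # 5. Each player records its own action, completing (11.85).
    etas = [etas[i] + (("u", us[i]),) for i in range(n)]
    return x_next, etas
```

Under the richer model of (11.84) or (11.86), step 5 would instead append the actions (and possibly observations) of all players to every I-state.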

An interesting specialization of Formulation 11.4 is when all players have identical cost functions. This is not equivalent to having a single player because the players have different I-states. For example, a task may be for several robots to search for a treasure, but they have limited communication between them. This results in different I-states. They would all like to cooperate, but they are unable to do so without knowing the state. Such problems fall under the subject of team theory [225,450,530].

As for the games considered in Formulation 10.4, each player has its own plan. Since the players do not necessarily know the state, the decisions are based on the I-state. The definitions of a deterministic plan, a globally randomized plan, and a locally randomized plan are essentially the same as in Section 11.7.1. The only difference is that more general I-spaces are defined in the current setting. Various kinds of solution concepts, such as saddle points and Nash equilibria, can be defined for the general game in Formulation 11.4. The existence of locally randomized saddle points and Nash equilibria depends in general on the particular information model [59].

Figure 11.32: In the Battleship game, each player places several ships on a grid. The other player must guess the locations of ships by asking whether a particular tile is occupied.

Example 11.27 (Battleship Game)   Many interesting I-spaces arise from classical board games. A brief illustration is provided here from Battleship, which is a sequential game under the alternating-turn model. Two players, $ {{\rm P}_1}$ and $ {{\rm P}_2}$, each secretly arrange a collection of battleships on a $ 10 \times 10$ grid; see Figure 11.32.

A state is the specification of the exact location of all ships on each player's grid. The state space yields the set of all possible ship locations for both players. Each player always knows the location of its own ships. Once they are placed on the grid, they are never allowed to move.

The players take turns guessing a single grid tile, expressed as a row and column, that is suspected of containing a ship. The possible observations are ``hit'' and ``miss,'' depending on whether a ship was at that location. In each turn, a single guess is made, and the players continue taking turns until one player has observed a hit for every tile that was occupied by a ship.

This is an interesting game because once a ``hit'' is discovered, it is clear that a player should search for other hits in the vicinity because there are going to be several contiguous tiles covered by the same ship. The only problem is that the precise ship position and orientation are unknown. A good player essentially uses the nondeterministic I-state to improve the chances that a hit will occur next. $ \blacksquare$
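The nondeterministic I-state mentioned above can be computed explicitly. The following Python sketch (an illustration, not part of the text) enumerates, for a single enemy ship of a given length, every placement consistent with the hits and misses observed so far; a real Battleship player would track the whole fleet in the same way.

```python
from itertools import product

def consistent_placements(hits, misses, length=3, n=10):
    """Nondeterministic I-state for one enemy ship of the given length
    on an n-by-n grid: the set of all placements consistent with the
    observed hits and misses (each given as a set of (row, col) tiles)."""
    placements = []
    for r, c, horizontal in product(range(n), range(n), (True, False)):
        if horizontal:
            cells = {(r, c + j) for j in range(length)}
        else:
            cells = {(r + j, c) for j in range(length)}
        if any(row >= n or col >= n for row, col in cells):
            continue  # ship would extend beyond the grid
        # Consistent: covers every hit and avoids every miss.
        if hits <= cells and not (cells & misses):
            placements.append(frozenset(cells))
    return placements
```

Each hit or miss shrinks this set; a good guess is one that is expected to shrink it quickly, which is why searching near a known hit is effective.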

Example 11.28 (The Princess and the Monster)   This is a classic example from game theory that involves no sensing. A princess and a monster move about in a 2D environment. A simple motion model is assumed; for example, they take single steps on a grid. The princess is trying not to be discovered by the monster, and the game is played in complete darkness. The game ends when the monster and the princess are on the same grid point. There is no form of feedback that can be used during the game; however, it is possible to construct nondeterministic I-states for the players. For most environments, no strategy guarantees that the monster wins; however, for some environments a guaranteed winning strategy for the monster does exist. This example can be considered as a special kind of pursuit-evasion game. A continuous-time pursuit-evasion game that involves I-spaces is covered in Section 12.4. $ \blacksquare$
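The monster's nondeterministic I-state in this game is simply the set of grid cells the princess could occupy. A one-step update of that set can be sketched as follows (a minimal illustration under the assumptions that the princess may stay put or take a single four-directional step, and that `free_cells` gives the traversable cells of the environment; none of this is from the text).

```python
def update_monster_istate(possible, monster_pos, free_cells):
    """One-step update of the monster's nondeterministic I-state:
    expand each possible princess location by her motion model, then
    remove the monster's own cell, since the game would already have
    ended if the princess were there."""
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    nxt = set()
    for (r, c) in possible:
        for dr, dc in moves:
            cell = (r + dr, c + dc)
            if cell in free_cells:
                nxt.add(cell)
    nxt.discard(monster_pos)
    return nxt
```

For the monster to have a guaranteed winning strategy, its motion must eventually drive this set down to the empty set of unvisited possibilities, which is possible in some environments (e.g., narrow corridors) and impossible in others.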

Steven M LaValle 2012-04-20