Determining a security plan

The notion of a security strategy from Section 9.3.2 extends in a natural way to sequential games. This yields a security plan in which each player performs worst-case analysis by treating the other player as nature under nondeterministic uncertainty. A security plan and its resulting cost can be computed by propagating costs from the leaves up to the root. The computation of the security plan for $ {{\rm P}_1}$ for the game in Figure 10.13 is shown in Figure 10.14. The actions that would be chosen by $ {{\rm P}_2}$ are determined at all vertices in the second-to-last level of the tree. Since $ {{\rm P}_2}$ tries to maximize costs, the recorded costs at each of these vertices is the maximum over the costs of its children. At the next higher level, the actions that would be chosen by $ {{\rm P}_1}$ are determined. At each vertex, the minimum cost among its children is recorded. In the next level, $ {{\rm P}_2}$ is considered, and so on, until the root is reached. At this point, the lowest cost that $ {{\rm P}_1}$ could secure is known. This yields the upper value, $ \overline{L}^*$, for the sequential game. The security plan is defined by providing the action that selects the lowest cost child vertex, for each $ n_1 \in N_1$. If $ {{\rm P}_2}$ responds rationally to the security plan of $ {{\rm P}_1}$, then the path shown in bold in Figure 10.14d will be followed. The execution of $ {{\rm P}_1}$'s security plan yields the action sequence $ (L,L)$ for $ {{\rm P}_1}$ and $ (R,L)$ for $ {{\rm P}_2}$. The upper value is $ \overline{L}^*= 1$.

Figure 10.14: The security plan for $ {{\rm P}_1}$ is determined by propagating costs upward from the leaves. The choices involved in the security plan are shown in the last picture. An upper value of $ 1$ is obtained for the game.
...}$ chooses & (d) ${{\rm P}_1}$ chooses

Figure 10.15: The security plan can be found for $ {{\rm P}_2}$ by swapping the order of $ {{\rm P}_1}$ and $ {{\rm P}_2}$ (the order of the costs on the leaves also become reshuffled).

A security plan for $ {{\rm P}_2}$ can be computed similarly; however, the order of the decisions must be swapped. This is not easy to visualize, unless the order of the players is swapped in the tree. If $ {{\rm P}_2}$ acts first, then the resulting tree is as shown in Figure 10.15. The costs on the leaves appear in different order; however, for the same action sequences chosen by $ {{\rm P}_1}$ and $ {{\rm P}_2}$, the costs obtained at the end of the game are the same as those in Figure 10.14. The resulting lower value for the game is found to be $ \underline{L}^*= 1$. The resulting security plan is defined by assigning the action to each $ n_2 \in N_2$ that maximizes the cost value of its children. If $ {{\rm P}_1}$ responds rationally to the security plan of $ {{\rm P}_2}$, then the actions executed will be $ (L,L)$ for $ {{\rm P}_1}$ and $ (R,L)$ for $ {{\rm P}_2}$. Note that these are the same as those obtained from executing the security plan of $ {{\rm P}_1}$, even though they appear different in the trees because the player order was swapped. In many cases, however, different action sequences will be obtained.

As in the case of a single-stage game, $ \underline{L}^*= \overline{L}^*$ implies that the game has a deterministic saddle point and the value of the sequential game is $ L^*=
\underline{L}^*= \overline{L}^*$. This particular game has a unique, deterministic saddle point. This yields predictable, identical choices for the players, even though they perform separate, worst-case analyses.

A substantial reduction in the cost of computing the security strategies can be obtained by recognizing when certain parts of the tree do not need to be explored because they cannot yield improved costs. This idea is referred to as alpha-beta pruning in AI literature (see [839], pp. 186-187 for references and a brief history). Suppose that the tree is searched in depth-first order to determine the security strategy for $ {{\rm P}_1}$. At some decision vertex for $ {{\rm P}_1}$, suppose it has been determined that a cost $ c$ would be secured if a particular action, $ u$, is applied; however, there are still other actions for which it is not known what costs could be secured. Consider determining the cost that could be secured for one of these remaining actions, denoted by $ u'$. This requires computing how $ {{\rm P}_2}$ will maximize cost to respond to $ u'$. As soon as $ {{\rm P}_2}$ has at least one option for which the cost, $ c'$, is greater than $ c$, the other children of $ {{\rm P}_2}$ do not need to be explored. Why? This is because $ {{\rm P}_1}$ would never choose $ u'$ if $ {{\rm P}_2}$ could respond in a way that leads to a higher cost than what $ {{\rm P}_1}$ can already secure by choosing $ u$. Figure 10.16 shows a simple example. This situation can occur at any level in the tree, and when an action does not need to be considered, an entire subtree is eliminated. In other situations, children of $ {{\rm P}_1}$ can be eliminated because $ {{\rm P}_2}$ would not make a choice that allows $ {{\rm P}_1}$ to improve the cost below a value that $ {{\rm P}_2}$ can already secure for itself.

Figure 10.16: If the tree is explored in depth-first order, there are situations in which some children (and hence whole subtrees) do not need to be explored. This is an example that eliminates children of $ {{\rm P}_2}$. Another case exists, which eliminates children of $ {{\rm P}_1}$.

Steven M LaValle 2012-04-20