So far it has been established that a randomized saddle point always exists, but how can one be found? Two key observations enable a combinatorial solution to the problem:

- The security strategy for each player can be found by considering only deterministic strategies for the opposing player.
- If the strategy for the other player is fixed, then the expected cost is a linear function of the undetermined probabilities.

First consider the problem of determining the security strategy for . The first observation means that (9.59) does not need to consider randomized strategies for . Inside of the , is fixed. What randomized strategy, , maximizes ? If is fixed, then can be treated as a constant -dimensional vector, . This means , in which is the inner (dot) product. Now the task is to select to maximize . This involves selecting the largest element of ; suppose this is . The maximum cost over all is obtained by placing all of the probability mass at action . Thus, the strategy and for gives the highest cost, and it is deterministic.

Using the first observation, for each , only possible responses by need to be considered. These are the deterministic strategies, each of which assigns for a unique .

Now consider the second observation. The expected cost,
, is a linear function of , if is fixed. Since only
needs to be fixed at different values due to the first
observation, is selected at the point at which the smallest
maximum value among the linear functions occurs. This is the
minimum value of the *upper envelope* of the collection of linear
functions. Such envelopes were mentioned in Section 6.5.2.
Example 9.16 will illustrate this. The domain for
this optimization can conveniently be set as a triangle in
. Even though
, the last coordinate,
, is not needed because it is always
. The resulting optimization falls under *linear
programming*, for which many combinatorial algorithms exist
[259,264,664,731].

In the explanation above, there is nothing particular to
when
trying to find its security strategy. The same method can be applied
to determine the security strategy for
; however, every
minimization is replaced by a maximization, and vice versa. In
summary, the in (9.60) needs only to consider
the deterministic strategies in . If becomes fixed, then
is once again a linear function, but this time it
is linear in . The best randomized action is chosen by finding the
point that gives the highest minimum value among linear
functions. This is the minimum value of the *lower envelope* of
the collection of linear functions. The optimization occurs over
because the last coordinate, , is obtained directly
from
.

This computation method is best understood through an example.

Consider computing the security strategy for . Note that and are only one-dimensional subsets of . A randomized strategy for is , with , , and . Therefore, the domain over which the optimization is performed is because can always be derived as . Using the first observation above, only the two deterministic strategies for need to be considered. When considered as linear functions of , these are

(9.64) |

for and

(9.65) |

for . The lines are plotted in Figure 9.4a. The security strategy is determined by the minimum point along the upper envelope shown in the figure. This is indicated by the thickened line, and it is always a piecewise-linear function in general. The lowest point occurs at , and the resulting value is . Therefore, .

A similar procedure can be used to obtain . The lines that
correspond to the deterministic strategies of
are shown in
Figure 9.4b. The security strategy is obtained by
finding the maximum value along the lower envelope of the lines, which
is shown as the thickened line in the figure. This results in
, and once again, the value is observed as
(this must coincide with the previous one because the randomized
upper and lower values are the same!).

This procedure appears quite simple if there are only two actions per player. If , then the upper and lower envelopes are piecewise-linear functions in . This may be computationally impractical because all existing linear programming algorithms have running time at least exponential in dimension [264].

Steven M LaValle 2012-04-20