Extending the formulation

Since a game under Formulation 9.7 can be nicely expressed as a matrix, it is tempting to use linear algebra to conveniently express expected costs. Let $ \vert U\vert = m$ and $ \vert V\vert = n$. As in Section 9.1.3, a randomized strategy for $ {{\rm P}_1}$ can be represented as an $ m$-dimensional vector,

$\displaystyle w = [w_1 \;\; w_2 \; \ldots \; w_m].$ (9.55)

The probability axioms of Section 9.1.2 must be satisfied: 1) $ w_i \geq 0$ for all $ i \in \{1,\ldots,m\}$, and 2) $ w_1
+ \cdots + w_m = 1$. If $ w$ is considered as a point in $ {\mathbb{R}}^m$, then the two constraints imply that it must lie on an $ (m-1)$-dimensional simplex (recall Section 6.3.1). If $ m = 3$, this means that $ w$ lies in a triangular subset of $ {\mathbb{R}}^3$. Similarly, let $ z$ represent a randomized strategy for $ {{\rm P}_2}$ as an $ n$-dimensional vector,

$\displaystyle z = [z_1 \;\; z_2 \; \ldots \; z_n]^T ,$ (9.56)

that also satisfies the probability axioms. In (9.56), $ ^T$ denotes transpose, which yields a column vector that satisfies the dimensional constraints required for an upcoming matrix multiplication.

Let $ {\bar{L}}(w,z)$ denote the expected cost that will be received if $ {{\rm P}_1}$ plays $ w$ and $ {{\rm P}_2}$ plays $ z$. This can be computed as

$\displaystyle {\bar{L}}(w,z) = \sum_{i = 1}^m \sum_{j = 1}^n L(i,j) w_i z_j .$ (9.57)

Note that the cost, $ L(i,j)$, makes use of the assumption in Formulation 9.7 that the actions are consecutive integers. The expected cost can be alternatively expressed using the cost matrix, $ A$. In this case

$\displaystyle {\bar{L}}(w,z) = wAz ,$ (9.58)

in which the product $ wAz$ yields a scalar value that is precisely (9.57). To see this, first consider the product $ Az$. This yields an $ m$-dimensional vector in which the $ i$th element is the expected cost that $ {{\rm P}_1}$ would receive if it tries $ u = i$. Thus, it appears that $ {{\rm P}_1}$ views $ {{\rm P}_2}$ as a nature player under the probabilistic model. Once $ w$ and $ Az$ are multiplied, a scalar value is obtained, which averages the costs in the vector $ Az$ according the probabilities of $ w$.

Let $ W$ and $ Z$ denote the set of all randomized strategies for $ {{\rm P}_1}$ and $ {{\rm P}_2}$, respectively. These spaces include strategies that are equivalent to the deterministic strategies considered in Section 9.3.2 by assigning probability one to a single action. Thus, $ W$ and $ Z$ can be considered as expansions of the set of possible strategies in comparison to what was available in the deterministic setting. Using $ W$ and $ Z$, randomized security strategies for $ {{\rm P}_1}$ and $ {{\rm P}_2}$ are defined as

$\displaystyle w^* = \operatornamewithlimits{argmin}_{w \in W} \Big\{ \max_{z \in Z} \Big\{ {\bar{L}}(w,z) \Big\} \Big\}$ (9.59)


$\displaystyle z^* = \operatornamewithlimits{argmax}_{z \in Z} \Big\{ \min_{w \in W} \Big\{ {\bar{L}}(w,z) \Big\} \Big\},$ (9.60)

respectively. These should be compared to (9.44) and (9.46). The differences are that the space of strategies has been expanded, and expected cost is now used.

The randomized upper value is defined as

$\displaystyle \overline{\cal L}^*= \max_{z \in Z} \Big\{ {\bar{L}}(w^*,z) \Big\} ,$ (9.61)

and the randomized lower value is

$\displaystyle \underline{\cal L}^*= \min_{w \in W} \Big\{ {\bar{L}}(w,z^*) \Big\} .$ (9.62)

Since $ W$ and $ Z$ include the deterministic security strategies, $ \overline{\cal L}^*\leq \overline{L}^*$ and $ \underline{\cal L}^*\geq \underline{L}^*$. These inequalities imply that the randomized security strategies may have some hope in closing the gap between the two values in general.

The most fundamental result in zero-sum game theory was shown by von Neumann [956,957], and it states that $ \underline{\cal L}^*=
\overline{\cal L}^*$ for any game in Formulation 9.7. This yields the randomized value $ {\cal L}^*= \underline{\cal L}^*= \overline{\cal L}^*$ for the game. This means that there will never be expected regret if the players stay with their security strategies. If the players apply their randomized security strategies, then a randomized saddle point is obtained. This saddle point cannot be seen as a simple pattern in the matrix $ A$ because it instead exists over $ W$ and $ Z$.

The guaranteed existence of a randomized saddle point is an important result because it demonstrates the value of randomization when making decisions against an intelligent opponent. In Example 9.7, it was intuitively argued that randomization seems to help when playing against an intelligent adversary. When playing the game repeatedly with a deterministic strategy, the other player could learn the strategy and win every time. Once a randomized strategy is used, the players will not experience regret.

Steven M LaValle 2012-04-20