## 10.1.1 Model Definition

The formulation presented in this section is an extension of Formulation 2.3 that incorporates the effects of nature at every stage. Let denote a discrete state space, and let denote the set of actions available to the decision maker (or robot) from state . At each stage it is assumed that a nature action is chosen from a set . This can be considered as a multi-stage generalization of Formulation 9.4, which introduced . Now may depend on the state in addition to the action because both and are available in the current setting. This implies that nature acts with the knowledge of the action selected by the decision maker. It is always assumed that during stage , the decision maker does not know the particular nature action that will be chosen. It does, however, know the set for all and .

As in Section 9.2, there are two alternative nature models: nondeterministic or probabilistic. If the nondeterministic model is used, then it is only known that nature will make a choice from . In this case, making decisions using worst-case analysis is appropriate.

If the probabilistic model is used, then a probability distribution over is specified as part of the model. The most important assumption to keep in mind for this case is that nature is Markovian. In general, this means that the probability depends only on local information. In most applications, this locality is with respect to time. In our formulation, it means that the distribution over depends only on information obtained at the current stage. In other settings, Markovian could mean a dependency on a small number of stages, or even a local dependency in terms of spatial relationships, as in a Markov random field [231,377].

To make the Markov assumption more precise, the state and action histories as defined in Section 8.2.1 will be used again here. Let

 (10.1)

and

 (10.2)

These represent all information that is available up to stage . Without the Markov assumption, it could be possible that the probability distribution for nature is conditioned on all of and , to obtain . The Markov assumption declares that for all ,

 (10.3)

which drops all history except the current state and action. Once these two are known, there is no extra information regarding the nature action that could be gained from any portion of the histories.

The effect of nature is defined in the state transition equation, which produces a new state, , once , , and are given:

 (10.4)

From the perspective of the decision maker, is not given. Therefore, it can only infer that a particular set of states will result from applying and :

 such that (10.5)

In (10.5), the notation indicates a set of possible values for , given and . The notation will generally be used to indicate the possible values for that can be derived using the information that appears in the argument.

In the probabilistic case, a probability distribution over can be derived for stage , under the application of from . As part of the problem, is given. Using the state transition equation, ,

 (10.6)

can be derived, in which

 (10.7)

The calculation of simply involves accumulating all of the probability mass that could lead to from the application of various nature actions.

Putting these parts of the model together and adding some of the components from Formulation 2.3, leads to the following formulation:

Formulation 10..1 (Discrete Planning with Nature)
1. A nonempty state space which is a finite or countably infinite set of states.
2. For each state, , a finite, nonempty action space . It is assumed that contains a special termination action, which has the same effect as the one defined in Formulation 2.3.
3. A finite, nonempty nature action space for each and .
4. A state transition function that produces a state, , for every , , and .
5. A set of stages, each denoted by , that begins at and continues indefinitely. Alternatively, there may be a fixed, maximum stage .
6. An initial state . For some problems, this may not be specified, in which case a solution plan must be found from all initial states.
7. A goal set .
8. A stage-additive cost functional . Let denote the history of nature actions up to stage . The cost functional may be applied to any combination of state, action, and nature histories to yield

 (10.8)

in which . If the termination action is applied at some stage , then for all , , , and .

Using Formulation 10.1, either a feasible or optimal planning problem can be defined. To obtain a feasible planning problem, let for all , , and . Furthermore, let

 (10.9)

To obtain an optimal planning problem, in general may assume any nonnegative, finite value if . For problems that involve probabilistic uncertainty, it is sometimes appropriate to assign a high, finite value for if , as opposed to assigning an infinite cost for failing to achieve the goal.

Note that in each stage, the cost term is generally allowed to depend on the nature action . If probabilistic uncertainty is used, then Formulation 10.1 is often referred to as a controlled Markov process or Markov decision process (MDP). If the actions are removed from the formulation, then it is simply referred to as a Markov process. In most statistical literature, the name Markov chain is used instead of Markov process when there are discrete stages (as opposed to continuous-time Markov processes). Thus, the terms controlled Markov chain and Markov decision chain may be preferable.

In some applications, it may be convenient to avoid the explicit characterization of nature. Suppose that . If nondeterministic uncertainty is used, then can be specified for all and as a substitute for the state transition equation; this avoids having to refer to nature. The application of an action from a state directly leads to a specified subset of . If probabilistic uncertainty is used, then can be directly defined as the alternative to the state transition equation. This yields a probability distribution over , if is applied from some , once again avoiding explicit reference to nature. Most of the time we will use a state transition equation that refers to nature; however, it is important to keep these alternatives in mind. They arise in many related books and research articles.

As used throughout Chapter 2, a directed state transition graph is sometimes convenient for expressing the planning problem. The same idea can be applied in the current setting. As in Section 2.1, is the vertex set; however, the edge definition must change to reflect nature. A directed edge exists from state to if there exists some and such that . A weighted graph can be made by associating the cost term with each edge. In the case of a probabilistic model, the probability of the transition occurring may also be associated with each edge.

Note that both the decision maker and nature are needed to determine which vertex will be reached. As the decision maker contemplates applying an action from the state , it sees that there may be several outgoing edges due to nature. If a different action is contemplated, then this set of possible outgoing edges changes. Once nature applies its action, then the particular edge is traversed to arrive at the new state; however, this is not completely controlled by the decision maker.

Example 10..1 (Traversing the Number Line)   Let , , and . The action sets of the decision maker and nature are the same for all states. For the state transition equation, . For each stage, unit cost is received. Hence for all , , and . The initial state is , and the goal set is .

Consider executing a sequence of actions, , under the nondeterministic uncertainty model. This means that we attempt to move left two units in each stage. After the first is applied, the set of possible next states is . Nature may slow down the progress to be only one unit per stage, or it may speed up the progress so that is three units closer per stage. Note that after stages, the goal is guaranteed to be achieved, in spite of any possible actions of nature. Once is reached, should be applied. If the problem is changed so that , it becomes impossible to guarantee that the goal will be reached because nature may cause the goal to be overshot.

Now let and . Under nondeterministic uncertainty, the problem can no longer be solved because nature is now powerful enough to move the state completely in the wrong direction in the worst case. A reasonable probabilistic version of the problem can, however, be defined and solved. Suppose that for each . The transition probabilities can be defined from . For example, if and , then if , and otherwise. With the probabilistic formulation, there is a nonzero probability that the goal, , will be reached, even though in the worst-case reaching the goal is not guaranteed.

Example 10..2 (Moving on a Grid)   A grid-based robot planning model can be made. A simple example is shown in Figure 10.1. The state space is a subset of a integer grid in the plane. A state is represented as , in which ; however, the points in the center region (shown in Figure 10.1) are not included in .

Let be a set of actions, which denote stay,'' right,'' up,'' left,'' and down,'' respectively. Let . For each , let contain and whichever actions are applicable from (some are not applicable along the boundaries).

Let represent the set of all actions in that are applicable after performing the move implied by . For example, if and , then the robot is attempting to move to . From this state, there are three neighboring states, each of which corresponds to an action of nature. Thus, in this case is . The action does not appear because there is no state to the left of . Suppose that the probabilistic model is used, and that every nature action is equally likely.

The state transition function is formed by adding the effect of both and . For example, if , , and , then . If had been , then the two actions would cancel and . Without nature, it would have been assumed that . As always, the state never changes once is applied, regardless of nature's actions.

For the cost functional, let (unless ; in this case, ). For the final stage, let if ; otherwise, let . A reasonable task is to get the robot to terminate in in the minimum expected number of stages. A feedback plan is needed, which will be introduced in Section 10.1.3, and the optimal plan for this problem can be efficiently computed using the methods of Section 10.2.1.

This example can be easily generalized to moving through a complicated labyrinth in two or more dimensions. If the grid resolution is high, then an approximation to motion planning is obtained. Rather than forcing motions in only four directions, it may be preferable to allow any direction. This case is covered in Section 10.6, which addresses planning in continuous state spaces.

Steven M LaValle 2012-04-20