Obtaining Dijkstra-like algorithms

For motion planning problems, it is expected that $ x(t + \Delta
t)$, as computed from (8.62), is always close to $ x(t)$ relative to the size of $ X$. This suggests the use of a Dijkstra-like algorithm to compute optimal feedback plans more efficiently. As discussed for the finite case in Section 2.3.3, many values remain unchanged during the value iterations, as indicated in Example 2.5. Dijkstra's algorithm maintains a data structure that focuses the computation on the part of the state space where values are changing. The same can be done for the continuous case by carefully considering the sample points [607].

During the value iterations, there are three kinds of sample points, just as in the discrete case (recall from Section 2.3.3):

  1. Dead: The cost-to-go has stabilized to its optimal value.
  2. Alive: The current cost-to-go is finite, but it is not yet known whether the value is optimal.
  3. Unvisited: The cost-to-go value remains at infinity because the sample has not been reached.
The sets are somewhat harder to maintain for the case of continuous state spaces because of the interaction between the sample set $ S$ and the interpolated region $ R(S)$.

Imagine the first value iteration. Initially, all points in $ G$ are set to zero values. Among the collection of samples $ S$, how many can reach $ R(G)$ in a single stage? We expect that samples very far from $ G$ will not be able to reach $ R(G)$; this keeps their values are infinity. The samples that are close to $ G$ should reach it. It would be convenient to prune away from consideration all samples that are too far from $ G$ to lower their value. In every iteration, we eliminate iterating over samples that are too far away from those already reached. It is also unnecessary to iterate over the dead samples because their values no longer change.

To keep track of reachable samples, it will be convenient to introduce the notion of a backprojection, which will be studied further in Section 10.1. For a single state, $ x \in X$, its backprojection is defined as

$\displaystyle \operatorname{B}(x) = \{x' \in X \;\vert\; \exists u' \in U(x')$    such that $\displaystyle x = f(x',u')\}.$ (8.66)

The backprojection of a set, $ X' \subseteq X$, of points is just the union of backprojections for each point:

$\displaystyle \operatorname{B}(X') = \bigcup_{x \in X'} \operatorname{B}(x) .$ (8.67)

Now consider a version of value iteration that uses backprojections to eliminate some states from consideration because it is known that their values cannot change. Let $ i$ refer to the number of stages considered by the current value iteration. During the first iteration, $ i=1$, which means that all one-stage trajectories are considered. Let $ S$ be the set of samples (assuming already that none lie in $ {\cal C}_{obs}$). Let $ D_i$ and $ A_i$ refer to the dead and alive samples, respectively. Initially, $ D_1 = G$, the set of samples in the goal set. The first set, $ A_1$, of alive samples is assigned by using the concept of a frontier. The frontier of a set $ S' \subseteq S$ of sample points is

$\displaystyle \operatorname{Front}(S') = (\operatorname{B}(R(S')) \setminus S') \cap S.$ (8.68)

This is the set of sample points that can reach $ R(S')$ in one stage, excluding those already in $ S'$. Figure 8.22 illustrates the frontier. Using (8.68), $ A_1$ is defined as $ A_1 =

Figure 8.22: An illustration of the frontier concept: (a) the shaded disc indicates the set of all reachable points in one stage, from the sample on the left. The sample cannot reach in one stage the shaded region on the right, which represents $ R(S')$. (b) The frontier is the set of samples that can reach $ R(S')$. The inclusion of the frontier increases the interpolation region beyond $ R(S')$.
...s/dismp4.eps,width=2.3in} \\
(a) & & (b)

Now the approach is described for iteration $ i$. The cost-to-go update (8.56) is computed at all points in $ A_i$. If $ G^*_{k+1}(s) = G^*_k(s)$ for some $ s \in A_i$, then $ s$ is declared dead and moved to $ D_{i+1}$. Samples are never removed from the dead set; therefore, all points in $ D_i$ are also added to $ D_{i+1}$. The next active set, $ A_{i+1}$, includes all samples in $ A_i$, excluding those that were moved to the dead set. Furthermore, all samples in $ \operatorname{Front}(A_i)$ are added to $ A_{i+1}$ because these will produce a finite cost-to-go value in the next iteration. The iterations continue as usual until some stage, $ m$, is reached for which $ A_m$ is empty, and $ D_m$ includes all samples from which the goal can be reached (under the approximation assumptions made in this section).

For efficiency purposes, an approximation to $ \operatorname{Front}$ may be used, provided that the true frontier is a proper subset of the approximate frontier. For example, the frontier might add all new samples within a specified radius of points in $ S'$. In this case, the updated cost-to-go value for some $ s \in A_i$ may remain infinite. If this occurs, it is of course not added to $ D_{i+1}$. Furthermore, it is deleted from $ A_i$ in the computation of the next frontier (the frontier should only be computed for samples that have finite cost-to-go values).

The approach considered so far can be expected to reduce the amount of computations in each value iteration by eliminating the evaluation of (8.56) on unnecessary samples. The same cost-to-go values are obtained in each iteration because only samples for which the value cannot change are eliminated in each iteration. The resulting algorithm still does not, however, resemble Dijkstra's algorithm because value iterations are performed over all of $ A_i$ in each stage.

To make a version that behaves like Dijkstra's algorithm, a queue $ Q$ will be introduced. The algorithm removes the smallest element of $ Q$ in each iteration. The interpolation version first assigns $ G^*(s) = 0$ for each $ s \in G$. It also maintains a set $ D$ of dead samples, which is initialized to $ D = G$. For each $ s \in
\operatorname{Front}(G)$, the cost-to-go update (8.56) is computed. The priority $ Q$ is initialized to $ \operatorname{Front}(G)$, and elements are sorted by their current cost-to-go values (which may not be optimal). The algorithm iteratively removes the smallest element from $ Q$ (because its optimal cost-to-go is known to be the current value) and terminates when $ Q$ is empty. Each time the smallest element, $ s_s \in Q$, is removed, it is inserted into $ D$. Two procedures are then performed: 1) Some elements in the queue need to have their cost-to-go values recomputed using (8.56) because the value $ G^*(s_s)$ is known to be optimal, and their values may depend on it. These are the samples in $ Q$ that lie in $ \operatorname{Front}(D)$ (in which $ D$ just got extended to include $ s_s$). 2) Any samples in $ \operatorname{B}(R(D))$ that are not in $ Q$ are inserted into $ Q$ after computing their cost-to-go values using (8.56). This enables the active set of samples to grow as the set of dead samples grows. Dijkstra's algorithm with interpolation does not compute values that are identical to those produced by value iterations because $ G^*_{k+1}$ is not explicitly stored when $ G^*_k$ is computed. Each computed value is some cost-to-go, but it is only known to be the optimal when the sample is placed into $ D$. It can be shown, however, that the method converges because computed values are no higher than what would have been computed in a value iteration. This is also the basis of dynamic programming using Gauss-Seidel iterations [96].

A specialized, wavefront-propagation version of the algorithm can be made for the special case of finding solutions that reach the goal in the smallest number of stages. The algorithm is similar to the one shown in Figure 8.4. It starts with an initial wavefront $ W_0 = G$ in which $ G^*(s) = 0$ for each $ s \in G$. In each iteration, the optimal cost-to-go value $ i$ is increased by one, and the wavefront, $ W_{i+1}$, is computed from $ W_i$ as $ W_{i+1} = \operatorname{Front}(W_i)$. The algorithm terminates at the first iteration in which the wavefront is empty.

Steven M LaValle 2012-04-20