Motivation§

Many AI search problems have search spaces that are exponential in size, so our main focus is to use domain knowledge to improve search efficiency.

Domain Knowledge

Domain knowledge refers to anything pertinent to solving a problem.

Examples of knowledge: rules of thumb, common solutions, problem decomposition into smaller subgoals, operator interaction/dependencies, etc.


Heuristic Functions§

Heuristic Function

In our context, domain knowledge is given by a heuristic function $h: S \to \mathbb{R}.$

Ideally, $h(s)$ is an estimate of the path cost remaining from a state $s$ to the closest goal.

Generally, $h(s) \geq 0$ and $\forall g \in \mathcal{G}, h(g) = 0.$

By using a heuristic, we can guide the search away from unproductive areas of the search space.

Navigation

Figure: road map of Romania used for the navigation examples (cities include Sibiu, Fagaras, Bucharest, Urziceni, Vaslui, Iasi, Neamt, and Oradea).

Suppose we again want to navigate from Sibiu to Vaslui.

  • Using BFS (FIFO), the frontier at each pass is given by:
    • S | O, A, R, F | Z, T, C, P, B | L, D, U, G | M, H, V.
  • Using DFS (LIFO), the sequence of states we visit:
    • S, O, Z, A, T, L, M, D, C, R, P, B, F, G, U, H, E, V.

Now, consider the heuristic of straight-line distance: we use a priority queue and order nodes by the straight-line distance between each node and the goal (a short code sketch of these frontier disciplines follows the list below).

  • Using a priority queue ordered by this heuristic (greedy best-first search), the sequence of states we visit:
    • S, F, B, U, V
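
These strategies differ only in their frontier discipline. Below is a minimal Python sketch contrasting a FIFO frontier with a priority-queue frontier ordered by straight-line distance; the adjacency dict graph and heuristic table h are assumed inputs for illustration, not the actual map data.

from collections import deque
import heapq

def bfs_order(graph, start):
    """Expansion order with a FIFO frontier (breadth-first search)."""
    frontier, seen, order = deque([start]), {start}, []
    while frontier:
        node = frontier.popleft()              # oldest node first
        order.append(node)
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return order

def greedy_order(graph, start, h):
    """Expansion order with a priority queue keyed on h (straight-line distance)."""
    frontier, seen, order = [(h[start], start)], {start}, []
    while frontier:
        _, node = heapq.heappop(frontier)      # node closest to the goal (by h) first
        order.append(node)
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                heapq.heappush(frontier, (h[nbr], nbr))
    return order

A DFS (LIFO) version is identical to bfs_order except that it pops the newest node from the frontier instead of the oldest.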

Tile Puzzle

Figure: an example initial state and the goal state of the 8-puzzle.

A simple heuristic for the tile puzzle would be the Hamming distance, or the number of disagreements between a given state and the goal.

  • This is an underestimate, as it takes at least as many moves as there are misplaced tiles to get every tile into its proper place.
  • While the heuristic can help differentiate between almost-solved and nowhere-close states, it offers no guarantee of “closeness”; i.e. $h(s) = 1$ conveys that only one tile is out of place, but not how many moves are required to put it in place.

Another heuristic is the Manhattan distance: the sum, over all tiles, of each tile's horizontal and vertical distance from its goal position (a short sketch of both heuristics follows the list below).

  • This method can underestimate the required number of moves, as moving one tile into place can displace other tiles from their correct positions, forcing extra moves.
  • However, since each legal move slides exactly one tile by one square, the Manhattan distance never overestimates the number of moves required.
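
As a concrete illustration, here is a small sketch of both heuristics, assuming (purely for this example) that a state is a length-9 tuple listing the tiles in row-major order, with 0 standing for the blank:

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # example goal layout, row-major

def hamming(state, goal=GOAL):
    """Number of tiles (excluding the blank) that are out of place."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal=GOAL):
    """Sum over tiles of the row and column distance to each tile's goal position."""
    where = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                           # the blank does not count
        row, col = divmod(i, 3)
        goal_row, goal_col = where[tile]
        total += abs(row - goal_row) + abs(col - goal_col)
    return total

Both functions return lower bounds on the number of moves needed to reach the goal from state.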

Taking the pseudocode for BestFirstSearch from Uninformed Search:

function BestFirstSearch(problem, f: Node -> Real) -> Node? {
    let node := Node(state: problem.initial)
    let frontier := PriorityQueue(ordering: f, contents: [ node ])
    let visited := HashTable([ problem.initial: node ])

    while not frontier.empty() {
        let node := frontier.pop()

        if problem.goal_test(state: node.state)
            return node

        for child in problem.expand(node) {
            let successor := child.state

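            // Keep the child only if its state is new or was previously reached at a higher cost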
            if not visited.contains(successor) or child.cost < visited[successor].cost {
                visited[successor] := child
                frontier.push(child)
            }
        }
    }

    return None
}

Then, by passing $h(s)$ as the function f in the BestFirstSearch algorithm, we order the priority queue in accordance with our heuristic; this is known as greedy (best-first) search.
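
For concreteness, here is a compact Python sketch of the same idea (not the course's reference implementation; expand, goal_test, and the state representation are assumed interfaces):

import heapq
from itertools import count

def best_first_search(start, expand, goal_test, f):
    """Generic best-first search: always pop the frontier entry minimizing f.

    expand(state) yields (successor_state, step_cost) pairs, and
    f(state, path_cost) returns the priority used to order the frontier.
    """
    tie = count()                                    # breaks ties without comparing states
    frontier = [(f(start, 0.0), next(tie), start, 0.0, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, _, state, cost, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, cost
        for successor, step in expand(state):
            new_cost = cost + step
            if successor not in best_cost or new_cost < best_cost[successor]:
                best_cost[successor] = new_cost
                entry = (f(successor, new_cost), next(tie), successor, new_cost, path + [successor])
                heapq.heappush(frontier, entry)
    return None

Passing lambda state, path_cost: h(state) as f reproduces the greedy ordering described here.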

Garden Path Problem

With greedy search, the heuristic can cause the algorithm to begin searching and goal testing nodes that do not belong to the minimal cost path.

Take the map from the navigation example, and suppose we start from Vaslui and want to find a path to Oradea. The straight-line distance heuristic may cause us to first check the path from Vaslui to Iasi and then to Neamt; however, just by looking at the map we know that the path to Neamt is a dead end.


A* Algorithm§

Like greedy search, A* builds off of BestFirstSearch, but instead of passing the heuristic $h(s)$ as the function $f(s),$ we let $f(s) = g(s) + h(s),$ where $g(s)$ is the path cost accumulated so far and $h(s)$ is the heuristic, which must now be an admissible heuristic.
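
In terms of the sketch above (same assumed helpers, with h standing for whatever admissible heuristic is available), the only change from greedy search is the priority function:

# Greedy best-first: ignore the path cost accumulated so far.
greedy_priority = lambda state, path_cost: h(state)

# A*: combine the path cost so far with an admissible heuristic.
astar_priority = lambda state, path_cost: path_cost + h(state)

# path, cost = best_first_search(start, expand, goal_test, astar_priority)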

Admissible Heuristic

An admissible heuristic is a heuristic that never overestimates the true cost to a goal, i.e.

$$ \forall\,s\in S,\quad 0 \leq h(s) \leq C^{\star}(s), $$

where $C^{\star}(s)$ is the minimum cost of any path from $s$ to a goal.

While the heuristic may be inaccurate, having an admissible heuristic guarantees optimality of A*.

However, using a nonadmissible heuristic in A* can make the search more efficient, i.e. it may become faster at finding a goal node but there are no guarantees on it being the minimal cost goal.
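
To make "never overestimates" concrete, one can compare a heuristic against exact optimal costs on small instances. The sketch below reuses the hypothetical tuple representation and manhattan heuristic from the tile puzzle example; breadth-first search outward from the goal gives exact remaining costs, since every move has unit cost and is reversible:

from collections import deque

def neighbors(state):
    """States reachable by sliding one tile into the blank on a 3x3 board."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for drow, dcol in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nrow, ncol = row + drow, col + dcol
        if 0 <= nrow < 3 and 0 <= ncol < 3:
            j = 3 * nrow + ncol
            tiles = list(state)
            tiles[blank], tiles[j] = tiles[j], tiles[blank]
            yield tuple(tiles)

def check_admissible(h, goal, max_depth=10):
    """Check h(s) <= true optimal cost for every state within max_depth moves of the goal."""
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        s = frontier.popleft()
        if dist[s] == max_depth:
            continue
        for t in neighbors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                frontier.append(t)
    return all(h(s) <= d for s, d in dist.items())

For example, check_admissible(manhattan, GOAL) from the earlier sketch should return True for any depth, while a heuristic that overestimates will fail on some state.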

Theorem 1

A* finds a goal with minimum path cost (A* is optimal).

Suppose the optimal goal is $g^{\star}$ but the algorithm instead returns $g$, where $\text{cost}(g) > \text{cost}(g^{\star}).$

Let $n^{\star}$ be a node in the frontier (priority queue) that also lies on the optimal path to $g^{\star}.$

Assuming that we use an admissible heuristic, $$ f(n^{\star}) = g(n^{\star}) + h(n^{\star}) \leq \text{cost}(n_0, \dots, n^{\star}) + \text{cost}(n^{\star}, \dots, g^{\star}) = \text{cost}(n_0, \dots, g^{\star}) = \text{cost}(g^{\star}). $$

However, the priority queue always pops the node with the smallest $f$; for $g$ to be dequeued while $n^{\star}$ was still in the frontier, we would need $f(g) = \text{cost}(g) \leq f(n^{\star}) \leq \text{cost}(g^{\star}),$ which contradicts $\text{cost}(g) > \text{cost}(g^{\star}).$ Therefore, $n^{\star},$ as well as all other nodes along the minimum cost path to $g^{\star},$ will be dequeued before $g.$

Lemma 1

$f(s)$ scores monotonically increase along any path from the root.

Theorem 2

A* explores states in order of increasing $f(s).$

A* Analysis§

Generally, the more accurate the heuristic, the faster the search.

  • For the boundary case of $h(s) = 0,$ A* degenerates into uniform cost and has a time complexity of $L_b\left[1, 1 + \lfloor C^{\star}/\varepsilon\rfloor\right].$
  • For a perfect heuristic $h(s) = h^{\star}(s),$ i.e. the exact remaining cost, A* will be led directly to the goal and have a time complexity that is linear in the path length.

For a heuristic whose inaccuracy is bounded, we can characterize the complexity of A* in terms of its relative error.

First, we define “relative error” to be

$$ \Delta = \max\,\left|\frac{h - h^{\star}}{h^{\star}}\right|, $$

where $h^{\star}$ is the perfect heuristic, i.e. the exact cost of an optimal path from a state to a goal.

Then, A* has a time complexity of

$$ L_b\left[1, \Delta\cdot L(g)\right], $$

where $L(g)$ is the path length to the goal $g.$
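
As a rough sanity check (assuming unit step costs, so $C^{\star}/\varepsilon \approx L(g)$), this bound recovers the two boundary cases above up to additive constants in the exponent:

$$ h = 0 \;\Rightarrow\; \Delta = \max\left|\frac{0 - h^{\star}}{h^{\star}}\right| = 1 \;\Rightarrow\; L_b\left[1, L(g)\right], \qquad h = h^{\star} \;\Rightarrow\; \Delta = 0 \;\Rightarrow\; L_b\left[1, 0\right]. $$

The first matches the uniform cost bound, and the second collapses the exponential dependence on the path length, consistent with the behavior of a perfect heuristic.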

If $|h - h^{\star}| \in \mathcal{O}(\log h^{\star}),$ then A* will search a subexponential number of nodes before finding the optimal goal — however, in practice this is rarely attainable.