Iterative Improvement | Brandon G. Nguyen

Motivation§

In iterative improvement, we seek to optimize the quality of states rather than optimizing path cost.

$N$-queens

The $N$-queens problem concerns itself with placing $N$ queens onto an $N \times N$ chess board such that no queens are able to capture another.

Fig. 1: Example solution to the 8-queens problem.

We may define the quality of the $N$-queens problem to be the number of queens that are in a position to capture another.

Hill Climbing§

In hill climbing (HC), we maintain a single current state. Using operators, we generate successors of the current state and pick the best.

function HillClimbing(problem) -> State {
    let current := problem.initial

    loop {
        let neighbour := max(problem.successors_of(current))

        if (value(neighbour) <= value(current))
            return current

        current := neighbour
    }
}

In the $N$-queens problem, the operator can be defined as moving a given queen to another row in the same column.

Issues of Hill Climbing§

Hill climbing is prone to:

local maxima,
the plateau effect,
and the ridge effect.

Plateau Effect

In a plateau, all neighbours of the current state have the same quality. As a result, there is no longer a gradient to climb.

Ridge Effect

All neighbours have the same or a lower score, but there may exist other nearby states of better quality.

Ridge Effect

Suppose we wished to perform HC over a vector space with more than one dimension. A natural choice of operators is to move along one of the basis vectors of the space.

When we encounter a ridge in HC, the ridge may have an increasing gradient along a direction that is a linear combination of bases. However, as we may only move along one basis at a time, we may encounter a situation where moving along any basis will lead to a lower quality state.

In HC, we seek to find the global maxima; however, the algorithm as currently defined has no method of differentiating between a global and local maxima. The other effects also make it difficult for HC to reach the global maxima.

Improvements to Hill Climbing§

Random Restart HC
Stochastic HC
- Choose any successor that is better than the current state, not necessarily the best successor.
- This idea gives rise to simulated annealing, which we will discuss later.
Remember some previous states
- As it stands, HC only maintains the current state, here we can reintroduce the idea of a “frontier” where we remember some expanded but unexplored nodes to back-track in the event that we reach a local maxima.
- This is the idea behind beam search, which is the topic of the next section.
Macro operators (macrops)
- New operators from combinations of other operators/actions to create more possible successor states.

Beam Search§

In beam search we aim to allow HC to have some form of backtracking should it encounter one of the aforementioned issues. To do so, we must keep track a number of previous states, not much unlike a frontier from greedy search. The primary difference between beam and greedy search is that in greedy, we order the priority queue with a heuristic $h(s);$ whereas in beam search, we use the state’s quality function $q(s).$

Again, we’re left with the issue of a (potentially) exponential frontier size. As a compromise, we can keep the $K$ previous best nodes, based on the quality $q(s).$ Beam search may still fail in some exceptional circumstances; however, this does allow the algorithm to perform some backtracking.

As we only remember a few nodes, we cannot explore the entire search space – thus may fail to find a global maximum; but it does it allow for the search to pursue higher and higher local maxima.

function BeamSearch(problem, K) -> State {
    // Beam is simply a priority queue that holds at must K elements
    // in a monotonically decreasing order.

    // Should we attempt to insert an element into a beam with K
    // elements already, we first compare the new element against
    // the tail, if it is less -- do nothing.
    // Otherwise, remove the tail, insert new element, reorder.

    let beam := Beam(contents: [ problem.initial ])

    loop {
        // beam.pop() pops the head element off the queue.
        let current := beam.pop()

        for child in problem.successor(state: current)
            beam.push(child)

        if beam.empty()
            return current
    }
}

Note: The lecture slides do not have BeamSearch() return anything. I’ve added the lines to return the current state if the beam is empty.
I’ven’t the time to check as I transcribe my notes, so I remain uncertain as to the correctness or even as to the algorithm terminating in the first place.

Simulated Annealing§

As mentioned earlier, another solution to HC’s problems was stochastic HC; wherein we choose, at random, any successor with $q(s)$ greater than the current node. This allows HC to do a random walk up a gradient; but this does not always yield the best results.

Again, in a ridge all our operators may lead us to a lower quality state, but that does not necessarily mean we have reached a global maxima.

Simulated Annealing

In simulated annealing (SA), we randomly choose a successor state.

If the state is better, accept.
Otherwise, we’re allowed to accept worse states – probablistically, proportionally to how much lower the quality is.
$$ Pr[\text{accepting a worse state}] = e^{-\Delta E / T}, $$
where
- $\Delta E$ is the difference between the quality of the current state and the current of the successor state, and
- $T$ is the temperature schedule.

The temperature scales the probability, generally we think of this as the “cooling of material” in the physical world. This is where simulated annealing gets its name: the annealing of metals. The higher the temperature, the less structured the atoms in a metal – making it more malleable. As the metal cools, the atoms become increasingly rigid as they become entrapped in a crystalline structure. With a cooler temperature, the less likely an atom is to move.

Generally, we wish to choose a function for our temperature schedule such that as time $t \to \infty, T \to 0$ – that is, in the limit no backwards movement is allowed. In the case that $T \to \infty$, we will always accept backward steps.

As long as the schedule approaches $0$ as time goes on, the choice of function does not matter; be it polynomial, piecewise, exponential, etc.

function SimulatedAnnealing(problem, schedule: Function -> float) -> State {
    let current := problem.initial()

    for let t := 1 to infinity {
        let T := schedule(t)

        if T = 0
            return current

        let next := problem.random_successor_of(current)
        let delta := problem.quality(next) - problem.quality(current)

        if delta > 0 {
            current := next
            continue
        }

        // Here, delta / T is always negative, as delta is always negative
        // If delta were positive, we'd have captured that in the above if-statement.
        let probability := exp(delta / T)

        // Suppose random chooses a floating point number in the unit interval, randomly.
        if random() <= probability:
            current := next
    }
}

Simulated Annealing and the Travelling Salesman§

As TSP is NP-hard there is not (that we know of) a polynomial time algorithm to find an optimal solution. However, with SA we can find approximately optimal solutions to the classic travelling salesman problem (TSP).

For a complete graph, the representation of TSP is a permutation of the nodes in the graph. Then, we have some choice for our operators, for example we may choose a random subsequence and swap its position, or revesrse the order of the subsequence. Another operator is simply picking two cities and swapping them.

Genetic Algorithms§

Also known as: Evolutionary Programming.

Genetic Algorithms

Genetic algorithms (GA) have the following properties:

They maintain a population of multiple candidate states (search in parallel, not in series),
states can be mixed-and-matched through recombination, and
use a measure of fitness of a state to select winners (like ’natural selection’).

Fitness

In GA, fitness is analogous to the state’s valuation or quality.

Some GAs involve themselves with chromosomes, in which states are represented via bit string. This method lends itself towards the recombination of states; but it is not a necessary approach as long as recombination is possible.

Chromosomal Expression in the $8$-queens Problem

We can express a singular state in $8$-queens using a list of 8 integers from 0 to 7. Here, each integer can be represented in 3-bits – thusly, the entire state in only 24 bits.

Suppose that we’re given a state $\mathfrak{F} = \{5, 1, 7, 2, 3, 6, 4, 0\}.$ That is, in the first row there exists a queen in the fifth column; a queen in the first column of the second row, etc.

In bitstring form,

$$ \mathfrak{F} = 101\ 001\ 111\ 010\ 011\ 110\ 100\ 000. $$

Then suppose we had a another state,

$$ \mathfrak{M} = 001\ 100\ 111\ 110\ 010\ 101\ 011\ 000. $$

Let $\star: S \times S \to S$ be a functor that does recombination of two states into a new, descendant state. Suppose that we chose a $\star$ such that it randomly splices the first bitstring at a random location $\beta,$ then concatenates the substring starting from $24 - \beta$ up to the end.

Then, one of many possible descendants of $\mathfrak{F}$ and $\mathfrak{M}$ can be given by

$$ \mathfrak{F} \star \mathfrak{M} = 001\ 100\ 111\ 110\ 011\ 110\ 100\ 000. $$

With chromosomal representation, we can splice the bitstrings at random locations like in the above example. For other representations, we must define a cross-over on how we can recombine the states.

In GA, we can randomly select parents from the population pool and generate a descendant using recombination. It is very likely that for some choices of parents, their children will take the best from both and be an improved state.

function GeneticAlgorithm(population: [State], fitness: Function -> float) -> State {
    loop {
        // Apply fitness function onto every state in the population to generate a parallel list
        let weights : [float] = population.map(fitness)
        let next_generation := []

        for i = 1 to population.size() {
            let parent1, parent2 := weighted_random_choice(population, weights, 2)
            let child := reproduce(parent1, parent2)

            let Pr := random() // Pr is random float in unit interval

            if Pr < epsilon // epsilon is probability of mutation, usually very small
                child := mutate(child)

            next_generation.push(child)
        }

        population := next_generation
    }
}

function weighted_random_choice(A: [T], B: [float], K: u64) -> [T] {
    return K random elements from list A using a list of weights, B.
}

function reproduce(parent1: State, parent2: State) -> State {
    let N := length(parent1)
    let beta := random_integer(1, N)

    let child := substring(parent1, 1, beta) + substring(parent2, beta + 1, N)

    return child
}

Mutation

In some GAs, they include mutation (included in pseudocode above). Simply, a mutation is a random change to a state, usually at a low frequency.

Discussion of Genetic Algorithms§

Lamarckian Evolution

Improvements/adaptations acquired during the lifetime of an individual can be passed onto offspring.

An issue of GA is that it is prone to a ’loss of diversity’, where the popluation becomes homogeneous in the limit.

Some applications of GAs include airfoil design and automatic program synthesis.

We can also seek to optimise GAs:

The power of GA comes from competition, not mutation.
Survival of the fittest drives the population, as a whole, to improvement.
As a result, we should aim to reduce the number of unfit individuals from reproducing – effectively dropping their traits.