posted 23 Jan 2024

Success Boosting


§ Motivation

Suppose we have a randomized algorithm AA which outputs a value ZZ that is correct with probability 23.\displaystyle{\frac{2}{3}}.

Suppose for our application, we want to obtain the correct result with probability 0.999,0.999, or 11n2,1 - \displaystyle{\frac{1}{n^2}}, or (1δ).(1 - \delta).

§ Median of Means

To illustrate the usefulness of Chernoff bounds and other concentration inequalities, we can take the aforementioned algorithm AA. We run AA a total of O(log1δ)\displaystyle{\mathcal{O}\left(\log \frac{1}{\delta}\right)} times and take the median; with probability (1δ)(1 - \delta) it arrives at the correct answer.

§ Framework

Suppose we design a randomized algorithm AA to estimate a hidden statistic Θ\Theta of a dataset and we know in advance 0<Θ1000.0 < \Theta \leq 1000.

Suppose each time we use A,A, it outputs a number XX with E[X]=Θ\mathbb{E}[X] = \Theta and Var[X]=100Θ2.\text{Var}[X] = 100\,\Theta^2.

Suppose we want to estimate Θ\Theta within tolerance ε\varepsilon and with probability 1δ.1 - \delta.

  • Accuracy boosting: Repeat AA a total of 1012ε2\displaystyle{\frac{10^{12}}{\varepsilon^2}} times and take the mean
  • Success boosting: Find the mean of a total of O(log1δ)\mathcal{O}\left(\log \frac{1}{\delta}\right) trials and take the median to be correct with probability 1δ.1 - \delta.

§ Max Load

Suppose we have a fair nn‑sided die that is rolled nn times. On average, what is the largest number of times any outcome is rolled?

  • Θ(1)\Theta(1)
  • Θ˜(logn)\~{\Theta}(\log n)
  • Θ˜(n)\~{\Theta}(\sqrt{n})
  • Θ˜(n)\~{\Theta}(n)

Let k[n]k \in [n] be a fixed value and

Xi={1if i-th roll is k0otherwise.X_i = \begin{cases} 1 &\text{if $i$-th roll is $k$}\\ 0 &\text{otherwise.} \end{cases}

Then, E[Xi]=1n.\displaystyle{\mathbb{E}[X_i] = \frac{1}{n}}. The total number of rolls with value kk is X=i=1nXiX = \sum_{i=1}^{n}X_i and E[X]=1.\mathbb{E}[X] = 1. Using Chernoff bounds, we can let δ=3logn\delta = 3\log n and find that

Pr[X3logn]1n2\Pr[X \geq 3\log n] \leq \frac{1}{n^2}

Using union bound, we have at least a (11n)\displaystyle{\left(1 - \frac{1}{n}\right)} probability that no outcome will be rolled more than 3logn3 \log n times.

§ Coupon Collector

Suppose we have a fair nn‑sided die. On average, how many times should we roll the die before we see all possible outcomes among the rolls?

  • Θ(n)\Theta(n)
  • Θ(nlogn)\Theta(n \log n)
  • Θ(nn)\Theta(n \sqrt{n})
  • Θ(n2)\Theta\left(n^2\right)