posted 23 Jan 2024

Concentration Inequalities


§ Motivation

Randomized algorithms produce random outputs, so how can we become confident in their results? One way is to use concentration inequalities, which let us quantify how likely an answer is to deviate far from its expectation.

§ Markov's Inequality

Markov's Inequality

Let $X \geq 0$ be a nonnegative random variable. Then for any $t > 0$,

$$\Pr[X \geq t \cdot \mathbb{E}[X]] \leq \frac{1}{t}.$$

Proof (Markov's Inequality)

$$\begin{align*} \mathbb{E}[X] &= \sum_{x \in \Omega} \Pr[X = x] \cdot x \\ &= \sum_{x \geq t\cdot\mathbb{E}[X]} \Pr[X = x] \cdot x + \sum_{x < t\cdot\mathbb{E}[X]} \Pr[X = x] \cdot x \\ &\geq \sum_{x \geq t\cdot\mathbb{E}[X]} \Pr[X = x] \cdot x \\ &\geq t\cdot\mathbb{E}[X] \sum_{x \geq t\cdot\mathbb{E}[X]} \Pr[X = x] \\ &= t\cdot\mathbb{E}[X]\cdot\Pr[X \geq t\cdot\mathbb{E}[X]]. \end{align*}$$

Dividing both sides by $t\cdot\mathbb{E}[X]$ gives the result. (Dropping the second sum is valid because $X$ is nonnegative.)
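The inequality is easy to check empirically. Below is a minimal sketch in Python (standard library only); the $\text{Exp}(1)$ distribution is an arbitrary choice of nonnegative random variable with $\mathbb{E}[X] = 1$.

```python
import random

random.seed(0)

# Exp(1) is nonnegative with E[X] = 1.
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for t in (2, 5, 10):
    # Empirical Pr[X >= t * E[X]] versus Markov's bound 1/t.
    tail = sum(x >= t * mean for x in samples) / len(samples)
    assert tail <= 1 / t
```

For $\text{Exp}(1)$ the true tail is $e^{-t}$, far below $1/t$; Markov's bound is loose, but it asks for nothing beyond nonnegativity.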

§ Chebyshev's Inequality

Chebyshev's Inequality

Let $X$ be a random variable with expected value $\mu \triangleq \mathbb{E}[X]$ and variance $\sigma^2 \triangleq \text{Var}[X]$. Then

$$\Pr[|X - \mu| \geq t\sigma] \leq \frac{1}{t^2},$$

or equivalently,

$$\Pr[|X - \mu| \geq t] \leq \frac{\sigma^2}{t^2}.$$

Proof (Chebyshev's Inequality)

We can derive Chebyshev's inequality by rearranging Markov's: for a nonnegative random variable $X$,

$$\Pr[X \geq t\cdot\mathbb{E}[X]] \leq \frac{1}{t} \iff \Pr[X \geq t] \leq \frac{\mathbb{E}[X]}{t}.$$

Then, observing that $|X| \geq t$ and $X^2 \geq t^2$ are the same event, applying Markov's to $X^2$ gives

$$\Pr[|X| \geq t] = \Pr\left[X^2 \geq t^2\right] \leq \frac{\mathbb{E}\left[X^2\right]}{t^2}.$$

Then substituting $X \to X - \mathbb{E}[X]$,

$$\begin{align*} \Pr[|X - \mathbb{E}[X]| \geq t] &\leq \frac{\mathbb{E}\left[(X - \mathbb{E}[X])^2\right]}{t^2} \\ &= \frac{\text{Var}[X]}{t^2}. \end{align*}$$
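As with Markov's, a quick simulation confirms the bound (a Python sketch, standard library only; $\text{Uniform}(0,1)$ is an arbitrary choice with $\mu = 0.5$ and $\sigma^2 = 1/12$):

```python
import random

random.seed(1)

# Uniform(0, 1): mu = 0.5, sigma^2 = 1/12.
mu, var = 0.5, 1 / 12
samples = [random.random() for _ in range(100_000)]

for t in (0.3, 0.4, 0.45):
    # Empirical Pr[|X - mu| >= t] versus Chebyshev's bound sigma^2 / t^2.
    tail = sum(abs(x - mu) >= t for x in samples) / len(samples)
    assert tail <= var / t**2
```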

§ Bernstein's Inequality

Bernstein's Inequality

Let $X_1, X_2, \dots, X_n \in [-M, M]$ be independent random variables, and let $X = X_1 + X_2 + \cdots + X_n$ have mean $\mu$ and variance $\sigma^2$. Then for any nonnegative $t$,

$$\Pr[|X - \mu| \geq t] \leq 2\exp\left(-\frac{t^2}{2\sigma^2 + \frac{4}{3}Mt}\right).$$

Proof is left to the reader ;).
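While the proof is skipped, the bound can still be exercised numerically. A minimal Python sketch (standard library only; the $\pm 1$ Rademacher summands are an arbitrary choice, giving $M = 1$, $\mu = 0$, $\sigma^2 = n$):

```python
import math
import random

random.seed(2)

# X_i uniform on {-1, +1}: M = 1, Var[X_i] = 1, so mu = 0 and sigma^2 = n.
n, trials = 100, 10_000
M, mu, sigma2 = 1, 0, 100

def bernstein(t):
    # Bernstein's bound as stated above.
    return 2 * math.exp(-t**2 / (2 * sigma2 + 4 / 3 * M * t))

for t in (20, 30):
    # Empirical Pr[|X - mu| >= t] versus Bernstein's bound.
    hits = sum(
        abs(sum(random.choice((-1, 1)) for _ in range(n)) - mu) >= t
        for _ in range(trials)
    )
    assert hits / trials <= bernstein(t)
```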

§ Chernoff Bounds

These are a variant of Bernstein's inequality for the case where $X$ is a sum of binary random variables.

Chernoff Bounds

Let $X_1, X_2, \dots, X_n \in \{0, 1\}$ be independent random variables, and let $X = X_1 + X_2 + \cdots + X_n$ have mean $\mu$. Then for any $\delta \geq 0$,

$$\Pr[|X - \mu| \geq \delta\mu] \leq 2\exp\left(-\frac{\delta^2\mu}{2 + \delta}\right).$$

There are also one-sided variants, bounding the probability of landing above $(1 + \delta)\mu$ or below $(1 - \delta)\mu$, as well as a simpler two-sided bound for $0 \leq \delta \leq 1$:

$$\begin{align*} \Pr[X \geq (1 + \delta)\mu] &\leq 2\exp\left(-\frac{\delta^2\mu}{2 + \delta}\right) \\ \Pr[X \leq (1 - \delta)\mu] &\leq \exp\left(-\frac{\delta^2\mu}{2}\right) \\ \Pr[|X - \mu| \geq \delta\mu] &\leq 2\exp\left(-\frac{\delta^2\mu}{3}\right) \end{align*}$$
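The two-sided bound from the theorem can be checked against a binomial sum (a Python sketch, standard library only; $n = 1000$ and $p = 0.1$ are arbitrary choices):

```python
import math
import random

random.seed(3)

# X is Binomial(n, p): a sum of n independent {0, 1} variables with mu = n * p.
n, p, trials = 1_000, 0.1, 5_000
mu = n * p

for delta in (0.2, 0.4):
    bound = 2 * math.exp(-delta**2 * mu / (2 + delta))
    # Empirical Pr[|X - mu| >= delta * mu] versus the Chernoff bound.
    hits = sum(
        abs(sum(random.random() < p for _ in range(n)) - mu) >= delta * mu
        for _ in range(trials)
    )
    assert hits / trials <= bound
```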

§ Applications

§ Repeated Outcomes on a Die

Suppose we have a fair $n$-sided die. How many times should it be rolled before the probability of seeing a repeated outcome among the rolls is at least $0.5$?

  • $\Theta(1)$
  • $\Theta(\log n)$
  • $\Theta\left(\sqrt{n}\right)$
  • $\Theta(n)$

Answer: $\Theta\left(\sqrt{n}\right)$

The analysis of the above example begins by inspecting the probability that we do not see a repeated outcome for an $n$-sided die after $k$ rolls.

Let $S_i$ be the event that the $i$th roll repeats an earlier outcome, conditioned on no previous roll having been a repeat. Its probability is

$$\Pr[S_i] = \frac{i - 1}{n}.$$

Then, let $\bar{S}_{\leq k}$ be the event that we do not get a repeat in the first $k$ rolls:

$$\Pr[\bar{S}_{\leq k}] = \prod_{i=1}^{k} \left(1 - \Pr[S_i]\right) = \prod_{i=1}^{k}\left(1 - \frac{i-1}{n}\right).$$

Using $1 - x \leq e^{-x}$, we get $\Pr[\bar{S}_{\leq k}] \leq \exp\left(-\sum_{i=1}^{k}\frac{i-1}{n}\right) = \exp\left(-\frac{k(k-1)}{2n}\right)$, which drops below $\displaystyle{\frac{1}{2}}$ once $k(k-1) \geq 2n\ln 2$, i.e. for some $k \in \mathcal{O}(\sqrt{n})$. As a result, $\mathcal{O}(\sqrt{n})$ rolls suffice for a $0.5$ probability of a repeated outcome.

The lower bound follows from a union bound over the events $S_i$ for $i \in \{1, \dots, k\}$:

$$\Pr[S_1 \cup \cdots \cup S_k] \leq \frac{0}{n} + \cdots + \frac{k-1}{n}.$$

Summing the arithmetic series gives $\displaystyle{\Pr[S_1 \cup \cdots \cup S_k] \leq \frac{k(k-1)}{2n} \leq \frac{k^2}{n}}$. The probability that none of these events occurs is therefore $\displaystyle{\geq 1 - \frac{k^2}{n}}$, so taking $\displaystyle{k = \sqrt{\frac{n}{2}}}$ yields a probability $\geq 0.5$ of seeing no repeat at all. Hence, $\Omega(\sqrt{n})$ rolls are needed before a repeated outcome is likely.

Therefore, $k \in \Theta(\sqrt{n})$ rolls are needed to obtain a probability of $0.5$ of seeing a repeated outcome on an $n$-sided die.
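We can sanity-check the $\Theta(\sqrt{n})$ behaviour with a short simulation (a Python sketch, standard library only; the constant $1.18 \approx \sqrt{2\ln 2}$ is the known asymptotic for the median number of rolls):

```python
import random
import statistics

random.seed(4)

def rolls_until_repeat(n):
    """Roll a fair n-sided die until some outcome repeats; return the roll count."""
    seen = set()
    while True:
        r = random.randrange(n)
        if r in seen:
            return len(seen) + 1
        seen.add(r)

for n in (100, 10_000):
    med = statistics.median(rolls_until_repeat(n) for _ in range(2_000))
    # The median grows like sqrt(2 ln 2) * sqrt(n) ~ 1.18 sqrt(n).
    assert 0.8 * n**0.5 <= med <= 1.8 * n**0.5
```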

Now let $X_i$ be the number of pairwise collisions contributed by the $i$th roll, i.e. the number of earlier rolls it matches, so that $\displaystyle{\mathbb{E}[X_i] = \frac{i-1}{n}}$. Let $X$ be the total number of pairwise collisions after $k$ rolls. Then

$$\begin{align*} \mathbb{E}[X] &= \mathbb{E}[X_1 + X_2 + \cdots + X_k] \\ &= \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_k] \\ &= \frac{0}{n} + \frac{1}{n} + \cdots + \frac{k-1}{n} \\ &= \frac{k(k-1)}{2n}. \end{align*}$$
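A quick simulation agrees with the closed form (a Python sketch, standard library only; $n$ and $k$ are arbitrary choices):

```python
import random

random.seed(5)

def pairwise_collisions(n, k):
    """Count pairs of equal outcomes among k rolls of a fair n-sided die."""
    rolls = [random.randrange(n) for _ in range(k)]
    return sum(rolls[i] == rolls[j] for i in range(k) for j in range(i + 1, k))

n, k, trials = 1_000, 100, 2_000
expected = k * (k - 1) / (2 * n)  # = 4.95
avg = sum(pairwise_collisions(n, k) for _ in range(trials)) / trials
assert abs(avg - expected) < 0.5
```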

§ Word of the Day (Birthday Paradox)

Suppose we use an app to learn another language. Every day it presents a word of the day, and we believe the word is chosen uniformly at random.

We use the app for $k = 1000$ days and keep track of any duplicates, i.e. collisions.

For $n = 1{,}000{,}000$ words, the number of duplicates we expect to see is about

$$\mathbb{E}[X] = \frac{k(k-1)}{2n} = \frac{1000 \cdot 999}{2 \cdot 1{,}000{,}000} < 0.5.$$

Suppose we saw $20$ duplicates. This is significantly more than the expected number, but does it contradict our assumption about how the words are sampled?

Solution 5.2 (Markov's)

Using Markov's inequality, we have that

$$\Pr[X \geq t\cdot\mathbb{E}[X]] \leq \frac{1}{t},$$

and we know $\mathbb{E}[X] < 0.5$ and have observed $20$ duplicates, so taking $t = 40$,

$$\Pr[X \geq 20] \leq \frac{1}{40} = 0.025.$$

Markov's tells us that there is a small yet non-negligible likelihood of seeing $20$ duplicates given the setup of our problem. Whether this invalidates our hypothesis depends on our tolerance.

Note that Markov's only gives us an upper bound, and not a particularly tight one.

Solution 5.2 (Chebyshev's)

Chebyshev's requires us to compute the variance $\sigma^2$:

$$\begin{align*} \text{Var}[X_i] &= \mathbb{E}\left[X_i^2\right] - (\mathbb{E}[X_i])^2 \\ &= \frac{i-1}{n} - \left(\frac{i-1}{n}\right)^2 \\ \sigma^2 = \text{Var}[X] &= \text{Var}[X_1] + \text{Var}[X_2] + \cdots + \text{Var}[X_k] \\ &= \sum_{i=1}^{k} \left(\frac{i-1}{n} - \left(\frac{i-1}{n}\right)^2\right) \\ &= -\frac{k(k-1)(2k-3n-1)}{6n^2} \\ &\approxeq 0.5. \end{align*}$$
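The closed form for the variance can be verified numerically against the term-by-term sum (a Python sketch using the $n$ and $k$ from this problem):

```python
n, k = 1_000_000, 1000

# Term-by-term sum of Var[X_i] = (i-1)/n - ((i-1)/n)^2.
direct = sum((i - 1) / n - ((i - 1) / n) ** 2 for i in range(1, k + 1))

# Closed form for the same sum.
closed = -k * (k - 1) * (2 * k - 3 * n - 1) / (6 * n**2)

assert abs(direct - closed) < 1e-9
assert abs(closed - 0.5) < 0.001  # sigma^2 is approximately 0.5
```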

Chebyshev states that

$$\Pr[|X - \mu| \geq t\sigma] \leq \frac{1}{t^2}.$$

Setting $t\sigma = |X - \mu|$ and solving for $t \geq 0$, we get $t = |X - \mu| / \sigma = 19.5\sqrt{2}$. Substituting into Chebyshev's yields

$$\Pr[|X - \mu| \geq 19.5] \leq \left(\frac{1}{19.5\sqrt{2}}\right)^2 \approxeq 0.0013,$$

which is a whole order of magnitude less likely than Markov's bound suggested, indicating that $X = 20$ is quite unlikely.

Solution 5.2 (Chernoff)

Using $\mu = 0.5$ as computed in the previous solutions, we obtain $\delta = |X - \mu| / \mu = 39$. Substituting into the Chernoff bound, we get

$$\Pr[|X - \mu| \geq 19.5] \leq 2\exp\left(-\frac{39^2 \cdot 0.5}{2 + 39}\right) \approxeq 1.8 \times 10^{-8}.$$

This tells us that it is effectively impossible to get 20 collisions when picking 1000 objects, with replacement and uniformly at random, from 1 million different objects.