posted 23 Jan 2024

Concentration Inequalities


§ Motivation

Randomized algorithms produce random outputs, so how can we become confident in their results? One way is to use concentration inequalities, which let us quantify how likely an answer is to deviate far from its expectation.

§ Markov's Inequality

Markov's Inequality

Let $X \geq 0$ be a nonnegative random variable. Then for any $t > 0$,

$$\Pr[X \geq t \cdot \mathbb{E}[X]] \leq \frac{1}{t}.$$

Proof (Markov's Inequality)

$$\begin{align*} \mathbb{E}[X] &= \sum_{x \in \Omega} \Pr[X = x] \cdot x \\ &= \sum_{x \geq t\cdot\mathbb{E}[X]} \Pr[X = x] \cdot x + \sum_{x < t\cdot\mathbb{E}[X]} \Pr[X = x] \cdot x \\ &\geq \sum_{x \geq t\cdot\mathbb{E}[X]} \Pr[X = x] \cdot x \\ &\geq t\cdot\mathbb{E}[X] \sum_{x \geq t\cdot\mathbb{E}[X]} \Pr[X = x] \\ &= t\cdot\mathbb{E}[X]\cdot\Pr[X \geq t\cdot\mathbb{E}[X]]. \end{align*}$$

Dividing both sides by $t\cdot\mathbb{E}[X]$ gives the result. (Dropping the second sum is valid because $X$ is nonnegative.)
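The inequality is easy to check empirically. Below is a minimal sketch in Python (standard library only); the $\text{Exp}(1)$ distribution is an arbitrary choice of nonnegative random variable with $\mathbb{E}[X] = 1$.

```python
import random

random.seed(0)

# Exp(1) is nonnegative with E[X] = 1.
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for t in (2, 5, 10):
    # Empirical Pr[X >= t * E[X]] versus Markov's bound 1/t.
    tail = sum(x >= t * mean for x in samples) / len(samples)
    assert tail <= 1 / t
```

For $\text{Exp}(1)$ the true tail is $e^{-t}$, far below $1/t$; Markov's bound is loose, but it asks for nothing beyond nonnegativity.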

§ Chebyshev's Inequality

Chebyshev's Inequality

Let $X$ be a random variable with expected value $\mu \triangleq \mathbb{E}[X]$ and variance $\sigma^2 \triangleq \text{Var}[X]$. Then

$$\Pr[|X - \mu| \geq t\sigma] \leq \frac{1}{t^2},$$

or equivalently,

$$\Pr[|X - \mu| \geq t] \leq \frac{\sigma^2}{t^2}.$$

Proof (Chebyshev's Inequality)

We can derive Chebyshev's inequality by rearranging Markov's: for a nonnegative random variable $X$,

$$\Pr[X \geq t\cdot\mathbb{E}[X]] \leq \frac{1}{t} \iff \Pr[X \geq t] \leq \frac{\mathbb{E}[X]}{t}.$$

Then, observing that $|X| \geq t$ and $X^2 \geq t^2$ are the same event, applying Markov's to $X^2$ gives

$$\Pr[|X| \geq t] = \Pr\left[X^2 \geq t^2\right] \leq \frac{\mathbb{E}\left[X^2\right]}{t^2}.$$

Then substituting $X \to X - \mathbb{E}[X]$,

$$\begin{align*} \Pr[|X - \mathbb{E}[X]| \geq t] &\leq \frac{\mathbb{E}\left[(X - \mathbb{E}[X])^2\right]}{t^2} \\ &= \frac{\text{Var}[X]}{t^2}. \end{align*}$$
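As with Markov's, a quick simulation confirms the bound (a Python sketch, standard library only; $\text{Uniform}(0,1)$ is an arbitrary choice with $\mu = 0.5$ and $\sigma^2 = 1/12$):

```python
import random

random.seed(1)

# Uniform(0, 1): mu = 0.5, sigma^2 = 1/12.
mu, var = 0.5, 1 / 12
samples = [random.random() for _ in range(100_000)]

for t in (0.3, 0.4, 0.45):
    # Empirical Pr[|X - mu| >= t] versus Chebyshev's bound sigma^2 / t^2.
    tail = sum(abs(x - mu) >= t for x in samples) / len(samples)
    assert tail <= var / t**2
```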

§ Bernstein's Inequality

Bernstein's Inequality

Let $X_1, X_2, \dots, X_n \in [-M, M]$ be independent random variables, and let $X = X_1 + X_2 + \cdots + X_n$ have mean $\mu$ and variance $\sigma^2$. Then for any nonnegative $t$,

$$\Pr[|X - \mu| \geq t] \leq 2\exp\left(-\frac{t^2}{2\sigma^2 + \frac{4}{3}Mt}\right).$$

Proof is left to the reader ;).
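While the proof is skipped, the bound can still be exercised numerically. A minimal Python sketch (standard library only; the $\pm 1$ Rademacher summands are an arbitrary choice, giving $M = 1$, $\mu = 0$, $\sigma^2 = n$):

```python
import math
import random

random.seed(2)

# X_i uniform on {-1, +1}: M = 1, Var[X_i] = 1, so mu = 0 and sigma^2 = n.
n, trials = 100, 10_000
M, mu, sigma2 = 1, 0, 100

def bernstein(t):
    # Bernstein's bound as stated above.
    return 2 * math.exp(-t**2 / (2 * sigma2 + 4 / 3 * M * t))

for t in (20, 30):
    # Empirical Pr[|X - mu| >= t] versus Bernstein's bound.
    hits = sum(
        abs(sum(random.choice((-1, 1)) for _ in range(n)) - mu) >= t
        for _ in range(trials)
    )
    assert hits / trials <= bernstein(t)
```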

§ Chernoff Bounds

These are a variant of Bernstein's inequality for the case where $X$ is a sum of binary random variables.

Chernoff Bounds

Let $X_1, X_2, \dots, X_n \in \{0, 1\}$ be independent random variables, and let $X = X_1 + X_2 + \cdots + X_n$ have mean $\mu$. Then for any $\delta \geq 0$,

$$\Pr[|X - \mu| \geq \delta\mu] \leq 2\exp\left(-\frac{\delta^2\mu}{2 + \delta}\right).$$

There are also one-sided variants, bounding the probability of landing above $(1 + \delta)\mu$ or below $(1 - \delta)\mu$, as well as a simpler two-sided bound for $0 \leq \delta \leq 1$:

$$\begin{align*} \Pr[X \geq (1 + \delta)\mu] &\leq 2\exp\left(-\frac{\delta^2\mu}{2 + \delta}\right) \\ \Pr[X \leq (1 - \delta)\mu] &\leq \exp\left(-\frac{\delta^2\mu}{2}\right) \\ \Pr[|X - \mu| \geq \delta\mu] &\leq 2\exp\left(-\frac{\delta^2\mu}{3}\right) \end{align*}$$
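The two-sided bound from the theorem can be checked against a binomial sum (a Python sketch, standard library only; $n = 1000$ and $p = 0.1$ are arbitrary choices):

```python
import math
import random

random.seed(3)

# X is Binomial(n, p): a sum of n independent {0, 1} variables with mu = n * p.
n, p, trials = 1_000, 0.1, 5_000
mu = n * p

for delta in (0.2, 0.4):
    bound = 2 * math.exp(-delta**2 * mu / (2 + delta))
    # Empirical Pr[|X - mu| >= delta * mu] versus the Chernoff bound.
    hits = sum(
        abs(sum(random.random() < p for _ in range(n)) - mu) >= delta * mu
        for _ in range(trials)
    )
    assert hits / trials <= bound
```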

§ Applications

§ Repeated Outcomes on a Die

Suppose we have a fair $n$-sided die. How many times should it be rolled before the probability of seeing a repeated outcome among the rolls is at least $0.5$?

  • $\Theta(1)$
  • $\Theta(\log n)$
  • $\Theta\left(\sqrt{n}\right)$
  • $\Theta(n)$

Answer: $\Theta\left(\sqrt{n}\right)$

The analysis of the above example begins by inspecting the probability that we do not see a repeated outcome for an $n$-sided die after $k$ rolls.

Let $S_i$ be the event that the $i$th roll repeats an earlier outcome, conditioned on no previous roll having been a repeat. Its probability is

$$\Pr[S_i] = \frac{i - 1}{n}.$$

Then, let $\bar{S}_{\leq k}$ be the event that we do not get a repeat in the first $k$ rolls:

$$\Pr[\bar{S}_{\leq k}] = \prod_{i=1}^{k} \left(1 - \Pr[S_i]\right) = \prod_{i=1}^{k}\left(1 - \frac{i-1}{n}\right).$$

Using $1 - x \leq e^{-x}$, we get $\Pr[\bar{S}_{\leq k}] \leq \exp\left(-\sum_{i=1}^{k}\frac{i-1}{n}\right) = \exp\left(-\frac{k(k-1)}{2n}\right)$, which drops below $\displaystyle{\frac{1}{2}}$ once $k(k-1) \geq 2n\ln 2$, i.e. for some $k \in \mathcal{O}(\sqrt{n})$. As a result, $\mathcal{O}(\sqrt{n})$ rolls suffice for a $0.5$ probability of a repeated outcome.

The lower bound follows from a union bound over the events $S_i$ for $i \in \{1, \dots, k\}$:

$$\Pr[S_1 \cup \cdots \cup S_k] \leq \frac{0}{n} + \cdots + \frac{k-1}{n}.$$

Summing the arithmetic series gives $\displaystyle{\Pr[S_1 \cup \cdots \cup S_k] \leq \frac{k(k-1)}{2n} \leq \frac{k^2}{n}}$. The probability that none of these events occurs is therefore $\displaystyle{\geq 1 - \frac{k^2}{n}}$, so taking $\displaystyle{k = \sqrt{\frac{n}{2}}}$ yields a probability $\geq 0.5$ of seeing no repeat at all. Hence, $\Omega(\sqrt{n})$ rolls are needed before a repeated outcome is likely.

Therefore, $k \in \Theta(\sqrt{n})$ rolls are needed to obtain a probability of $0.5$ of seeing a repeated outcome on an $n$-sided die.
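We can sanity-check the $\Theta(\sqrt{n})$ behaviour with a short simulation (a Python sketch, standard library only; the constant $1.18 \approx \sqrt{2\ln 2}$ is the known asymptotic for the median number of rolls):

```python
import random
import statistics

random.seed(4)

def rolls_until_repeat(n):
    """Roll a fair n-sided die until some outcome repeats; return the roll count."""
    seen = set()
    while True:
        r = random.randrange(n)
        if r in seen:
            return len(seen) + 1
        seen.add(r)

for n in (100, 10_000):
    med = statistics.median(rolls_until_repeat(n) for _ in range(2_000))
    # The median grows like sqrt(2 ln 2) * sqrt(n) ~ 1.18 sqrt(n).
    assert 0.8 * n**0.5 <= med <= 1.8 * n**0.5
```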

Now let $X_i$ be the number of pairwise collisions contributed by the $i$th roll, i.e. the number of earlier rolls it matches, so that $\displaystyle{\mathbb{E}[X_i] = \frac{i-1}{n}}$. Let $X$ be the total number of pairwise collisions after $k$ rolls. Then

$$\begin{align*} \mathbb{E}[X] &= \mathbb{E}[X_1 + X_2 + \cdots + X_k] \\ &= \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_k] \\ &= \frac{0}{n} + \frac{1}{n} + \cdots + \frac{k-1}{n} \\ &= \frac{k(k-1)}{2n}. \end{align*}$$
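A quick simulation agrees with the closed form (a Python sketch, standard library only; $n$ and $k$ are arbitrary choices):

```python
import random

random.seed(5)

def pairwise_collisions(n, k):
    """Count pairs of equal outcomes among k rolls of a fair n-sided die."""
    rolls = [random.randrange(n) for _ in range(k)]
    return sum(rolls[i] == rolls[j] for i in range(k) for j in range(i + 1, k))

n, k, trials = 1_000, 100, 2_000
expected = k * (k - 1) / (2 * n)  # = 4.95
avg = sum(pairwise_collisions(n, k) for _ in range(trials)) / trials
assert abs(avg - expected) < 0.5
```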

§ Word of the Day (Birthday Paradox)

Suppose we use an app to learn another language. Every day it presents a word of the day, and we believe the word is chosen uniformly at random.

We use the app for $k = 1000$ days and keep track of any duplicates, i.e. collisions.

For $n = 1{,}000{,}000$ words, the number of duplicates we expect to see is about

$$\mathbb{E}[X] = \frac{k(k-1)}{2n} = \frac{1000 \cdot 999}{2 \cdot 1{,}000{,}000} < 0.5.$$

Suppose we saw $20$ duplicates. This is significantly more than the expected number, but does it contradict our assumption about how the words are sampled?

Solution 5.2 (Markov's)

Using Markov's inequality, we have that

$$\Pr[X \geq t\cdot\mathbb{E}[X]] \leq \frac{1}{t},$$

and we know $\mathbb{E}[X] < 0.5$ and have observed $20$ duplicates, so taking $t = 40$,

$$\Pr[X \geq 20] \leq \frac{1}{40} = 0.025.$$

Markov's tells us that there is a small yet non-negligible likelihood of seeing $20$ duplicates given the setup of our problem. Whether this invalidates our hypothesis depends on our tolerance.

Note that Markov's only gives us an upper bound, and not a particularly tight one.

Solution 5.2 (Chebyshev's)

Chebyshev's requires us to compute the variance $\sigma^2$:

$$\begin{align*} \text{Var}[X_i] &= \mathbb{E}\left[X_i^2\right] - (\mathbb{E}[X_i])^2 \\ &= \frac{i-1}{n} - \left(\frac{i-1}{n}\right)^2 \\ \sigma^2 = \text{Var}[X] &= \text{Var}[X_1] + \text{Var}[X_2] + \cdots + \text{Var}[X_k] \\ &= \sum_{i=1}^{k} \left(\frac{i-1}{n} - \left(\frac{i-1}{n}\right)^2\right) \\ &= -\frac{k(k-1)(2k-3n-1)}{6n^2} \\ &\approxeq 0.5. \end{align*}$$
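The closed form for the variance can be verified numerically against the term-by-term sum (a Python sketch using the $n$ and $k$ from this problem):

```python
n, k = 1_000_000, 1000

# Term-by-term sum of Var[X_i] = (i-1)/n - ((i-1)/n)^2.
direct = sum((i - 1) / n - ((i - 1) / n) ** 2 for i in range(1, k + 1))

# Closed form for the same sum.
closed = -k * (k - 1) * (2 * k - 3 * n - 1) / (6 * n**2)

assert abs(direct - closed) < 1e-9
assert abs(closed - 0.5) < 0.001  # sigma^2 is approximately 0.5
```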

Chebyshev states that

$$\Pr[|X - \mu| \geq t\sigma] \leq \frac{1}{t^2}.$$

Setting $t\sigma = |X - \mu|$ and solving for $t \geq 0$, we get $t = |X - \mu| / \sigma = 19.5\sqrt{2}$. Substituting into Chebyshev's yields

$$\Pr[|X - \mu| \geq 19.5] \leq \left(\frac{1}{19.5\sqrt{2}}\right)^2 \approxeq 0.0013,$$

which is a whole order of magnitude less likely than Markov's bound suggested, indicating that $X = 20$ is quite unlikely.

Solution 5.2 (Chernoff)

Using $\mu = 0.5$ as computed in the previous solutions, we obtain $\delta = |X - \mu| / \mu = 39$. Substituting into the Chernoff bound, we get

$$\Pr[|X - \mu| \geq 19.5] \leq 2\exp\left(-\frac{39^2 \cdot 0.5}{2 + 39}\right) \approxeq 1.8 \times 10^{-8}.$$

This tells us that it is effectively impossible to get 20 collisions when picking 1000 objects, with replacement and uniformly at random, from 1 million different objects.