Randomized algorithms are, by nature, random, so how can we become confident in their results?
One way is to use concentration inequalities, which let us quantify the likelihood that our answers deviate from the expected answer.
Suppose we have a fair n‑sided die. How many times should it be rolled before the probability we see a repeated outcome among the rolls is at least 0.5?
Θ(1)
Θ(log n)
Θ(√n)
Θ(n)
Answer: Θ(√n)
The analysis of the above example begins by inspecting the probability that we do not see a repeated outcome for an n‑sided die after k rolls.
Let $S_i$ be the event that the ith roll repeats an earlier outcome, conditioned on no previous roll having been a repeat; then its probability is given by
$$\Pr[S_i] = \frac{i-1}{n}.$$
Then, let $\bar{S}_{\le k}$ be the event that we do not get a repeat after k rolls:
$$\Pr[\bar{S}_{\le k}] = \prod_{i=1}^{k} \left(1 - \Pr[S_i]\right) = \prod_{i=1}^{k} \left(1 - \frac{i-1}{n}\right).$$
For $k = 2\sqrt{n}$ we have that $\Pr[\bar{S}_{\le k}] < \frac{1}{2}$: using $1 - x \le e^{-x}$,
$$\Pr[\bar{S}_{\le k}] \le \exp\left(-\sum_{i=1}^{k} \frac{i-1}{n}\right) = \exp\left(-\frac{k(k-1)}{2n}\right) = \exp\left(-2 + \frac{1}{\sqrt{n}}\right),$$
and since $\frac{1}{\sqrt{n}} \le \frac{1}{2}$ for $n \ge 4$, this is at most $e^{-3/2} < \frac{1}{2}$. As a result, O(√n) rolls suffice for a probability of at least 0.5 of repeating an outcome.
The lower bound can be shown via the union of the events $S_i$ for i ∈ {1, …, k}: by the union bound,
$$\Pr[S_1 \cup \dots \cup S_k] \le \frac{0}{n} + \dots + \frac{k-1}{n} = \frac{k(k-1)}{2n} \le \frac{k^2}{2n}.$$
The probability that none of these events occur is at least $1 - \frac{k^2}{2n}$; then $k = \sqrt{n}$ yields a probability ≥ 0.5 of none of these events occurring. Hence, Ω(√n) rolls are needed to get a repeated outcome.
Therefore, we have k ∈ Θ(√n) to obtain a probability of 0.5 of seeing a repeated outcome using an n-sided die.
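To sanity-check this numerically, here is a minimal Python sketch (the helper `rolls_for_repeat` is ours, purely illustrative) that evaluates the exact product above and finds the smallest k at which the repeat probability reaches 0.5; the ratio k/√n should settle near √(2 ln 2) ≈ 1.18.

```python
import math

def rolls_for_repeat(n: int) -> int:
    """Smallest k with Pr[repeat within k rolls] >= 0.5 for a fair
    n-sided die, via the exact product prod_{i=1}^k (1 - (i-1)/n)."""
    p_no_repeat = 1.0
    k = 0
    while p_no_repeat > 0.5:
        k += 1
        p_no_repeat *= 1 - (k - 1) / n
    return k

for n in (100, 10_000, 1_000_000):
    k = rolls_for_repeat(n)
    print(f"n={n:>9,}  k={k:>5}  k/sqrt(n)={k / math.sqrt(n):.2f}")
```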
Let $X_i$ be the number of pairwise collisions on the ith roll, i.e. the number of earlier rolls it matches, so that $E[X_i] = \frac{i-1}{n}$. Let X be the number of pairwise collisions after k rolls; then, by linearity of expectation,
$$E[X] = \sum_{i=1}^{k} E[X_i] = \frac{k(k-1)}{2n}.$$
Suppose we use an app to learn another language. Every day it has a word of the day, and we believe it to be chosen uniformly at random.
We use the app for k = 1000 days and keep track of any duplicates – i.e. collisions.
For n = 1,000,000 words, the number of duplicates we expect to see is about
$$E[X] = \frac{k(k-1)}{2n} = \frac{1000 \cdot 999}{2 \cdot 1{,}000{,}000} < 0.5.$$
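As a quick empirical check of this expectation, the sketch below simulates the word-of-the-day process under our uniform-sampling assumption (`count_duplicates` is a hypothetical helper, not a library call) and compares the empirical mean number of pairwise collisions to k(k−1)/2n.

```python
import random

def count_duplicates(n: int, k: int, rng: random.Random) -> int:
    """Draw k words uniformly with replacement from n and return the
    number of pairwise collisions, i.e. the sum of C(count, 2) per word."""
    counts: dict[int, int] = {}
    for _ in range(k):
        w = rng.randrange(n)
        counts[w] = counts.get(w, 0) + 1
    return sum(c * (c - 1) // 2 for c in counts.values())

rng = random.Random(0)
n, k, trials = 1_000_000, 1000, 10_000
mean = sum(count_duplicates(n, k, rng) for _ in range(trials)) / trials
print(f"E[X] = k(k-1)/2n = {k * (k - 1) / (2 * n):.4f}")  # 0.4995
print(f"empirical mean   = {mean:.4f}")                   # close to 0.5
```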
Suppose we saw 20 duplicates. This is significantly more than the expected number; however, does it contradict our assumption about how the words are sampled?
Solution 5.2 (Markov's)
Using Markov's inequality, we have that
$$\Pr[X \ge t \cdot E[X]] \le \frac{1}{t},$$
and we know E[X] < 0.5 and we have observed 20 duplicates, so we may take t = 40:
$$\Pr[X \ge 20] \le \frac{1}{40} = 0.025.$$
Markov's tells us that there is a small yet not insignificant likelihood of seeing 20 duplicates given the setup of our problem. Whether this invalidates our hypothesis depends on our tolerance.
Note that Markov's only gives us an upper bound, and rarely the tightest one.
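The Markov computation is plain arithmetic; a short sketch of it, using the estimate E[X] < 0.5 from above:

```python
# Markov: Pr[X >= t * E[X]] <= 1/t, with E[X] < 0.5 and X = 20.
mu = 0.5        # upper estimate of E[X] from above
t = 20 / mu     # 20 duplicates = t * E[X]  =>  t = 40
print(f"Pr[X >= 20] <= 1/{t:.0f} = {1 / t:.3f}")  # 0.025
```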
Solution 5.2 (Chebyshev's)
Chebyshev's requires us to compute the variance σ². Writing $X = \sum_{i<j} I_{ij}$, where $I_{ij}$ indicates that days i and j received the same word, the indicators are pairwise independent, so their variances add:
$$\sigma^2 = \binom{k}{2} \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right) \approx 0.4995, \qquad \sigma \approx 0.707.$$
Solving for t ≥ 0, we get t = |X − μ|/σ = 19.5/0.707 ≈ 27.6. Substituting into Chebyshev's yields
$$\Pr[|X - \mu| \ge 19.5] \le \frac{1}{t^2} \approx 0.0013,$$
which is a whole order of magnitude lower in likelihood than Markov's bound, indicating that X = 20 is quite unlikely.
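The Chebyshev computation as a sketch, assuming the pairwise-independence variance argument above:

```python
from math import comb, sqrt

n, k = 1_000_000, 1000
var = comb(k, 2) * (1 / n) * (1 - 1 / n)  # pairwise-independent indicators
sigma = sqrt(var)                          # ~0.707
t = 19.5 / sigma                           # deviation in standard deviations
print(f"sigma = {sigma:.3f}, t = {t:.1f}")
print(f"Pr[|X - mu| >= 19.5] <= 1/t^2 = {1 / t**2:.4f}")  # ~0.0013
```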
Solution 5.2 (Chernoff)
Using μ = 0.5 as computed in the previous solutions, we obtain δ by computing δ = |X − μ|/μ = 39. Substituting into the Chernoff bound
$$\Pr[|X - \mu| \ge \delta\mu] \le 2\exp\left(-\frac{\delta^2 \mu}{2 + \delta}\right),$$
we get that
$$\Pr[|X - \mu| \ge 19.5] \le 2\exp\left(-\frac{39^2 (0.5)}{2 + 39}\right) \approx 1.8 \times 10^{-8}.$$
This tells us that it is effectively impossible, a chance of under two in a hundred million, to get 20 collisions when picking 1000 objects, with replacement and uniformly at random, from 1 million different objects.
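And the Chernoff arithmetic as a sketch; note that this bound treats the collision indicators as if they were mutually independent, while ours are only pairwise independent, so it should be read as a heuristic:

```python
from math import exp

mu, delta = 0.5, 39     # delta = |X - mu| / mu = 19.5 / 0.5
bound = 2 * exp(-delta**2 * mu / (2 + delta))
print(f"Pr[|X - mu| >= 19.5] <= {bound:.1e}")  # ~1.8e-08
```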