Sampling
Sampling§
Repeated Sampling§
A Pair of Dice
Suppose we are given two 6 sided dice, then our sample space is a set of a pairing of possible outcomes from the first and second dice,
$$ \Omega = \{(i, j) | i, j \in \{1 \dots 6\}\} $$
where $(i, j)$ are an ordered pair, i.e. $(i, j) \neq (j, i).$
Then, $$ \forall i, j, P((i, j)) = \frac{1}{6^2} $$
Let $$ \begin{align*} D &= \{\text{sum of dice is }8\}\\ &= \{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)\} \end{align*} $$
Then, $$ P(D) = \sum_{d \in D} P(d). $$
Because each pair is equally likely, $\forall d \in D, P(d) = \frac{1}{36},$ therefore
$$ P(D) = \frac{5}{36}. $$
Flipping a Coin Multiple Times
Suppose we flipped a coin 3 times in a row,
$$ \Omega = \{0, 1\}^{3}, $$
where $0$ is heads and $1$ tails, so $\forall w \in \Omega, P(w) = \frac{1}{8}$
Cartesian Product
Given sets $A_1, A_2, \dots, A_n$ then we define the Cartesian product $$ A_1 \times A_2 \times \dots \times A_n $$
as the set of ordered $n$-tuples
$$ (a_1, a_2, \dots, a_n) $$
where $a_1 \in A_1, a_2 \in A_2, \dots, a_n \in A_n.$
As a consequence, we can rewrite $\Omega = \{(i, j) | i, j \in \{1 \dots 6\}\}$ from the first example as $\Omega = \{0, 6\}^{2}.$ Here, exponentiation means the repeated Cartesian product with itself.
The Cartesian product is useful as probability often deals with repetitions of a sample.
Random Sampling§
Random Sampling
Choosing elements at random from a given set is called sampling.
Cardinality
If a set $S$ contains $n$ elements, then we say that the cardinality of $S$ to be $n,$ and we write
$$ |S| = n. $$
If all outcomes of a random experiment are equally likely, then
$$\forall \omega \in \Omega, P(\omega) = |\Omega|^{-1} \iff \sum_{w \in \Omega} P(\omega) = 1.$$
If outcomes are equally likely, then we can say that it was chosen uniformly at random.
Fact
If $\Omega$ is finite and each outcome equally likely, then for any subset $A \subseteq \Omega,$
$$ P(A) = \sum_{a \in A} P(a) = \frac{|A|}{|\Omega|}. $$
Generally, random outcomes are not equally likely.
Sampling Mechanisms§
Ordered Sampling
Objects are chosen one at a time and the order is tracked.
Unordered Sampling
Objects are chosen one at a time and only the identity of objects matter, not their order.
Sampling With or Without Replacement
After each choice, we have another decision to make: to return the object or to discard, i.e. to replace or to not replace.
Urn Example Setup§
For the next few sections, suppose we are given an urn containing numbered balls, and we reach in and retrieve one ball. Also assume that the balls are uniformly random.
Ordered Sampling with Replacement§
Suppose we pick a ball from the urn, copy the number, put it back, and repeat $k$ times. This yields an ordered $k$-tuple of numbers,
$$ \omega = (s_1, s_2, \dots, s_k). $$
Alternatively, $\Omega = S^k$ where $S = \{1\dots n\},$ and $$ |\Omega| = |S|^k = n^k. $$
As we picked uniformly at random, $$ \forall \omega \in \Omega, P(\omega) = \frac{1}{n^k}. $$
With Replacement
Suppose the urn contains $4$ balls, and we pick $k = 3$ times.
$$ \Omega = \{1..4\}^3,\quad |\Omega| = 64. $$
Then, each triplet is equally likely, for example $$ P((1, 1, 1)) = \frac{1}{64}. $$
Ordered Sampling without Replacement§
Suppose we pick a ball from the urn, copy the number, discard the ball, and repeat $k$ times. Clearly for this to work, $k \leq n,$ and this yields a $k$-tuple of numbers.
At step $i, s_i \in \{1..n\},$ but $s_1, s_2, \dots, s_k$ must be distinct,
$$ \Omega = \{(s_1, s_2, \dots, s_k) | s_i \in \{1..n\} \wedge \forall i, j,\,s_i \neq s_j\}. $$
A natural question is the cardinality of $\Omega,$ at step $1$ we have $n$ choices, at $2$ we have $n-1$ choices, and so on. At step $k,$ we have $n - k + 1$ choices. Then, $\Omega$ must contain $n(n-1)(n-2)\dots(n-k+1)$ elements, i.e.
$$ |\Omega| = \prod_{i=0}^{k-1}(n - i) = (n)_k $$
Descending Factorial
The descending factorial is written using the notation $(n)_k$ and is defined by
$$ (n)_k = \prod_{i=0}^{k-1}(n - i). $$
By assuming equally likely outcomes,
$$ \forall \omega \in \Omega, P(\omega) = \frac{1}{(n)_k}. $$
Without Replacement
Suppose we sample 3 balls without replacement from an urn containing a total of 4 balls numbered from 1 to 4, distinctly.
What is the probability that we get the sequence $2, 1, 4?$
$$ P((2, 1, 4)) = \frac{1}{(4)_3} = \frac{1}{4 \cdot 3 \cdot 2}, $$
because of our assumption of uniformity.
What is the probability that we get the sequence $1, 2, 1?$ $$ P((1, 2, 1)) = 0, $$ it is impossible as we sampled $1$ first then removed it, so we cannot get it again.
Unordered Sampling without Replacement§
Suppose we sampled $k$ balls without replacement, and only recorded which balls appeared and not the order. Again, $k \leq n.$
The outcome will be some subset from the set $S,$ with cardinality $k,$ i.e. the sample space is
$$ \Omega = \{\omega | \omega \in \mathcal{P}(S), |\omega| = k\}. $$
Then, the number of outcomes of the sample space is given by
$$ |\Omega| = \binom{n}{k} = \frac{n!}{(n-k)!\,k!} $$
and for our example all outcomes equally likely means that the probability measure is given by
$$ \forall \omega \in \Omega,\,P(\omega) = \binom{n}{k}^{-1}. $$
Alternatively, instead thinking about the situation as taking out one ball at a time $k$ times, we can think of it as:
- Randomize order of $n$ balls.
- Look at the first $k$ balls.
- Note the identity of the $k$ balls, ignoring order.
Number of outcomes
How exactly do we compute the probability of $k$ balls being $\{s_1, s_2, \dots, s_k\}?$
$$ \overbrace{\underbrace{s_1, s_2, s_3, s_4, \dots, s_k}_{k}, \underbrace{\dots, s_n}_{n-k}}^{n} $$
- There are $n!$ possible arrangements of $n$ balls.
- The first $k$ balls have an ordering $\{s_1, s_2, \dots, s_k\}$ with $k!$ possible arrangements.
- The final $n - k$ balls have $(n - k)!$ possible arrangements.
So the number of favourable orderings such that the first $k$ balls are $\{s_1, s_2, \dots, s_k\}$ is given by
$$ k!\,(n-k)! $$
$$ P(\{s_1, s_2, \dots, s_k\}) = \frac{\text{\# favourable}}{\text{\# total}} = \frac{k!\,(n-k)!}{n!} $$
There are many ways to solve problems, sometimes adding additional structure such as order may help solve a problem. In any case, we must pick one and stick with it.