The study of modern probability is built on the foundation of measure theory. One of its most fundamental results, the strong law of large numbers, demonstrates the usefulness and power of this foundation. Informally, this law states that the sample mean gets close to the true mean as the sample size grows large.
We would like to be more precise about the meaning of the phrase "gets close to". Certainly we cannot simply replace it by "converges to", as one could be unlucky and consistently draw below or above the mean, so that the sample mean is nowhere near the true mean. However, intuition tells us that consistently drawing below or above the mean, while not impossible, becomes increasingly unlikely as the sample size grows. This naturally leads to the idea of convergence in probability.
Convergence in probability
A sequence of random variables $X_n$ converges in probability to $X$ if for any $\epsilon > 0$,
$$\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0.$$
This is simply convergence in measure on a probability space. Equipped with this definition, we can now state a version of the weak law of large numbers: if the iid random variables have finite variance, the sample average converges in probability to the mean. Historically, Bernoulli first proved this theorem for Bernoulli random variables in 1713, before tools like Chebyshev's inequality had been discovered. Later on, mathematicians proved more general versions that do not require the assumption of finite variance or strict independence of the random variables.
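Proof. Without loss of generality assume that the random variables $X_1, X_2, \ldots$ are centered, with variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ denote the sample average. By Chebyshev's inequality, for any $\epsilon > 0$,
$$P(|\bar{X}_n| > \epsilon) \le \frac{\mathrm{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$$
as $n$ approaches infinity. Note that for Chebyshev's inequality to hold we do need finite variance.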
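The Chebyshev argument can be checked numerically. The following is a minimal sketch, assuming centered Uniform(-1, 1) draws (variance 1/3), a distribution chosen here purely for illustration. The function names and the values of `eps` and `trials` are likewise illustrative choices. It compares a Monte Carlo estimate of the tail probability of the sample average with the Chebyshev bound.

```python
import random


def tail_probability(n, eps, trials=1000, seed=0):
    """Monte Carlo estimate of P(|mean of n draws| > eps) for Uniform(-1, 1) draws."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        if abs(mean) > eps:
            exceed += 1
    return exceed / trials


def chebyshev_bound(n, eps, variance=1.0 / 3.0):
    """Chebyshev bound sigma^2 / (n * eps^2), capped at 1 (Uniform(-1, 1) has variance 1/3)."""
    return min(1.0, variance / (n * eps * eps))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, tail_probability(n, 0.2), chebyshev_bound(n, 0.2))
```

In such a run the estimated probability should sit below the bound for every sample size, and both should shrink as the sample size grows. The bound is typically far from tight, which is consistent with it being a worst-case inequality.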
It is natural to wonder whether a stronger notion of convergence can be obtained, perhaps under stronger assumptions. To do so we need to formally define the stronger type of convergence that we have in mind, which in the study of probability is often referred to as "almost sure convergence".
Almost sure convergence
A sequence of random variables $X_n$ converges almost surely to $X$ if
$$P\left(\lim_{n \to \infty} X_n = X\right) = 1.$$
Note that this is almost everywhere convergence on a probability space. It is a stronger notion than convergence in probability, because in a finite measure space, convergence almost everywhere implies convergence in measure.
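To see that the two notions really differ, one can look at a standard counterexample that is not part of the argument above: the "typewriter" sequence on $[0, 1)$ with Lebesgue measure. The sketch below assumes that probability space, and its function names are hypothetical. Each $X_n$ is the indicator of a dyadic interval whose length shrinks to zero, so $X_n$ converges to 0 in probability, yet every point is covered once per level, so $X_n(\omega)$ converges for no $\omega$.

```python
def typewriter(n, omega):
    """Value X_n(omega) of the typewriter sequence on [0, 1).

    Write n = 2**k + j with 0 <= j < 2**k; X_n is the indicator of
    the interval [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1   # level: the interval has length 2**-k
    j = n - (1 << k)         # position of the interval within level k
    width = 1.0 / (1 << k)
    return 1 if j * width <= omega < (j + 1) * width else 0


def prob_nonzero(n):
    """P(X_n != 0), i.e. the length 2**-k of the interval at level k."""
    return 1.0 / (1 << (n.bit_length() - 1))


def hits(omega, levels):
    """Number of indices n < 2**levels with X_n(omega) = 1."""
    return sum(typewriter(n, omega) for n in range(1, 1 << levels))


if __name__ == "__main__":
    print(prob_nonzero(1024))  # the interval length shrinks to zero
    print(hits(0.3, 10))       # one hit per level, so hits never stop
```

Because `hits` grows by one with every additional level, the sequence $X_n(\omega)$ takes the value 1 infinitely often for each fixed $\omega$, even though `prob_nonzero(n)` tends to zero.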
The strong law of large numbers uses this stronger notion of convergence and states: the sample average converges almost surely to the mean. Before the proof, we first introduce a lemma.
Borel-Cantelli Lemma
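Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $A_1, A_2, \ldots$ any sequence of events in $\mathcal{F}$. Write $\limsup_n A_n = \bigcap_{N \ge 1} \bigcup_{n \ge N} A_n$ for the event that infinitely many of the $A_n$ occur. If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(\limsup_n A_n) = 0$.
On the other hand, if $\sum_{n=1}^{\infty} P(A_n) = \infty$ and the events $A_n$ are independent, then $P(\limsup_n A_n) = 1$.
Proof.
Fix $\epsilon > 0$. Since the series converges, there exists $N$ sufficiently large such that $\sum_{n \ge N} P(A_n) < \epsilon$. By the union bound we obtain $P\left(\bigcup_{n \ge N} A_n\right) \le \sum_{n \ge N} P(A_n) < \epsilon$. So $\lim_{N \to \infty} P\left(\bigcup_{n \ge N} A_n\right) = 0$. By continuity of measure, $P\left(\bigcap_{N \ge 1} \bigcup_{n \ge N} A_n\right) = \lim_{N \to \infty} P\left(\bigcup_{n \ge N} A_n\right)$, so we have $P(\limsup_n A_n) = 0$.
The other statement follows from a similar argument applied to the complements $A_n^c$, where independence is used to factor $P\left(\bigcap_{n \ge N} A_n^c\right) = \prod_{n \ge N} (1 - P(A_n)) \le \exp\left(-\sum_{n \ge N} P(A_n)\right) = 0$ for every $N$.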
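The dichotomy in the lemma can be illustrated with a small simulation. This is a minimal sketch under assumptions introduced here: independent uniform draws $U_n$, the events $A_n = \{U_n < 1/n^2\}$ (summable probabilities) and $B_n = \{U_n < 1/n\}$ (non-summable probabilities), and a hypothetical function name. A finite run can only suggest the limiting behaviour, not prove it.

```python
import random


def count_occurrences(num_events, seed=0):
    """Count occurrences of A_n = {U_n < 1/n^2} and B_n = {U_n < 1/n}, n = 1..num_events.

    U_1, U_2, ... are independent Uniform[0, 1) draws, so each family of events
    is independent, with P(A_n) = 1/n^2 (summable) and P(B_n) = 1/n (not summable).
    Returns (count of A, last index of A, count of B, last index of B).
    """
    rng = random.Random(seed)
    count_a = count_b = 0
    last_a = last_b = 0
    for n in range(1, num_events + 1):
        u = rng.random()
        if u < 1.0 / (n * n):
            count_a += 1
            last_a = n
        if u < 1.0 / n:
            count_b += 1
            last_b = n
    return count_a, last_a, count_b, last_b


if __name__ == "__main__":
    print(count_occurrences(100_000))
```

Since $A_n \subseteq B_n$ by construction, the count for the summable family never exceeds the count for the harmonic family. Increasing `num_events` would be expected to leave the first count essentially unchanged while the second keeps growing slowly, roughly like $\log n$ on average.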
Strong law of large numbers
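Instead of directly stating the theorem, we explore a little on our own what assumptions we might need in order to achieve this "almost sure convergence" to the mean. Let $X_1, X_2, \ldots$ be iid with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample average. Here, we are interested in the set where $\bar{X}_n$ converges to $\mu$, and we can rewrite this set as $\bigcap_{k \ge 1} \bigcup_{N \ge 1} \bigcap_{n \ge N} \{|\bar{X}_n - \mu| \le 1/k\}$. To relate it to the Borel-Cantelli lemma, we instead study its complement
$$\bigcup_{k \ge 1} \bigcap_{N \ge 1} \bigcup_{n \ge N} \{|\bar{X}_n - \mu| > 1/k\} = \bigcup_{k \ge 1} \limsup_n \{|\bar{X}_n - \mu| > 1/k\}.$$
Since these events increase with $k$, by continuity of measure we have $P\left(\bigcup_{k \ge 1} \limsup_n \{|\bar{X}_n - \mu| > 1/k\}\right) = \lim_{k \to \infty} P\left(\limsup_n \{|\bar{X}_n - \mu| > 1/k\}\right)$.
It suffices to show that for any fixed $\epsilon > 0$, $P\left(\limsup_n \{|\bar{X}_n - \mu| > \epsilon\}\right) = 0$. This closely resembles the conclusion of the Borel-Cantelli lemma, so we might be tempted to show that $\sum_{n} P(|\bar{X}_n - \mu| > \epsilon) < \infty$. Unfortunately, a simple application of Chebyshev's inequality shows that this falls short, as we only get
$$\sum_{n=1}^{\infty} P(|\bar{X}_n - \mu| > \epsilon) \le \sum_{n=1}^{\infty} \frac{\sigma^2}{n\epsilon^2} = \infty.$$
Therefore, stronger assumptions must be made if we are to use the same proof technique. Since finiteness of higher moments implies that of the lower ones, it is natural for us to assume a finite fourth moment $m_4 = E[(X_1 - \mu)^4] < \infty$. Expanding $E\left[\left(\sum_{i=1}^n (X_i - \mu)\right)^4\right]$ and using independence, only the $n$ terms $E[(X_i - \mu)^4]$ and the $3n(n-1)$ terms $E[(X_i - \mu)^2 (X_j - \mu)^2]$ with $i \ne j$ survive. Together with Markov's inequality applied to the fourth power and the bound $\sigma^4 \le m_4$, this gives us
$$P(|\bar{X}_n - \mu| > \epsilon) \le \frac{E[(\bar{X}_n - \mu)^4]}{\epsilon^4} = \frac{n m_4 + 3n(n-1)\sigma^4}{n^4 \epsilon^4} \le \frac{3 m_4}{n^2 \epsilon^4},$$
which is summable in $n$. Then by the Borel-Cantelli lemma, $P\left(\limsup_n \{|\bar{X}_n - \mu| > \epsilon\}\right) = 0$ for every $\epsilon > 0$, so $\bar{X}_n$ converges a.s. to $\mu$.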
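The role of the fourth moment can be checked exactly in a small case. The following is a minimal sketch, assuming independent fair $\pm 1$ signs (a centered distribution with variance and fourth moment both equal to 1, chosen here only for illustration) and hypothetical function names. For a sum $S_n$ of $n$ such signs, the standard identity $E[S_n^4] = n m_4 + 3n(n-1)\sigma^4$ reduces to $3n^2 - 2n$, so the fourth moment of the sample average is at most $3/n^2$. The sketch also contrasts partial sums of $1/n$ (the rate from the variance bound) with partial sums of $1/n^2$ (the rate from the fourth-moment bound).

```python
from fractions import Fraction
from itertools import product


def fourth_moment_of_sum(n):
    """Exact E[S_n^4], where S_n is a sum of n independent fair +/-1 signs."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)  # average over all 2^n equally likely patterns


def fourth_moment_of_mean(n):
    """Exact E[(S_n / n)^4] for the sample average of n fair signs."""
    return fourth_moment_of_sum(n) / n ** 4


def partial_sum(power, terms):
    """Partial sum of 1 / n**power for n = 1..terms."""
    return sum(1.0 / n ** power for n in range(1, terms + 1))


if __name__ == "__main__":
    for n in range(1, 9):
        # Enumerated value next to the closed form 3n^2 - 2n.
        print(n, fourth_moment_of_sum(n), 3 * n * n - 2 * n)
    # Harmonic partial sums keep growing; the 1/n^2 partial sums stay bounded.
    print(partial_sum(1, 10_000), partial_sum(2, 10_000))
```

The enumeration is exponential in `n`, so it is only meant as a sanity check for small values, not as a way to estimate moments in general.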