Borel-Cantelli Lemma: Difference between revisions

From Optimal Transport Wiki
Jump to navigation Jump to search
(Created page with "=== '''ICTP Real Analysis''' === <blockquote>前言:该笔记开始于2020暑假的最后一个月。由于Royden这本书中已经将定理及其证明讲述得很详细...")
 
(Add links to related articles)
 
(46 intermediate revisions by one other user not shown)
Line 1: Line 1:
=== '''ICTP Real Analysis''' ===


<blockquote>前言:该笔记开始于2020暑假的最后一个月。由于Royden这本书中已经将定理及其证明讲述得很详细,写这个笔记的目的主要是'''疏通脉络''',同时挑选一些比较重要或者'''有代表性的定理证明'''。对于每一个部分的开头,我试图整理出一些主要的脉络和逻辑链。这样即使对于这些知识的构架变得生疏了以后,也能很快通过重新阅读笔记来捡起来 :)
Modern probability is built on the foundation of measure theory. One of its most basic and key results, the strong law of large numbers, demonstrates the usefulness and power of this foundation. Informally, this law states that the sample mean gets close to the true mean as sample size grows large.


更新(10/3/2020):对于ICTP课程中一些更加一般化的测度论的结论,打算通过IMPA的课程以查漏补缺的方式填上。因此这个笔记会逐渐增添内容。
We would like to be more precise about the meaning of the phrase &quot;get close to&quot;. Certainly we cannot replace it by &quot;converges to&quot;, as one could be unlucky and consistently draw below or above the mean so that the sample mean is no where near the true mean. However, just by intuition we know that consistently drawing below or above the mean, while not impossible, becomes increasingly unlikely as the sample size grows. This naturally leads to the idea of convergence in probability.
</blockquote>
__TOC__


==== '''Part 0''' ====
==Convergence in probability==


<blockquote>sigma-algebra <math display="inline">\to</math> Borel set,  
A sequence of random variables <math display="inline">X_n</math> converges in probability to <math display="inline">X</math> if for any <math display="inline"> \epsilon > 0 </math>,


<math display="inline">G_\delta</math> set, <math display="inline">F_\sigma</math> set.
<math display="block">\lim_{n\to \infty}\mathbb{P}(|X_n - X|>\epsilon) = 0.</math>
</blockquote>
(Prop) Every nonempty open set is the disjoint union of a countable collection of open intervals.


证明摘要:基于开集中的任意<math display="inline">x</math>,定义 <math display="inline">a_x := \inf\{z|(z,x)\subseteq O\}</math> 和 <math display="inline">a_x := \sup\{y|(x,y)\subseteq O\}</math> 且<math display="inline">I_x:=(a_x,b_x)</math>. 易证<math display="inline">a_x, b_x \notin O</math>. 根据有理数在实数中的稠密性,可在 <math display="inline">I_x</math> 和有理数<math display="inline">k\in I_x</math> 之间建立一一对应关系。显然<math display="inline">\{I_x\}</math> '''不相交''',且因其与有理数的一一对应对应关系'''可数'''。得证。
This is simply [[Convergence_in_Measure|convergence in measure]] for the probability space. Equipped with this definition, we can now state a version of the weak law of large numbers: the sample average ''converges in probability'' to the mean, if the iid random variables have finite variance. In history, Bernoulli first proved this theorem for Bernoulli random variables in 1713, when tools like Chebyshev's inequality wasn't even discovered. Later on, mathmaticians proved more general cases that does not the require the assumption of finite variance or strict independence of random variables.


(Prop) Every closed set in <math display="inline">\R^n</math> can be written as a countable union of compact sets.  
Proof. Without loss of generality assume that the random variables are centered, and let <math> \overline X_n</math> denote the sample average. By Chebyshev's inequality,
<math> P(|\overline X_n | > \epsilon) \leq \frac{\sigma^2}{n\epsilon} \to 0</math> as <math> n  </math> approaches infinity. Note that for Chebyshev's inequality to hold we do need finite variance.  


Given a set <math display="inline">X</math>, a collection <math display="inline">A</math> of subsets of <math display="inline">X</math> is called a <math display="inline">\sigma</math> algebra provided that
It is natural to wonder the possibility of obtaining a stronger notion of convergence, perhaps under stronger assumptions. To do so we need to formally define the stronger type of convergence that we need, which in study of probability is often refered to as "almost sure convergence".


# it contains the entire set and the empty set
==Almost sure convergence==
# closed under complement
# closed under countable union


(Defn) Borel set
A sequence of random variables <math display="inline">X_n</math> converges almost surely to <math display="inline">X</math> if


The collection <math display="inline">B</math> of Borel sets of real numbers is the smallest <math display="inline">\sigma</math> algebra of sets of real numbers that contains all of the open sets of real numbers.  
<math display="block">\mathbb{P}(\lim_{n\to \infty}X_n = X ) = 1.</math>
Note that this is the almost everywhere convergence for probability space, which is a stronger notion of convergence than convergence in probability. This is due to the fact that in a finite measure space, convergence almost everywhere implies convergence in measure.  


Egoroff's theorem).
The strong law of large number carries this stronger notion of converges and states: under suitable assumptions, the sample average ''converges'' ''almost surely'' to the mean. Before the presentation of the proof, we first introduce a lemma.
 
==Borel Catelli Lemma==
 
Let <math display="inline">(\Omega, B, \mathbb{P})</math> be a probability space, and <math display="inline">\{E_n\}</math> any sequence of events in <math display="inline">B</math>. If <math display="inline">\sum_{i=i}^\infty \mathbb{P}(E_i) < \infty</math>, then <math display="inline">\mathbb{P}(\cap_{n=1}^\infty \cup_{k\geq n} E_k) = 0</math>.
 
On the other hand, if <math display="inline">\sum_{i=1}^\infty \mathbb{P}(E_i) = \infty</math>, then <math display="inline">\mathbb{P}(\cap_{n=1}^\infty \cup_{k\geq n} E_k) = 1</math>.
 
''Proof.''
 
Fix <math display="inline">\epsilon > 0</math>, there exists <math display="inline">n</math> sufficiently large such that <math display="inline">\sum_{i=n}^\infty \mathbb{P}(E_i) < \epsilon</math>. By union bound we obtain <math display="inline">\mathbb{P}(\cup_{k\geq n} E_k) < \epsilon</math>. So <math display="inline">\lim_{n\to \infty} \mathbb{P}(\cup_{k\geq n} E_k) = 0</math>. By continuity of measure , we have <math display="inline">\mathbb{P}(\cap_{n=1}^\infty \cup_{k\geq n} E_k) = \lim_{n\to \infty} \mathbb{P}(\cup_{k\geq n} E_k) = 0</math>.
 
The other statement follows easily from similar technique.
 
==Strong law of large numbers==
Instead of directly stating the unspecified "suitable assumptions", we work backward to see what assumptions is needed to achieve this "almost sure convergence" to the mean. Here, we are interested in the set where <math>\overline X_n </math> converges to <math> \mu </math>, and we can rewrite this set as <math> \{x:  \cap_{k}\cup_n \cap_{m\geq n} |\overline X_m| < \frac{1}{k}\} </math>. To relate it to Borel-Cantelli lemma, we instead study its complement
<math> \{x:  \cup_{k}\cap_n \cup_{m\geq n} |\overline X_m| \geq \frac{1}{k}\} </math>. By continuity of measure, we have <math> P(\{x:  \cup_{k}\cap_n \cup_{m\geq n} |\overline X_m| \geq \frac{1}{k}\}) =  \lim_{k \to \infty} P \{x: \cap_n \cup_{m\geq n} |\overline X_m| \geq \frac{1}{k}\} </math>.
 
It suffices to show that for any fixed <math display="inline">k</math>, <math display="inline">P(\cap_{n}\cup_{k\geq n} |\overline X_n|> \frac{1}{k}) = 0</math>. This closely resembles the conclusion of the Borel-Cantelli lemma, so we might be tempted to show that <math display="inline">\sum_{n=1}^\infty P(|\overline X_n| > \frac{1}{k}) < \infty</math>. Unfortunately, a simple application of Chebyshev's inequality shows that this fall short, as we get
 
<math display="block">\sum_{n=1}^\infty P(|\overline X_n|> \frac{1}{k}) \leq \sum_{n=1}^\infty \frac{k^2\sigma^2}{n} = \infty.</math>
Therefore, stronger assumptions must be made if we are to use the same proof technique. Since finiteness of higher moments implies that of the the lower ones, it is natural for us to assume finite fourth moment, which gives us
<math display="block">\begin{align}
 
\mathbb{P}(|\overline X_n | > \frac{1}{k}) &\leq k^4\mathbb{E}(|\overline X_n|^4) \\
 
&= k^4 \frac{n\mathbb{E}(X_i^4) + 6n(n-1)\mathbb{E}(X_i^2)}{n^4}\\
 
&= k^4 O(\frac{1}{n^2}).
 
\end{align}</math>
Then by Borel-Cantelli lemma, <math display="inline">\overline X_n</math> converges a.s to <math display="inline">\mu</math>.
<ref name="Klenke">, ''Probability Theory: A Comprehensive Course'', §1.0</ref>.
 
==reference==

Latest revision as of 03:54, 19 December 2020

Modern probability is built on the foundation of measure theory. One of its most basic and key results, the strong law of large numbers, demonstrates the usefulness and power of this foundation. Informally, this law states that the sample mean gets close to the true mean as sample size grows large.

We would like to be more precise about the meaning of the phrase "get close to". Certainly we cannot replace it by "converges to", as one could be unlucky and consistently draw below or above the mean so that the sample mean is no where near the true mean. However, just by intuition we know that consistently drawing below or above the mean, while not impossible, becomes increasingly unlikely as the sample size grows. This naturally leads to the idea of convergence in probability.

Convergence in probability

A sequence of random variables converges in probability to if for any ,

This is simply convergence in measure for the probability space. Equipped with this definition, we can now state a version of the weak law of large numbers: the sample average converges in probability to the mean, if the iid random variables have finite variance. In history, Bernoulli first proved this theorem for Bernoulli random variables in 1713, when tools like Chebyshev's inequality wasn't even discovered. Later on, mathmaticians proved more general cases that does not the require the assumption of finite variance or strict independence of random variables.

Proof. Without loss of generality assume that the random variables are centered, and let denote the sample average. By Chebyshev's inequality, as approaches infinity. Note that for Chebyshev's inequality to hold we do need finite variance.

It is natural to wonder the possibility of obtaining a stronger notion of convergence, perhaps under stronger assumptions. To do so we need to formally define the stronger type of convergence that we need, which in study of probability is often refered to as "almost sure convergence".

Almost sure convergence

A sequence of random variables converges almost surely to if

Note that this is the almost everywhere convergence for probability space, which is a stronger notion of convergence than convergence in probability. This is due to the fact that in a finite measure space, convergence almost everywhere implies convergence in measure.

The strong law of large number carries this stronger notion of converges and states: under suitable assumptions, the sample average converges almost surely to the mean. Before the presentation of the proof, we first introduce a lemma.

Borel Catelli Lemma

Let be a probability space, and any sequence of events in . If , then .

On the other hand, if , then .

Proof.

Fix , there exists sufficiently large such that . By union bound we obtain . So . By continuity of measure , we have .

The other statement follows easily from similar technique.

Strong law of large numbers

Instead of directly stating the unspecified "suitable assumptions", we work backward to see what assumptions is needed to achieve this "almost sure convergence" to the mean. Here, we are interested in the set where converges to , and we can rewrite this set as . To relate it to Borel-Cantelli lemma, we instead study its complement . By continuity of measure, we have .

It suffices to show that for any fixed , . This closely resembles the conclusion of the Borel-Cantelli lemma, so we might be tempted to show that . Unfortunately, a simple application of Chebyshev's inequality shows that this fall short, as we get

Therefore, stronger assumptions must be made if we are to use the same proof technique. Since finiteness of higher moments implies that of the the lower ones, it is natural for us to assume finite fourth moment, which gives us
Then by Borel-Cantelli lemma, converges a.s to . [1].

reference

  1. , Probability Theory: A Comprehensive Course, §1.0