Radon-Nikodym Derivative
Theorem 1
Let a measurable space $( \Omega , \mathcal{F} )$ be given with measures $\mu$, $\nu$ satisfying $\mu ( \Omega ) = 1$ and $0 \le \nu (F) \le \mu (F)$ for all $F \in \mathcal{F}$. Then there exists an $\mathcal{F}$-measurable function $h : \Omega \to \mathbb{R}$ that satisfies $$ \nu (F) = \int_{F} h d \mu $$ for all $F \in \mathcal{F}$. This $h$, denoted by $\displaystyle h := {{d \nu } \over {d \mu }}$, is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.
- A function $f : \Omega \to \mathbb{R}$ being an $\mathcal{F}$-measurable function means it satisfies $f^{-1} (B) \in \mathcal{F}$ for all Borel sets $B \in \mathcal{B} ( \mathbb{R} )$.
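As a sanity check of the statement, here is a minimal sketch on a finite sample space, where the integral reduces to a weighted sum. The six-point space, the uniform $\mu$, the weights chosen for $\nu$, and the helper names `measure` and `integral` are all assumptions made for illustration; in this setting the ratio of point masses plays the role of $h$.

```python
from fractions import Fraction
from itertools import combinations

omega = range(6)
# mu is the uniform probability measure; nu is dominated by mu pointwise.
mu = {w: Fraction(1, 6) for w in omega}
nu = {w: Fraction(w, 30) for w in omega}   # nu({w}) = w/30 <= 5/30 = mu({w})

# Candidate Radon-Nikodym derivative: ratio of point masses.
h = {w: nu[w] / mu[w] for w in omega}

def measure(m, F):
    """Measure of a subset F under the point-mass dictionary m."""
    return sum((m[w] for w in F), Fraction(0))

def integral(f, m, F):
    """Integral of f over F with respect to m (a finite weighted sum)."""
    return sum((f[w] * m[w] for w in F), Fraction(0))

# Check nu(F) == integral of h over F for every subset F of omega.
subsets = [F for r in range(7) for F in combinations(omega, r)]
all_match = all(measure(nu, F) == integral(h, mu, F) for F in subsets)
print(all_match)
```

Exact rational arithmetic is used so that the equality $\nu (F) = \int_{F} h d \mu$ can be tested with `==` rather than a floating-point tolerance.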
Explanation
Since the theorem assumes $\mu ( \Omega ) = 1$, $( \Omega , \mathcal{F} , \mu )$ is a probability space.
The name Radon-Nikodym derivative is quite intuitive: setting the rigorous argument aside for a moment and manipulating the symbols formally, exactly as they appear, gives the following. Unlike differentiation in basic analysis, this is a matter of fitting the concept to the notation. $$ \begin{align*} \int_{F} h d \mu =& \int_{F} {{d \nu } \over {d \mu }} d \mu \\ =& \int_{F} d \nu \\ =& \nu ( F) \end{align*} $$
The Radon-Nikodym theorem guarantees that such a Radon-Nikodym derivative exists, and is unique, under a few additional conditions.
Proof
Part 1. $\displaystyle h_{\mathcal{P}} = {{\nu} \over {\mu}}$
For a finite partition $ \mathcal{P} := \left\{ A_{1} , \cdots , A_{k} \right\}$ of $\Omega$ with $A_{i} \in \mathcal{F}$, let’s define $h_{\mathcal{P}} : \Omega \to \mathbb{R}$ for $\omega \in A_{i}$ as follows. $$ h_{\mathcal{P}} ( \omega ) = \begin{cases} \displaystyle {{\nu (A_{i} ) } \over {\mu (A_{i} ) }} &, \mu ( A_{i} ) > 0 \\ 0 &, \text{otherwise} \end{cases} $$ By this definition, $h_{\mathcal{P}}$ is a constant function $h_{\mathcal{P}} ( \omega ) = c_{i}$ on each $A_{i}$, regardless of which $\omega \in A_{i}$ is chosen. Accordingly, if $\mu (A_{i}) > 0$, $$ \begin{align*} \int_{A_{i}} h_{\mathcal{P}} d \mu =& \int_{A_{i}} {{\nu (A_{i} ) } \over {\mu (A_{i} ) }} d \mu \\ =& \nu (A_{i} ) {{1 } \over {\mu (A_{i} ) }} \int_{A_{i}} d \mu \\ =& \nu (A_{i} ) {{1 } \over {\mu (A_{i} ) }} \mu ( A_{i} ) \\ =& \nu (A_{i} ) \end{align*} $$ If $\mu (A_{i}) = 0$, then $\nu (A_{i}) = 0$ by $0 \le \nu (A_{i}) \le \mu (A_{i})$, so both sides are $0$ and the equality still holds. Let’s prove the following properties.
- Part 1-1.
For every partition $\mathcal{P}$ of $\Omega$ and every $\omega \in \Omega$, $0 \le h_{\mathcal{P}} ( \omega ) \le 1$. Indeed, since $0 \le \nu (F) \le \mu (F)$ for all $F \in \mathcal{F}$, on each $A_{i} \in \mathcal{P}$ with $\mu (A_{i}) > 0$ we have $$\displaystyle 0 \le h_{\mathcal{P}} = {{\nu (A_{i})} \over {\mu (A_{i})}} \le 1$$ and $h_{\mathcal{P}} = 0$ on the remaining $A_{i}$.
- Part 1-2.
$$A = \bigsqcup_{j \in J} A_{j} \implies \nu (A) = \int_{A} h_{\mathcal{P}} d \mu$$ Here $\bigsqcup$ denotes a disjoint union and $J \subset \left\{ 1 , \cdots, k \right\}$ is a set of indices. By the additivity of measures and the computation above, $$ \begin{align*} \nu (A) =& \sum_{j \in J} \nu ( A_{j} ) \\ =& \sum_{j \in J} \int_{A_{j}} h_{\mathcal{P}} d \mu \\ =& \int_{A} h_{\mathcal{P}} d \mu \end{align*} $$ In particular, since $\Omega \in \mathcal{F}$ is the disjoint union of all $A_{i} \in \mathcal{P}$, naturally $\displaystyle \nu ( \Omega ) = \int_{ \Omega } h_{\mathcal{P}} d \mu$ holds true.
- Part 1-3.
- Part 1-3-1. For all $A \in \mathcal{P}_{1}$, $\displaystyle \int_{A} h_{1} d \mu = \int_{A} h_{2} d \mu$ holds true.
Let’s say $\mathcal{P}_{2}$ is a refinement of $\mathcal{P}_{1}$, and for convenience, write $h_{n} := h_{\mathcal{P}_{n}}$. By the definition of refinement, for every $A \in \mathcal{P}_{1}$ there exist $B_{j} \in \mathcal{P}_{2}$ that satisfy $\displaystyle A = \bigsqcup_{j \in J} B_{j}$. Therefore, by Part 1-2, $$ \begin{align*} \int_{A} h_{1} d \mu =& \nu (A) \\ =& \sum_{j \in J} \nu ( B_{j} ) \\ =& \sum_{j \in J} \int_{B_{j}} h_{2} d \mu \\ =& \int_{A} h_{2} d \mu \end{align*} $$
- Part 1-3-2. For all $A \in \mathcal{P}_{1}$, $\displaystyle \int_{A} h_{1} h_{2} d \mu = \int_{A} h_{1}^{2} d \mu$ holds true.
Since $h_{1}$ is the constant $\displaystyle {{\nu (A ) } \over {\mu (A ) }}$ on $A$ when $\mu (A) > 0$ (if $\mu (A) = 0$, both sides are $0$), Part 1-3-1 gives $$\begin{align*} \int_{A} h_{1} h_{2} d \mu =& {{\nu (A ) } \over {\mu (A ) }} \int_{A} h_{2} d \mu \\ =& {{\nu (A ) } \over {\mu (A ) }} \nu (A ) \\ =& \int_{A} \left[ {{\nu (A ) } \over {\mu (A ) }} \right]^2 d \mu \\ =& \int_{A} h_{1}^{2} d \mu \end{align*} $$
- Part 1-4.
- Part 1-4-1. $\displaystyle \int_{A} ( h_{2} - h_{1} )^2 d \mu = \int_{A} \left[ h_{2}^{2} - h_{1}^{2} \right] d \mu$
According to Part 1-3-2, for all $A \in \mathcal{P}_{1}$, $\displaystyle \int_{A} h_{1} ( h_{2} - h_{1} ) d \mu = 0$ holds true, and summing over all $A \in \mathcal{P}_{1}$ shows that $\displaystyle \int_{ \Omega } h_{1} ( h_{2} - h_{1} ) d \mu = 0$ also holds. Thus, $$ \begin{align*} \int_{A} ( h_{2} - h_{1} )^2 d \mu =& \int_{A} \left( h_{2}^{2} - 2 h_{2} h_{1} + h_{1}^{2} \right) d \mu \\ =& \int_{A} \left[ h_{2}^{2} - 2 h_{1} (h_{2} - h_{1}) - h_{1}^{2} \right] d \mu \\ =& \int_{A} \left[ h_{2}^{2} - h_{1}^{2} \right] d \mu \end{align*} $$ and the same identity holds with $A$ replaced by $\Omega$.
- Part 1-4-2. $\displaystyle \int_{\Omega} h_{1}^{2} d \mu \le \int_{\Omega} h_{2}^{2} d \mu$
$$\begin{align*} \int_{\Omega} h_{2}^{2} d \mu =& \int_{\Omega} h_{1}^{2} d \mu + \int_{\Omega} (h_{2} - h_{1})^{2} d \mu \\ \ge& \int_{\Omega} h_{1}^{2} d \mu \end{align*} $$
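The identities of Part 1 can be checked numerically with a small sketch. The eight-point space, the measures, the two partitions `p1` and `p2`, and the helper names `h_partition`, `measure`, and `integral` are assumptions made purely for illustration, not part of the proof.

```python
from fractions import Fraction

omega = list(range(8))
mu = {w: Fraction(1, 8) for w in omega}
nu = {w: Fraction(w, 64) for w in omega}   # nu({w}) <= 7/64 < 1/8 = mu({w})

def measure(m, A):
    """Measure of the set A under the point-mass dictionary m."""
    return sum((m[w] for w in A), Fraction(0))

def h_partition(partition):
    """h_P as a dict: nu(A_i)/mu(A_i) on each block A_i, 0 if mu(A_i) = 0."""
    h = {}
    for block in partition:
        mu_a, nu_a = measure(mu, block), measure(nu, block)
        value = nu_a / mu_a if mu_a > 0 else Fraction(0)
        for w in block:
            h[w] = value
    return h

def integral(f, A):
    """Integral of the function f (a dict) over A with respect to mu."""
    return sum((f[w] * mu[w] for w in A), Fraction(0))

p1 = [[0, 1, 2, 3], [4, 5, 6, 7]]
p2 = [[0, 1], [2, 3], [4, 5], [6, 7]]      # a refinement of p1
h1, h2 = h_partition(p1), h_partition(p2)

prod = {w: h1[w] * h2[w] for w in omega}
sq1 = {w: h1[w] ** 2 for w in omega}
sq2 = {w: h2[w] ** 2 for w in omega}
diff_sq = {w: (h2[w] - h1[w]) ** 2 for w in omega}

for A in p1:
    assert integral(h1, A) == measure(nu, A) == integral(h2, A)         # Parts 1-2, 1-3-1
    assert integral(prod, A) == integral(sq1, A)                        # Part 1-3-2
    assert integral(diff_sq, A) == integral(sq2, A) - integral(sq1, A)  # Part 1-4-1
assert integral(sq1, omega) <= integral(sq2, omega)                     # Part 1-4-2
print(integral(sq1, omega), integral(sq2, omega))
```

In this example the integrals of $h_{1}^{2}$ and $h_{2}^{2}$ over $\Omega$ come out as $65/256$ and $69/256$, so refining the partition increases the integral by exactly $\int_{\Omega} (h_{2} - h_{1})^{2} d \mu = 4/256$.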
Part 2. $\displaystyle h := \lim_{n \to \infty} h_{\mathcal{Q}_{n}}$
In Part 1-4-2, it was confirmed that $\displaystyle \int_{\Omega} h_{\mathcal{P}}^{2} d \mu \le \int_{\Omega} h_{\mathcal{P} ' }^{2} d \mu$ holds for any refinement $\mathcal{P} ' $ of $\mathcal{P}$. Moreover, since $0 \le h_{\mathcal{P}} \le 1$ by Part 1-1 and $\mu ( \Omega ) = 1$ by assumption, $\displaystyle c := \sup_{\mathcal{P}} \int_{\Omega} h_{\mathcal{P}}^{2} d \mu$ exists and lies between $0$ and $1$. [ NOTE: In fact, even if $\mu ( \Omega ) \ne 1$, $\mu$ can be replaced by $\mu ' := \mu / \mu ( \Omega)$. ] Now, for each $n \in \mathbb{N}$, choose a partition $\mathcal{P}_{n}$ satisfying $\displaystyle \int_{\Omega} h_{\mathcal{P}_{n}}^{2} d \mu > c - {{1} \over {4^{n}}}$, and let $\mathcal{Q}_{n}$ be the common refinement of $\mathcal{P}_{1} , \cdots , \mathcal{P}_{n}$. Naturally, $\mathcal{Q}_{n+1}$ is a refinement of $\mathcal{Q}_{n}$, and by Part 1-4-2 the following inequality holds. $$ c - {{1} \over {4^{n}}} < \int_{\Omega} h_{\mathcal{P}_{n}}^{2} d \mu \le \int_{\Omega} h_{\mathcal{Q}_{n}}^{2} d \mu \le \int_{\Omega} h_{\mathcal{Q}_{n+1}}^{2} d \mu \le c $$ According to Part 1-4-1, the integral of the squared difference equals the difference of the integrals of the squares, so $$ \int_{\Omega} \left( h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} \right)^{2} d \mu = \int_{\Omega} \left( h_{\mathcal{Q}_{n+1}}^{2} - h_{\mathcal{Q}_{n}}^{2} \right) d \mu < {{1} \over {4^{n}}} $$
Cauchy-Schwarz Inequality: If $f,g \in \mathcal{L}^{2} (E)$, then $fg \in \mathcal{L}^{1}(E)$, and $$ \left| \int_{E} f \overline{g} dm \right| \le \left\| f g \right\|_{1} \le \left\| f \right\|_{2} \left\| g \right\|_{2} $$
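The choice of partitions described above can be sketched on a finite grid. Everything here is an assumption for illustration: the 64-point grid, the density `g`, the restriction to dyadic partitions, and the helper names `h_dyadic`, `energy`, and `first_level_within`. On a finite space the supremum $c$ is attained by the partition into single points, which makes the selection rule easy to test.

```python
from fractions import Fraction

N = 64                                     # number of grid points (assumed)
omega = range(N)
mu = [Fraction(1, N)] * N                  # uniform probability measure
g = [Fraction(2 * w + 1, 2 * N) ** 2 for w in omega]   # assumed density, 0 <= g <= 1
nu = [g[w] * mu[w] for w in omega]

def h_dyadic(level):
    """h_Q for the partition of the grid into 2**level consecutive blocks."""
    size = N // 2 ** level
    h = []
    for start in range(0, N, size):
        block = range(start, start + size)
        ratio = sum(nu[w] for w in block) / sum(mu[w] for w in block)
        h.extend([ratio] * size)
    return h

def energy(h):
    """Integral of h**2 with respect to mu."""
    return sum(h[w] ** 2 * mu[w] for w in omega)

max_level = 6                              # 2**6 == N, the partition into points
energies = [energy(h_dyadic(k)) for k in range(max_level + 1)]
c = energies[max_level]                    # the supremum is attained here

def first_level_within(n):
    """Smallest dyadic level whose energy exceeds c - 4**(-n) (the role of P_n)."""
    bound = c - Fraction(1, 4 ** n)
    return next(k for k in range(max_level + 1) if energies[k] > bound)

p_levels = [first_level_within(n) for n in range(1, 9)]
# Dyadic levels are nested, so the common refinement of P_1..P_n is the deepest one.
q_levels = [max(p_levels[: i + 1]) for i in range(len(p_levels))]

for i, (p, q) in enumerate(zip(p_levels, q_levels)):
    n = i + 1
    assert c - Fraction(1, 4 ** n) < energies[p] <= energies[q] <= c
print(p_levels, q_levels)
```

Restricting to nested dyadic partitions is a simplification: in the proof the $\mathcal{P}_{n}$ are arbitrary, which is why the common refinement $\mathcal{Q}_{n}$ is needed to obtain a nested sequence.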
Setting $f = | h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} |$, $g = 1$ in the Cauchy-Schwarz inequality, for all $n \in \mathbb{N}$, $$ \begin{align*} \int_{\Omega} \left| h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} \right| d \mu \le & \sqrt{ \int_{\Omega} \left( h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} \right)^{2} d \mu } \sqrt{ \int_{\Omega} 1 d \mu } \\ =& \sqrt{\int_{\Omega} \left( h_{\mathcal{Q}_{n+1}}^{2} - h_{\mathcal{Q}_{n}}^{2} \right) d \mu} \sqrt{ \mu ( \Omega ) } \\ <& {{1} \over {2^{n}}} \cdot 1 \end{align*} $$
Levi’s Theorem: If $\displaystyle \sum_{k=1}^{\infty} \int |f_{k}| dm < \infty$, then $\displaystyle \sum_{k=1}^{\infty} f_{k} (x)$ converges almost everywhere and $$ \int \sum_{k=1}^{\infty} f_{k} dm = \sum_{k=1}^{\infty} \int f_{k} dm $$
Since $\displaystyle \sum_{n \in \mathbb{N}} \int_{\Omega} \left| h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} \right| d \mu < \infty$, by Levi’s theorem, $$ \sum_{n \in \mathbb{N}} \left( h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} \right) = \lim_{n \to \infty} h_{\mathcal{Q}_{n}} - h_{\mathcal{Q}_{1}} $$ converges almost everywhere with respect to $\mu$. Now let’s define $h$ as follows. $$ h := h_{\mathcal{Q}_{1}} + \sum_{n \in \mathbb{N}} \left( h_{\mathcal{Q}_{n+1}} - h_{\mathcal{Q}_{n}} \right) = \lim_{n \to \infty} h_{\mathcal{Q}_{n}} $$
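The Cauchy-Schwarz step and the telescoping sum can be illustrated with a short sketch. The 32-point grid, the density `g`, and the use of fixed dyadic levels in place of the partitions $\mathcal{Q}_{n}$ selected in the proof are assumptions; on a finite grid the series has only finitely many nonzero terms, so it shows the mechanism rather than a genuine limit.

```python
from fractions import Fraction

N = 32                                       # grid size (assumed), a power of two
omega = range(N)
mu = [Fraction(1, N)] * N                    # uniform probability measure
g = [Fraction(w * w, N * N) for w in omega]  # assumed density with 0 <= g <= 1
nu = [g[w] * mu[w] for w in omega]

def h_dyadic(level):
    """h_Q for the partition of the grid into 2**level consecutive blocks."""
    size = N // 2 ** level
    h = []
    for start in range(0, N, size):
        block = range(start, start + size)
        ratio = sum(nu[w] for w in block) / sum(mu[w] for w in block)
        h.extend([ratio] * size)
    return h

hs = [h_dyadic(n) for n in range(6)]         # 2**5 == N, so hs[5] is the finest

l1_steps = []
for n in range(5):
    step = [hs[n + 1][w] - hs[n][w] for w in omega]
    l1 = sum(abs(step[w]) * mu[w] for w in omega)
    l2_sq = sum(step[w] ** 2 * mu[w] for w in omega)
    # Cauchy-Schwarz with g = 1 and mu(Omega) = 1, written without square roots.
    assert l1 ** 2 <= l2_sq
    l1_steps.append(l1)

# Telescoping: h_{Q_1} plus the sum of the increments recovers the last h_{Q_n}.
limit = [hs[0][w] + sum(hs[n + 1][w] - hs[n][w] for n in range(5)) for w in omega]
assert limit == hs[5] == g
print([float(x) for x in l1_steps])
```

Comparing squares keeps the check in exact rational arithmetic, avoiding a floating-point square root.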
Part 3. $\displaystyle \nu (F) = \int_{F} h d \mu$
By its definition as the almost-everywhere limit of the $h_{\mathcal{Q}_{n}}$, $h$ is $\mathcal{F}$-measurable and satisfies $0 \le h \le 1$ almost everywhere. Now it suffices to show that $\displaystyle \nu (F) = \int_{F} h d \mu $ for all $F \in \mathcal{F}$. Fix one $F \in \mathcal{F}$ and define $\mathcal{R}_{n}$ as the common refinement of $\mathcal{Q}_{n}$ and $\left\{ F , F^{c} \right\}$. Then $F$ is a disjoint union of sets of $\mathcal{R}_{n}$, so according to $\displaystyle A = \bigsqcup_{j \in J} A_{j} \implies \nu (A) = \int_{A} h_{\mathcal{P}} d \mu$ shown in Part 1-2, for all $n \in \mathbb{N}$, $$ \begin{align*} \nu (F) =& \int_{F} h_{ \mathcal{R}_{n} } d \mu \\ =& \int_{F} ( h_{ \mathcal{R}_{n} } - h_{ \mathcal{Q}_{n} } )d \mu + \int_{F} h_{ \mathcal{Q}_{n} } d \mu \end{align*} $$ is satisfied. Meanwhile, since $\mathcal{R}_{n}$ is a refinement of $\mathcal{Q}_{n}$, we have $\displaystyle c - {{1} \over {4^{n}}} < \int_{\Omega} h_{\mathcal{Q}_{n}}^{2} d \mu \le \int_{\Omega} h_{\mathcal{R}_{n}}^{2} d \mu \le c$, and using the Cauchy-Schwarz inequality as in Part 2, we can derive $\displaystyle \left| \int_{F} ( h_{\mathcal{R}_{n}} - h_{\mathcal{Q}_{n}} ) d \mu \right| \le \int_{\Omega} \left| h_{\mathcal{R}_{n}} - h_{\mathcal{Q}_{n}} \right| d \mu < {{1} \over {2^{n}}}$. Letting $n \to \infty$, $$ \nu (F) =0 + \lim_{n \to \infty} \int_{F} h_{ \mathcal{Q}_{n} } d \mu $$
Dominated Convergence Theorem: For measurable sets $E \in \mathcal{M}$ and $g \in \mathcal{L}^{1} (E)$, suppose a sequence of measurable functions $\left\{ f_{n} \right\}$ satisfies $|f_{n}| \le g$ almost everywhere on $E$. If almost everywhere on $E$, $\displaystyle f = \lim_{n \to \infty} f_{n}$ holds, then $f \in \mathcal{L}^{1}(E)$ and $$ \lim_{ n \to \infty} \int_{E} f_{n} (x) dm = \int_{E} f dm $$
Since $0 \le h_{\mathcal{Q}_{n}} \le 1$ for all $n \in \mathbb{N}$, the constant function $1$ is integrable because $\mu ( \Omega ) = 1$, and $h_{\mathcal{Q}_{n}} \to h$ almost everywhere by Part 2, the Dominated Convergence Theorem gives $$ \begin{align*} \nu (F) =& \lim_{n \to \infty} \int_{F} h_{ \mathcal{Q}_{n} } d \mu \\ =& \int_{F} h d \mu \end{align*} $$
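The role of the common refinement $\mathcal{R}_{n}$ in Part 3 can be sketched as follows. The grid, the density `g`, the event `F`, the dyadic stand-ins for $\mathcal{Q}_{n}$, and the helper names `h_from_blocks`, `dyadic_blocks`, and `refine_with` are hypothetical choices made for illustration only.

```python
from fractions import Fraction

N = 32
omega = range(N)
mu = [Fraction(1, N)] * N                  # uniform probability measure
g = [Fraction(w, N) for w in omega]        # assumed density with 0 <= g <= 1
nu = [g[w] * mu[w] for w in omega]

def measure(m, A):
    """Measure of the set A under the list of point masses m."""
    return sum((m[w] for w in A), Fraction(0))

def h_from_blocks(blocks):
    """h_P for a partition given as a list of blocks."""
    h = [Fraction(0)] * N
    for block in blocks:
        mu_a = measure(mu, block)
        value = measure(nu, block) / mu_a if mu_a > 0 else Fraction(0)
        for w in block:
            h[w] = value
    return h

def dyadic_blocks(level):
    """Partition of the grid into 2**level consecutive blocks."""
    size = N // 2 ** level
    return [list(range(s, s + size)) for s in range(0, N, size)]

def refine_with(blocks, F):
    """Common refinement of a partition with {F, complement of F}."""
    refined = []
    for block in blocks:
        inside = [w for w in block if w in F]
        outside = [w for w in block if w not in F]
        refined.extend(part for part in (inside, outside) if part)
    return refined

F = {1, 2, 3, 5, 8, 13, 21}                # an arbitrary fixed event (assumed)
errors = []
for level in range(6):
    q = dyadic_blocks(level)
    h_q, h_r = h_from_blocks(q), h_from_blocks(refine_with(q, F))
    # F is a union of blocks of R_n, so Part 1-2 gives exact equality.
    assert measure(nu, F) == sum(h_r[w] * mu[w] for w in F)
    errors.append(abs(sum((h_r[w] - h_q[w]) * mu[w] for w in F)))

assert errors[-1] == 0                     # finest level: h_Q already equals g
assert measure(nu, F) == sum(g[w] * mu[w] for w in F)
print([float(e) for e in errors])
```

The list `errors` holds the correction term $\left| \int_{F} ( h_{\mathcal{R}_{n}} - h_{\mathcal{Q}_{n}} ) d \mu \right|$ for each level; in this finite example it is exactly $0$ once the partition consists of single points.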
■
Meanwhile, from Parts 1 and 2 of the proof above, we derive the following corollary.
Corollary
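For the partitions $\mathcal{Q}_{n}$ constructed in Part 2, where $\mathcal{Q}_{n+1}$ is a refinement of $\mathcal{Q}_{n}$ for all $n \in \mathbb{N}$, the following holds almost everywhere with respect to $\mu$. $$ \lim_{n \to \infty} h_{\mathcal{Q}_{n}} = \lim_{n \to \infty} {{\nu} \over {\mu}} = {{d \nu } \over {d \mu }} $$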
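As a concrete illustration, consider the following assumed example, which is not part of the source proof: let $\Omega = [0,1)$ with $\mu$ the Lebesgue measure and $\displaystyle \nu (F) := \int_{F} x dx$, so that $0 \le \nu (F) \le \mu (F)$, and take $\mathcal{Q}_{n}$ to be the dyadic partition into the intervals $\displaystyle A_{k} = \left[ {{k} \over {2^{n}}} , {{k+1} \over {2^{n}}} \right)$ for $k = 0 , \cdots , 2^{n} - 1$. For $\omega \in A_{k}$, $$ \begin{align*} h_{\mathcal{Q}_{n}} ( \omega ) =& {{\nu (A_{k})} \over {\mu (A_{k})}} \\ =& {{ \left[ (k+1)^{2} - k^{2} \right] / 2^{2n+1} } \over { 1 / 2^{n} }} \\ =& {{2k+1} \over {2^{n+1}}} \end{align*} $$ which is the midpoint of $A_{k}$. Hence $\displaystyle \left| h_{\mathcal{Q}_{n}} ( \omega ) - \omega \right| \le {{1} \over {2^{n+1}}}$, and $$ \lim_{n \to \infty} h_{\mathcal{Q}_{n}} ( \omega ) = \omega = {{d \nu } \over {d \mu }} ( \omega ) $$ A direct computation gives $\displaystyle \int_{\Omega} h_{\mathcal{Q}_{n}}^{2} d \mu = {{1} \over {3}} - {{1} \over {12 \cdot 4^{n}}}$, so with $c = 1/3$ these dyadic partitions also satisfy the condition $\displaystyle \int_{\Omega} h_{\mathcal{Q}_{n}}^{2} d \mu > c - {{1} \over {4^{n}}}$ used in Part 2.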
Folland. (1999). Real Analysis: Modern Techniques and Their Applications (2nd Edition): p91.