Cholesky Decomposition of Positive Definite Matrices
Overview
Just as we could perform an LDU decomposition of an invertible matrix, when the condition is strengthened to a positive definite matrix, an even more efficient factorization called the Cholesky Decomposition becomes possible.
Build-up
Consider an $m \times m$ positive definite matrix $A : = \begin{bmatrix} a_{11} & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} > 0$, where $\mathbf{w} \in \mathbb{R}^{m-1}$ and $K \in \mathbb{R}^{(m-1) \times (m-1)}$. Note that $a_{11}$ must be positive: if $a_{11} \le 0$, then for $\mathbf{e}_{1} = (1, 0, \cdots , 0)$ we would have $\mathbf{e}_{1}^{T} A \mathbf{e}_{1} = a_{11} \le 0$, so $A$ could not be positive definite. Moreover, for any vector $\mathbb{0} \ne \mathbf{x} \in \mathbb{R}^{m-1}$,
$$ \mathbf{x}^{T} K \mathbf{x} = \begin{bmatrix} 0 & \mathbf{x}^{T} \end{bmatrix} \begin{bmatrix} a_{11} & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} \begin{bmatrix} 0 \\ \mathbf{x} \end{bmatrix} > 0 $$
so $K > 0$ as well. For convenience, suppose first that $a_{11} = 1$. Then
$$ A_{0} : = \begin{bmatrix} 1 & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} = \begin{bmatrix} 1 & \mathbb{0} \\ \mathbf{w} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K - \mathbf{w} \mathbf{w}^{T} \end{bmatrix} \begin{bmatrix} 1 & \mathbf{w}^{T} \\ \mathbb{0} & I \end{bmatrix} $$
can be represented. Here, $L_{1} := \begin{bmatrix} 1 & \mathbb{0} \\ \mathbf{w} & I \end{bmatrix}$ is a lower triangular matrix, and $A_{1} := \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K - \mathbf{w} \mathbf{w}^{T} \end{bmatrix} = L_{1}^{-1} A_{0} \left( L_{1}^{-1} \right)^{T}$ is positive definite, since congruence by an invertible matrix preserves positive definiteness; in particular $K - \mathbf{w} \mathbf{w}^T > 0$. Therefore, repeating this process as $A_{n-1} = L_{n} A_{n} L_{n}^{T}$ until $n=m$,
$$ A_{0} = L_{1} \cdots L_{m} I_{m} L_{m}^{T} \cdots L_{1}^{T} $$
is obtained.
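For instance, with $m = 2$, $a_{11} = 1$, $\mathbf{w} = (1)$, and $K = (2)$, this factorization reads
$$ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} $$
since $K - \mathbf{w} \mathbf{w}^{T} = 2 - 1 = 1 > 0$, and the recursion terminates after a single step.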
Generalization
Now let’s generalize to the case of an arbitrary $a_{11} > 0$. Setting $A_{0} := A$,
$$ \begin{align*} A_{0} =& \begin{bmatrix} a_{11} & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} \\ =& \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & K \end{bmatrix} \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \end{align*} $$
Since $\begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & K \end{bmatrix}$ is exactly the form covered in the case $a_{11}=1$, with $\mathbf{w}$ replaced by ${{1} \over {\sqrt{a_{11}}}} \mathbf{w}$, we get
$$ \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & K \end{bmatrix} = \begin{bmatrix} 1 & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K-{{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} \end{bmatrix} \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ \mathbb{0} & I \end{bmatrix} $$
If we now define
$$ L_{1} := \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix} = \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix} $$
and
$$ A_{1} := \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K-{{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} \end{bmatrix} $$
then
$$ \begin{align*} A_{0} =& \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K-{{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} \end{bmatrix} \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \\ =& L_{1} A_{1} L_{1}^{T} \end{align*} $$
Likewise, repeating the process $A_{n-1} = L_{n} A_{n} L_{n}^{T}$ until $n=m$,
$$ A_{0} = L_{1} \cdots L_{m} I_{m} L_{m}^{T} \cdots L_{1}^{T} $$
is obtained. Since $L_{1} , L_{2} , \cdots , L_{m}$ are all lower triangular matrices, their product $L : = L_{1} L_{2} \cdots L_{m}$ is also a lower triangular matrix, and $A$ is represented as $A_{0} = A = L L^{T}$.
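As a concrete illustration, take the matrix $A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$ that also appears in the code below. The first step has $a_{11} = 2$ and $\mathbf{w} = (1, 0)^{T}$, so
$$ L_{1} = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ {{1} \over {\sqrt{2}}} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} , \qquad K - {{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} {{1} \over {2}} & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} {{3} \over {2}} & 1 \\ 1 & 2 \end{bmatrix} $$
and the process then recurses on the remaining $2 \times 2$ positive definite block.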
Representing $A = L L^{T}$ using only a lower triangular matrix $L$ in this way is called the Cholesky Decomposition. Although positive definiteness is quite a strong condition, the decomposition itself is surprisingly simple and can be computed with very little work. Now, setting
$$ L : = \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{bmatrix} $$
let’s directly compute $A = (a_{ij}) = LL^{T}$. Since $A$ is symmetric, only the entries with $i \ge j$ need to be computed, and since $L$ is lower triangular, every term with $s > \min (i,j)$ vanishes:
$$ a_{ij} = \sum_{s=1}^{n} l_{is} \left( L^{T} \right)_{sj} = \sum_{s=1}^{\min (i,j)} l_{is} l_{js} $$
If $i = j = k$, then $$ a_{kk} = \sum_{s=1}^{k-1} l_{ks}^{2} + l_{kk}^{2} \implies l_{kk} = \sqrt{ a_{kk} - \sum_{s=1}^{k-1} l_{ks}^{2} } $$
If $j = k$, then for each $i = k+1, \cdots , n$, $$ a_{ik} = \sum_{s=1}^{k-1} l_{is} l_{ks} + l_{ik} l_{kk} \implies l_{ik} = {{1} \over {l_{kk}}} \left( a_{ik} - \sum_{s=1}^{k-1} l_{is} l_{ks} \right) $$
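For example, applying these formulas to the positive definite matrix $A = \begin{bmatrix} 4 & 2 \\ 2 & 5 \end{bmatrix}$ gives $l_{11} = \sqrt{a_{11}} = 2$, then $l_{21} = {{a_{21}} \over {l_{11}}} = 1$, and finally $l_{22} = \sqrt{a_{22} - l_{21}^{2}} = \sqrt{5 - 1} = 2$, so that
$$ \begin{bmatrix} 4 & 2 \\ 2 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} $$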
Algorithm
Assume $(a_{ij}) \in \mathbb{R}^{n \times n}$ is a positive definite matrix.
Step 1. $k = 1$
Substitute $l_{11} = \sqrt{a_{11}}$ and calculate $\displaystyle l_{i1} = {{1} \over {l_{11}}} a_{i1}$ for $i = 2, \cdots , n$.
Step 2. For $k = 2, 3, \cdots , n-1$, calculate the following:
Step 2-1. $$ l_{kk} = \sqrt{ a_{kk} - \sum_{s=1}^{k-1} l_{ks}^{2} } $$
Step 2-2. For $i = k+1 , k+2 , \cdots , n$, calculate the following: $$ l_{ik} = {{1} \over {l_{kk}}} \left( a_{ik} - \sum_{s=1}^{k-1} l_{is} l_{ks} \right) $$
Step 3. For $k = n$, calculate the following: $$ l_{nn} = \sqrt{ a_{nn} - \sum_{s=1}^{n-1} l_{ns}^{2} } $$
- Compared to LU decomposition or LDU decomposition, you can see that the amount of computation is significantly reduced. The calculations that remain are not complicated, amounting to little more than inner products and norms.
Code
Now, let’s actually try the Cholesky Decomposition. Below is an R implementation of the above algorithm.
B <- matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), ncol = 3); B

# Cholesky decomposition of a positive definite matrix A,
# following the algorithm above (assumes n >= 3)
Cholesky <- function(A) {
  n <- nrow(A)
  L <- diag(0, n)   # n x n zero matrix that will hold the lower triangular factor

  # Step 1: first column
  L[1, 1] <- sqrt(A[1, 1])
  L[, 1] <- A[, 1] / L[1, 1]

  # Step 2: columns k = 2, ..., n-1
  for (k in 2:(n - 1)) {
    # Step 2-1: diagonal entry
    L[k, k] <- sqrt(A[k, k] - sum(L[k, 1:(k - 1)]^2))
    # Step 2-2: entries below the diagonal
    for (i in (k + 1):n) {
      L[i, k] <- (A[i, k] - sum(L[i, 1:(k - 1)] * L[k, 1:(k - 1)])) / L[k, k]
    }
  }

  # Step 3: last diagonal entry
  L[n, n] <- sqrt(A[n, n] - sum(L[n, 1:(n - 1)]^2))
  return(list(L = L))
}

Cholesky(B)
t(chol(B))   # R's built-in chol() returns the upper factor U, so t(U) gives L
Running the code prints the factor computed by Cholesky(B) alongside the one from t(chol(B)).
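For this particular B, the exact factor can be worked out by hand as
$$ L = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ {{1} \over {\sqrt{2}}} & \sqrt{{{3} \over {2}}} & 0 \\ 0 & \sqrt{{{2} \over {3}}} & {{2} \over {\sqrt{3}}} \end{bmatrix} \approx \begin{bmatrix} 1.4142 & 0 & 0 \\ 0.7071 & 1.2247 & 0 \\ 0 & 0.8165 & 1.1547 \end{bmatrix} $$
so, up to rounding, both calls should print these values.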
Comparing with the Cholesky decomposition built into R, you can see it matches exactly.
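As a further sanity check, one can also compare the two factors on a larger random test matrix, for example one built with crossprod() so that it is symmetric positive definite:

set.seed(42)                            # any seed works; fixed for reproducibility
M <- crossprod(matrix(rnorm(25), 5))    # M = X^T X for a random 5 x 5 X, positive definite with probability 1
max(abs(Cholesky(M)$L - t(chol(M))))    # should be on the order of machine precision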