
Cholesky Decomposition of Positive Definite Matrices 📂Matrix Algebra


Overview

Just as an invertible matrix admits an LDU decomposition, strengthening the condition to positive definiteness allows an even more efficient factorization, called the Cholesky decomposition.

Build-up

Consider an $m \times m$ positive definite matrix $A := \begin{bmatrix} a_{11} & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} > 0$, where $\mathbf{w} \in \mathbb{R}^{m-1}$ and $K \in \mathbb{R}^{(m-1) \times (m-1)}$. First, $a_{11}$ must be positive: if $a_{11} \le 0$, then for $\mathbf{e}_{1} = (1, 0, \cdots, 0)$ we would have $\mathbf{e}_{1}^{T} A \mathbf{e}_{1} = a_{11} \le 0$, so $A$ could not be positive definite. Next, for any vector $\mathbf{0} \ne \mathbf{x} \in \mathbb{R}^{m-1}$,

\mathbf{x}^{T} K \mathbf{x} = \begin{bmatrix} 0 & \mathbf{x}^{T} \end{bmatrix} \begin{bmatrix} a_{11} & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} \begin{bmatrix} 0 \\ \mathbf{x} \end{bmatrix} > 0

so $K > 0$ as well. For convenience, suppose first that $a_{11} = 1$. Then

A_{0} := \begin{bmatrix} 1 & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} = \begin{bmatrix} 1 & \mathbb{0} \\ \mathbf{w} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K - \mathbf{w} \mathbf{w}^{T} \end{bmatrix} \begin{bmatrix} 1 & \mathbf{w}^{T} \\ \mathbb{0} & I \end{bmatrix}

can be represented. Here, $L_{1} := \begin{bmatrix} 1 & \mathbb{0} \\ \mathbf{w} & I \end{bmatrix}$ is a lower triangular matrix, and $A_{1} := \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K - \mathbf{w} \mathbf{w}^{T} \end{bmatrix} = L_{1}^{-1} A_{0} L_{1}^{-T}$ is positive definite, so $K - \mathbf{w} \mathbf{w}^{T} > 0$. Therefore, repeating this process $A_{n-1} = L_{n} A_{n} L_{n}^{T}$ until $n = m$,

A_{0} = L_{1} \cdots L_{m} I_{m} L_{m}^{T} \cdots L_{1}^{T}

is obtained.
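As a quick numerical sanity check of this single step, here is the $a_{11} = 1$ elimination carried out in R on a small positive definite matrix (the matrix itself is arbitrary, chosen only for illustration):

```r
# One elimination step for the a11 = 1 case, on an arbitrary 3x3
# positive definite matrix (chosen only for illustration)
A0 <- matrix(c(1.0, 0.5, 0.2,
               0.5, 2.0, 0.3,
               0.2, 0.3, 1.5), nrow = 3, byrow = TRUE)
w <- A0[2:3, 1]                                   # w
K <- A0[2:3, 2:3]                                 # K
L1 <- diag(3); L1[2:3, 1] <- w                    # [1 0; w I]
A1 <- diag(3); A1[2:3, 2:3] <- K - tcrossprod(w)  # [1 0; 0 K - w w^T]
all.equal(L1 %*% A1 %*% t(L1), A0)                # TRUE
```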

Generalization

Now let's generalize to the case of arbitrary $a_{11} > 0$. Writing $A_{0} := A$,

\begin{align*} A_{0} =& \begin{bmatrix} a_{11} & \mathbf{w}^{T} \\ \mathbf{w} & K \end{bmatrix} \\ =& \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & K \end{bmatrix} \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \end{align*}

Since $\begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & K \end{bmatrix}$ has the form already covered in the case $a_{11} = 1$,

\begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & K \end{bmatrix} = \begin{bmatrix} 1 & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K-{{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} \end{bmatrix} \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ \mathbb{0} & I \end{bmatrix}

L_{1} := \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix}

and

A_{1} := \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K-{{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} \end{bmatrix}

then we obtain

\begin{align*} A_{0} =& \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ {{1} \over {\sqrt{a_{11}}}} \mathbf{w} & I \end{bmatrix} \begin{bmatrix} 1 & \mathbb{0} \\ \mathbb{0} & K-{{ \mathbf{w} \mathbf{w}^{T} } \over {a_{11}}} \end{bmatrix} \begin{bmatrix} 1 & {{1} \over {\sqrt{a_{11}}}} \mathbf{w}^{T} \\ \mathbb{0} & I \end{bmatrix} \begin{bmatrix} \sqrt{a_{11}} & \mathbb{0} \\ \mathbb{0} & I \end{bmatrix} \\ =& L_{1} A_{1} L_{1}^{T} \end{align*}

Likewise, repeating the process $A_{n-1} = L_{n} A_{n} L_{n}^{T}$ until $n = m$,

A_{0} = L_{1} \cdots L_{m} I_{m} L_{m}^{T} \cdots L_{1}^{T}

is obtained. Since $L_{1}, L_{2}, \cdots, L_{m}$ are all lower triangular matrices, their product $L := L_{1} L_{2} \cdots L_{m}$ is also a lower triangular matrix, and $A$ is represented as $A_{0} = A = L L^{T}$.
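This recursion translates almost line by line into R. The sketch below uses a hypothetical helper name `cholesky_by_recursion`, peels off one $L_{k}$ per step, accumulates their product, and checks the result on a small test matrix:

```r
# Peel off one L_k per step: L_k contains sqrt(a11) on the pivot and
# w / sqrt(a11) below it; the trailing block becomes K - w w^T / a11.
cholesky_by_recursion <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  for (k in 1:n) {
    a  <- A[k, k]                  # current pivot a11 (positive if A > 0)
    Lk <- diag(n)
    Lk[k, k] <- sqrt(a)
    if (k < n) {
      Lk[(k + 1):n, k] <- A[(k + 1):n, k] / sqrt(a)      # w / sqrt(a11)
      A[(k + 1):n, (k + 1):n] <- A[(k + 1):n, (k + 1):n] -
        tcrossprod(A[(k + 1):n, k]) / a                  # K - w w^T / a11
    }
    L <- L %*% Lk                  # accumulate L = L1 L2 ... Lm
  }
  L
}

B <- matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), ncol = 3)
L <- cholesky_by_recursion(B)
all.equal(L %*% t(L), B)           # TRUE
```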

Representing $A = L L^{T}$ using only a lower triangular matrix $L$ in this way is called the Cholesky decomposition. Although positive definiteness is quite a strong condition, the decomposition itself is surprisingly simple and can be found with very few computations. Now, writing out

L := \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{bmatrix}

let's directly compute $A = (a_{ij}) = L L^{T}$. Since every entry of $L$ above the diagonal is zero, the sum truncates at $\min(i, j)$:

a_{ij} = \sum_{s=1}^{n} l_{is} \left( L^{T} \right)_{sj} = \sum_{s=1}^{\min (i,j)} l_{is} l_{js}

If $i = j = k$, then

a_{kk} = \sum_{s=1}^{k-1} l_{ks}^{2} + l_{kk}^{2} \implies l_{kk} = \sqrt{ a_{kk} - \sum_{s=1}^{k-1} l_{ks}^{2} }

If $j = k$, then for $i = k+1, \cdots, n$,

a_{ik} = \sum_{s=1}^{k-1} l_{is} l_{ks} + l_{ik} l_{kk} \implies l_{ik} = {{1} \over {l_{kk}}} \left( a_{ik} - \sum_{s=1}^{k-1} l_{is} l_{ks} \right)
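These two recurrences can be applied by hand. The $3 \times 3$ matrix below is not from the text; it is chosen so that every entry of $L$ comes out as an integer:

```r
# Entrywise recurrences on a 3x3 example (chosen to give integer results)
A <- matrix(c(4, 2, 2,
              2, 5, 3,
              2, 3, 6), nrow = 3, byrow = TRUE)
l11 <- sqrt(A[1, 1])                  # k = 1: sqrt(4) = 2
l21 <- A[2, 1] / l11                  # 2 / 2 = 1
l31 <- A[3, 1] / l11                  # 2 / 2 = 1
l22 <- sqrt(A[2, 2] - l21^2)          # k = 2: sqrt(5 - 1) = 2
l32 <- (A[3, 2] - l31 * l21) / l22    # (3 - 1) / 2 = 1
l33 <- sqrt(A[3, 3] - l31^2 - l32^2)  # k = 3: sqrt(6 - 1 - 1) = 2
L <- rbind(c(l11, 0, 0), c(l21, l22, 0), c(l31, l32, l33))
all.equal(L %*% t(L), A)              # TRUE
```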

Algorithm

Assume $(a_{ij}) \in \mathbb{R}^{n \times n}$ is a positive definite matrix.

Step 1. $k = 1$

Substitute $l_{11} = \sqrt{a_{11}}$ and, for $i = 2, \cdots, n$, calculate $l_{i1} = {{1} \over {l_{11}}} a_{i1}$.


Step 2. For $k = 2, 3, \cdots, n-1$, calculate the following:

  • Step 2-1. $l_{kk} = \sqrt{ a_{kk} - \sum_{s=1}^{k-1} l_{ks}^{2} }$

  • Step 2-2. For $i = k+1, k+2, \cdots, n$, calculate the following: $l_{ik} = {{1} \over {l_{kk}}} \left( a_{ik} - \sum_{s=1}^{k-1} l_{is} l_{ks} \right)$


Step 3. For $k = n$, calculate the following: $l_{nn} = \sqrt{ a_{nn} - \sum_{s=1}^{n-1} l_{ns}^{2} }$


  • Compared to LU decomposition or LDU decomposition, you can see that the amount of computation is significantly reduced. The remaining calculations are not complicated either; they amount to little more than inner products and norms.
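To put a rough number on this, one can count the multiplications the entrywise formulas require: $k-1$ squarings for each diagonal entry $l_{kk}$, plus $k-1$ products for each of the $n-k$ entries below it. This back-of-the-envelope count (ignoring the square roots and divisions) sums to exactly $(n^{3}-n)/6$, about half the roughly $n^{3}/3$ multiplications of ordinary LU elimination:

```r
# Count multiplications in the Cholesky recurrences for an n x n matrix:
# (k - 1) squarings for l_kk, plus (k - 1) products for each l_ik, i > k.
n <- 100
mults <- sum(sapply(1:n, function(k) (k - 1) + (n - k) * (k - 1)))
mults                    # 166650 = (n^3 - n) / 6
```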

Code

Now, let’s actually try the Cholesky Decomposition.


Below is the code implemented in R for the above algorithm.

B <- matrix(c(2,1,0,1,2,1,0,1,2), ncol = 3); B
Cholesky <- function(A) {
    # Assumes A is an n x n positive definite matrix with n >= 3
    n <- nrow(A)
    L <- diag(0, n)
    # Step 1
    L[1, 1] <- sqrt(A[1, 1])
    L[, 1] <- A[, 1] / L[1, 1]
    # Step 2
    for (k in 2:(n - 1)) {
        L[k, k] <- sqrt(A[k, k] - sum(L[k, 1:(k - 1)]^2))  # sum of squares of row k
        for (i in (k + 1):n) {
            L[i, k] <- (A[i, k] - sum(L[i, 1:(k - 1)] * L[k, 1:(k - 1)])) / L[k, k]
        }
    }
    # Step 3
    L[n, n] <- sqrt(A[n, n] - sum(L[n, 1:(n - 1)]^2))
    return(list(L = L))
}

Cholesky(B)
t(chol(B))

Running the code and comparing with the Cholesky decomposition built into R, you can see the results match exactly.
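As a usage note (an application sketch, not part of the decomposition itself), the factor $L$ is typically what you want when solving $A \mathbf{x} = \mathbf{b}$: two cheap triangular solves with base R's `forwardsolve` and `backsolve` replace a general `solve`:

```r
# Solve B x = b using the Cholesky factor: L y = b, then L^T x = y
B <- matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), ncol = 3)
L <- t(chol(B))             # lower triangular factor of B
b <- c(1, 2, 3)
y <- forwardsolve(L, b)     # forward substitution
x <- backsolve(t(L), y)     # back substitution
all.equal(x, solve(B, b))   # TRUE
```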