Cholesky Decomposition

  • A basic tenet in numerical analysis:

The structure should be exploited whenever solving a problem.

Common structures include: symmetry, definiteness, sparsity, Kronecker product, low rank, ...

  • LU decomposition (Gaussian elimination) is not used as often in statistics because, most of the time, statisticians deal with positive (semi)definite matrices.

  • Consider solving the normal equation $$ \mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y} $$ for linear regression. The coefficient matrix $\mathbf{X}^T \mathbf{X}$ is symmetric and positive semidefinite. How to exploit this structure?

Cholesky decomposition

  • Theorem: Let $\mathbf{A} \in \mathbb{R}^{n \times n}$ be symmetric and positive definite. Then $\mathbf{A} = \mathbf{L} \mathbf{L}^T$, where $\mathbf{L}$ is lower triangular with positive diagonal entries and is unique.
    Proof (by induction):
    If $n=1$, then $\ell = \sqrt{a}$. For $n>1$, the block equation $$ \begin{eqnarray*} \begin{pmatrix} a_{11} & \mathbf{a}^T \\ \mathbf{a} & \mathbf{A}_{22} \end{pmatrix} = \begin{pmatrix} \ell_{11} & \mathbf{0}_{n-1}^T \\ \mathbf{l} & \mathbf{L}_{22} \end{pmatrix} \begin{pmatrix} \ell_{11} & \mathbf{l}^T \\ \mathbf{0}_{n-1} & \mathbf{L}_{22}^T \end{pmatrix} \end{eqnarray*} $$ has solution $$ \begin{eqnarray*} \ell_{11} &=& \sqrt{a_{11}} \\ \mathbf{l} &=& \ell_{11}^{-1} \mathbf{a} \\ \mathbf{L}_{22} \mathbf{L}_{22}^T &=& \mathbf{A}_{22} - \mathbf{l} \mathbf{l}^T = \mathbf{A}_{22} - a_{11}^{-1} \mathbf{a} \mathbf{a}^T. \end{eqnarray*} $$
    Now $a_{11}>0$ (why?), so $\ell_{11}$ and $\mathbf{l}$ are uniquely determined. $\mathbf{A}_{22} - a_{11}^{-1} \mathbf{a} \mathbf{a}^T$ is positive definite because $\mathbf{A}$ is positive definite (why?). By induction hypothesis, $\mathbf{L}_{22}$ exists and is unique.

  • The constructive proof completely specifies the algorithm; a code sketch is given at the end of this list.

  • Computational cost: $$ \frac{1}{2} [2(n-1)^2 + 2(n-2)^2 + \cdots + 2 \cdot 1^2] \approx \frac{1}{3} n^3 \quad \text{flops} $$ plus $n$ square roots. Half the cost of LU decomposition by utilizing symmetry.

  • In general, Cholesky decomposition is very stable. Failure of the decomposition simply means $\mathbf{A}$ is not positive definite, so it is also an efficient way to test positive definiteness.
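
To make the algorithm concrete, here is a minimal sketch (not the library routine) that follows the constructive proof step by step, written in the same Julia 0.5/0.6 syntax as the code cells below; the function name chol_sketch and the error message are illustrative choices. The error thrown at a non-positive pivot is exactly the failure mode that doubles as the positive definiteness test.

function chol_sketch(A::Matrix{Float64})
    n = size(A, 1)
    A = copy(A)                                 # trailing block is overwritten in place
    L = zeros(n, n)
    for k in 1:n
        A[k, k] > 0 || error("matrix is not positive definite")
        L[k, k] = sqrt(A[k, k])                 # ℓ11 = √a11
        L[k+1:n, k] = A[k+1:n, k] / L[k, k]     # l = a / ℓ11
        # Schur complement A22 - l * l' becomes the next (n-k) × (n-k) subproblem
        A[k+1:n, k+1:n] -= L[k+1:n, k] * L[k+1:n, k]'
    end
    return L
end

For the 3×3 matrix in the cells below, chol_sketch returns the same factor as cholfact, with rows (2, 0, 0), (6, 1, 0), (-8, 5, 3).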

Pivoting

  • When $\mathbf{A}$ does not have full rank, e.g., $\mathbf{X}^T \mathbf{X}$ with a non-full column rank $\mathbf{X}$, we encounter $a_{kk} = 0$ during the procedure.

  • Symmetric pivoting. At each stage $k$, we permute both row and column such that $\max_{k \le i \le n} a_{ii}$ becomes the pivot. If we encounter $\max_{k \le i \le n} a_{ii} = 0$, then $\mathbf{A}[k:n,k:n] = \mathbf{0}$ (why?) and the algorithm terminates; a code sketch follows this list.

  • With symmetric pivoting: $$ \mathbf{P} \mathbf{A} \mathbf{P}^T = \mathbf{L} \mathbf{L}^T, $$ where $\mathbf{P}$ is a permutation matrix and $\mathbf{L} \in \mathbb{R}^{n \times r}$, $r = \text{rank}(\mathbf{A})$.
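
A hypothetical sketch of this pivoted outer-product algorithm is given below, again in the notebook's Julia 0.5/0.6 syntax; the function name pivoted_chol and the tolerance tol are made-up illustration choices, not the LAPACK routine (?pstrf) used by the pivoted cholfact call later in this section.

function pivoted_chol(A::Matrix{Float64}; tol = 1e-8)
    A   = copy(A)                        # overwritten in place
    n   = size(A, 1)
    piv = collect(1:n)                   # records the permutation P
    r   = n
    for k in 1:n
        q = k - 1 + indmax(diag(A)[k:n]) # largest remaining diagonal entry
        if A[q, q] ≤ tol                 # remaining block is numerically zero
            r = k - 1
            break
        end
        A[[k, q], :] = A[[q, k], :]      # symmetric pivoting: swap rows k and q ...
        A[:, [k, q]] = A[:, [q, k]]      # ... and columns k and q
        piv[k], piv[q] = piv[q], piv[k]
        A[k, k] = sqrt(A[k, k])          # one step of outer-product Cholesky
        A[k+1:n, k] /= A[k, k]
        A[k+1:n, k+1:n] -= A[k+1:n, k] * A[k+1:n, k]'
    end
    return tril(A)[:, 1:r], piv, r       # A[piv, piv] ≈ L * L' with L of size n × r
end

Applied to the rank-deficient matrix constructed in the positive semidefinite example below, norm(A[piv, piv] - L * L') should be at the level of rounding error.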

Implementation

  • LAPACK functions: ?potrf (without pivoting), ?pstrf (with pivoting).

  • Julia functions: cholfact, cholfact!, chol; or call the LAPACK wrapper functions potrf! and pstrf! directly.

  • Example: positive definite matrix.

In [1]:
A = [4 12 -16; 12 37 -43; -16 -43 98]
Out[1]:
3×3 Array{Int64,2}:
   4   12  -16
  12   37  -43
 -16  -43   98
In [2]:
# Cholesky without pivoting
Achol = cholfact(A)
Out[2]:
Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
[2.0 6.0 -8.0; 0.0 1.0 5.0; 0.0 0.0 3.0]
In [3]:
Achol[:L]
Out[3]:
3×3 LowerTriangular{Float64,Array{Float64,2}}:
  2.0   ⋅    ⋅ 
  6.0  1.0   ⋅ 
 -8.0  5.0  3.0
In [4]:
Achol[:U]
Out[4]:
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 2.0  6.0  -8.0
  ⋅   1.0   5.0
  ⋅    ⋅    3.0
In [5]:
b = [1.0; 2.0; 3.0]
A \ b # this does LU!
Out[5]:
3-element Array{Float64,1}:
 28.5833 
 -7.66667
  1.33333
In [6]:
@which A \ b
Out[6]:
\(A::AbstractArray{T<:Any,2}, B::Union{AbstractArray{T<:Any,1},AbstractArray{T<:Any,2}}) at linalg/generic.jl:349
In [7]:
Achol \ b # two triangular solves
Out[7]:
3-element Array{Float64,1}:
 28.5833 
 -7.66667
  1.33333
In [12]:
@which Achol \ b
Out[12]:
\(F::Factorization, B::Union{AbstractArray{T<:Any,1},AbstractArray{T<:Any,2}}) at linalg/factorization.jl:36
In [10]:
det(A) # this actually does LU!
Out[10]:
36.0
In [9]:
@which det(A)
Out[9]:
det{T}(A::AbstractArray{T,2}) at linalg/generic.jl:574
In [13]:
det(Achol) # cheap
Out[13]:
36.0
In [14]:
@which det(Achol)
Out[14]:
det(C::Base.LinAlg.Cholesky) at linalg/cholesky.jl:424
In [9]:
inv(A) # this does LU!
Out[9]:
3×3 Array{Float64,2}:
  49.3611   -13.5556     2.11111 
 -13.5556     3.77778   -0.555556
   2.11111   -0.555556   0.111111
In [15]:
@which inv(A)
Out[15]:
inv{T}(A::Union{Base.ReshapedArray{T,2,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N<:Any}}},DenseArray{T,2},SubArray{T,2,A<:Union{Base.ReshapedArray{T<:Any,N<:Any,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N<:Any}}},DenseArray},I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex,Colon,Int64,Range{Int64}},N<:Any}},L<:Any}}) at linalg/dense.jl:357
In [10]:
inv(Achol)
Out[10]:
3×3 Array{Float64,2}:
  49.3611   -13.5556     2.11111 
 -13.5556     3.77778   -0.555556
   2.11111   -0.555556   0.111111
In [16]:
@which inv(Achol)
Out[16]:
inv{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},S<:Union{Base.ReshapedArray{T,2,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray{T,2},SubArray{T,2,A<:Union{Base.ReshapedArray{T,N,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray},I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex,Colon,Int64,Range{Int64}},N}},L}}}(C::Base.LinAlg.Cholesky{T,S}) at linalg/cholesky.jl:440
  • Example: positive semi-definite matrix.
In [11]:
srand(280)
A = randn(5, 3)
A = A * A'
Out[11]:
5×5 Array{Float64,2}:
  2.06704   -3.10361    2.41766    -0.717719   0.845433 
 -3.10361    9.76269   -7.31094     2.14335    0.283818 
  2.41766   -7.31094    6.0473     -0.931321  -0.0610487
 -0.717719   2.14335   -0.931321    1.28376    0.115462 
  0.845433   0.283818  -0.0610487   0.115462   0.827396 
In [12]:
Achol = cholfact(A)
Base.LinAlg.PosDefException(5)

 in _chol!(::Array{Float64,2}, ::Type{UpperTriangular}) at ./linalg/cholesky.jl:55
 in cholfact!(::Hermitian{Float64,Array{Float64,2}}, ::Type{Val{false}}) at ./linalg/cholesky.jl:185
 in cholfact(::Array{Float64,2}, ::Symbol) at ./linalg/cholesky.jl:282
 in cholfact(::Array{Float64,2}) at ./linalg/cholesky.jl:281
In [13]:
Achol = cholfact(A, :L, Val{true})
Out[13]:
Base.LinAlg.CholeskyPivoted{Float64,Array{Float64,2}}([3.12453 -3.10361 … -0.717719 0.845433; -0.993306 1.03941 … 2.14335 0.283818; … ; -2.33985 0.0899256 … 5.96046e-8 0.115462; 0.0908354 0.900181 … -1.19908e-8 0.0],'L',[2,1,4,3,5],4,0.0,1)
In [14]:
Achol[:L]
Out[14]:
5×5 LowerTriangular{Float64,Array{Float64,2}}:
  3.12453      ⋅          ⋅           ⋅           ⋅ 
 -0.993306    1.03941     ⋅           ⋅           ⋅ 
  0.685974   -0.0349591  0.901097     ⋅           ⋅ 
 -2.33985     0.0899256  0.751198    5.96046e-8   ⋅ 
  0.0908354   0.900181   0.0939082  -1.19908e-8  0.0
In [15]:
Achol[:U]
Out[15]:
5×5 UpperTriangular{Float64,Array{Float64,2}}:
 3.12453  -0.993306   0.685974   -2.33985      0.0908354 
  ⋅        1.03941   -0.0349591   0.0899256    0.900181  
  ⋅         ⋅         0.901097    0.751198     0.0939082 
  ⋅         ⋅          ⋅          5.96046e-8  -1.19908e-8
  ⋅         ⋅          ⋅           ⋅           0.0       
In [16]:
Achol[:p]
Out[16]:
5-element Array{Int64,1}:
 2
 1
 4
 3
 5

Applications

  • No inversion mentality: whenever we see a matrix inverse, we should think in terms of solving a linear system instead. If the matrix is positive (semi)definite, use Cholesky decomposition.

  • Example: the log-density of the multivariate normal $\text{MVN}(\mu, \Sigma)$, with $\Sigma$ positive definite, is

$$ - \frac{n}{2} \log (2\pi) - \frac{1}{2} \log \det \Sigma - \frac{1}{2} (\mathbf{y} - \mu)^T \Sigma^{-1} (\mathbf{y} - \mu). $$
  • Method 1: (a) compute explicit inverse $\Sigma^{-1}$ ($2n^3$ flops), (b) compute quadratic form ($2n^2 + 2n$ flops), (c) compute determinant ($2n^3/3$ flops).

  • Method 2: (a) Cholesky decomposition $\Sigma = \mathbf{L} \mathbf{L}^T$ ($n^3/3$ flops), (b) Solve $\mathbf{L} \mathbf{x} = \mathbf{y} - \mu$ by forward substitutions ($n^2$ flops), (c) compute quadratic form $\mathbf{x}^T \mathbf{x}$ ($2n$ flops), and (d) compute determinant from Cholesky factor ($n$ flops).

Which method is better?

In [17]:
function logpdf_mvn_1(y::Vector, Σ::Matrix)
    # Method 1: explicit inverse and determinant of Σ
    n = length(y)
    - (n//2) * log(2π) - (1//2) * logdet(Σ) - (1//2) * dot(y, inv(Σ) * y)
end

function logpdf_mvn_2(y::Vector, Σ::Matrix)
    # Method 2: Cholesky factor of Σ; one triangular solve, no explicit inverse
    n = length(y)
    Σchol = cholfact(Σ)
    - (n//2) * log(2π) - (1//2) * logdet(Σchol) - (1//2) * sumabs2(Σchol[:L] \ y)
end
Out[17]:
logpdf_mvn_2 (generic function with 1 method)
In [18]:
using BenchmarkTools, Distributions

srand(280) # seed

n   = 1000
Σ   = randn(n, n); Σ = Σ' * Σ + I # covariance matrix
mvn = MvNormal(Σ) # N(0, Σ)
y   = rand(mvn) # one sample

@show logpdf_mvn_1(y, Σ)
@show logpdf_mvn_2(y, Σ)
logpdf_mvn_1(y,Σ) = -4396.173284372774
logpdf_mvn_2(y,Σ) = -4396.173284372773
Out[18]:
-4396.173284372773
In [19]:
@benchmark logpdf_mvn_1(y, Σ)
Out[19]:
BenchmarkTools.Trial: 
  memory estimate:  23.41 MiB
  allocs estimate:  21
  --------------
  minimum time:     69.049 ms (3.29% GC)
  median time:      82.971 ms (3.04% GC)
  mean time:        86.550 ms (3.11% GC)
  maximum time:     116.571 ms (3.62% GC)
  --------------
  samples:          58
  evals/sample:     1
In [20]:
@benchmark logpdf_mvn_2(y, Σ)
Out[20]:
BenchmarkTools.Trial: 
  memory estimate:  15.27 MiB
  allocs estimate:  17
  --------------
  minimum time:     12.320 ms (0.00% GC)
  median time:      14.644 ms (15.00% GC)
  mean time:        15.031 ms (12.50% GC)
  maximum time:     23.285 ms (14.75% GC)
  --------------
  samples:          332
  evals/sample:     1
  • Cholesky decomposition is one approach to solving linear regression; a code sketch follows the flop counts below.
    • Compute $\mathbf{X}^T \mathbf{X}$: $np^2$ flops
    • Compute $\mathbf{X}^T \mathbf{y}$: $2np$ flops
    • Cholesky decomposition of $\mathbf{X}^T \mathbf{X}$: $\frac{1}{3} p^3$ flops
    • Solve normal equation $\mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y}$: $2p^2$ flops
    • If standard errors are needed, another $(4/3) p^3$ flops

Total computational cost is $np^2 + (1/3) p^3$ (without s.e.) or $np^2 + (5/3) p^3$ (with s.e.) flops.
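
A minimal sketch of this route in the notebook's Julia 0.5/0.6 syntax follows; the problem sizes n, p and the random data are made up, and the comparison against X \ y (Julia's QR-based least squares solve) is only a sanity check.

srand(280)
n, p = 100, 5
X = randn(n, p)
y = randn(n)

xtx = X' * X                # ~ n p² flops (only one triangle is really needed)
xty = X' * y                # 2 n p flops
βhat = cholfact(xtx) \ xty  # Cholesky, (1/3) p³ flops, plus two triangular solves, 2 p² flops
norm(βhat - X \ y)          # should agree with the QR-based solution up to rounding error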

Further reading