Cholesky Decomposition¶

A basic tenet in numerical analysis:

The structure should be exploited whenever solving a problem.

Common structures include: symmetry, definiteness, sparsity, Kronecker product, low rank, ...

LU decomposition (Gaussian Elimination) is not used in statistics so often because most of time statisticians deal with positive (semi)definite matrix.
Consider solving the normal equation $$ \mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y} $$ for linear regression. The coefficient matrix $\mathbf{X}^T \mathbf{X}$ is symmetric and positive semidefinite. How to exploit this structure?

Cholesky decomposition¶

Theorem: Let $\mathbf{A} \in \mathbb{R}^{n \times n}$ be symmetric and positive definite. Then $\mathbf{A} = \mathbf{L} \mathbf{L}^T$, where $\mathbf{L}$ is lower triangular with positive diagonal entries and is unique.
Proof (by induction):
If $n=1$, then $\ell = \sqrt{a}$. For $n>1$, the block equation $$ \begin{eqnarray*} \begin{pmatrix} a_{11} & \mathbf{a}^T \\ \mathbf{a} & \mathbf{A}_{22} \end{pmatrix} = \begin{pmatrix} \ell_{11} & \mathbf{0}_{n-1}^T \\ \mathbf{l} & \mathbf{L}_{22} \end{pmatrix} \begin{pmatrix} \ell_{11} & \mathbf{l}^T \\ \mathbf{0}_{n-1} & \mathbf{L}_{22}^T \end{pmatrix} \end{eqnarray*} $$ has solution $$ \begin{eqnarray*} \ell_{11} &=& \sqrt{a_{11}} \\ \mathbf{l} &=& \ell_{11}^{-1} \mathbf{a} \\ \mathbf{L}_{22} \mathbf{L}_{22}^T &=& \mathbf{A}_{22} - \mathbf{l} \mathbf{l}^T = \mathbf{A}_{22} - a_{11}^{-1} \mathbf{a} \mathbf{a}^T. \end{eqnarray*} $$
Now $a_{11}>0$ (why?), so $\ell_{11}$ and $\mathbf{l}$ are uniquely determined. $\mathbf{A}_{22} - a_{11}^{-1} \mathbf{a} \mathbf{a}^T$ is positive definite because $\mathbf{A}$ is positive definite (why?). By induction hypothesis, $\mathbf{L}_{22}$ exists and is unique.
The constructive proof completely specifies the algorithm.

Computational cost: $$ \frac{1}{2} [2(n-1)^2 + 2(n-2)^2 + \cdots + 2 \cdot 1^2] \approx \frac{1}{3} n^3 \quad \text{flops} $$ plus $n$ square roots. Half the cost of LU decomposition by utilizing symmetry.
In general Cholesky decomposition is very stable. Failure of the decomposition simply means $\mathbf{A}$ is not positive definite. It is an efficient way to test positive definiteness.

Pivoting¶

When $\mathbf{A}$ does not have full rank, e.g., $\mathbf{X}^T \mathbf{X}$ with a non-full column rank $\mathbf{X}$, we encounter $a_{kk} = 0$ during the procedure.
Symmetric pivoting. At each stage $k$, we permute both row and column such that $\max_{k \le i \le n} a_{ii}$ becomes the pivot. If we encounter $\max_{k \le i \le n} a_{ii} = 0$, then $\mathbf{A}[k:n,k:n] = \mathbf{0}$ (why?) and the algorithm terminates.
With symmetric pivoting: $$ \mathbf{P} \mathbf{A} \mathbf{P}^T = \mathbf{L} \mathbf{L}^T, $$ where $\mathbf{P}$ is a permutation matrix and $\mathbf{L} \in \mathbb{R}^{n \times r}$, $r = \text{rank}(\mathbf{A})$.

Implementation¶

LAPACK functions: ?potrf (without pivoting), ?pstrf (with pivoting).
Julia functions: cholfact, cholfact!, chol, or call LAPACK wrapper functions potrf! and pstrf!
Example: positive definite matrix.

A = [4 12 -16; 12 37 -43; -16 -43 98]

3×3 Array{Int64,2}:
   4   12  -16
  12   37  -43
 -16  -43   98

# Cholesky without pivoting
Achol = cholfact(A)

Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
[2.0 6.0 -8.0; 0.0 1.0 5.0; 0.0 0.0 3.0]

Achol[:L]

3×3 LowerTriangular{Float64,Array{Float64,2}}:
  2.0   ⋅    ⋅ 
  6.0  1.0   ⋅ 
 -8.0  5.0  3.0

Achol[:U]

3×3 UpperTriangular{Float64,Array{Float64,2}}:
 2.0  6.0  -8.0
  ⋅   1.0   5.0
  ⋅    ⋅    3.0

b = [1.0; 2.0; 3.0]
A \ b # this does LU!

3-element Array{Float64,1}:
 28.5833 
 -7.66667
  1.33333

@which A \ b

Achol \ b # two triangular solves

3-element Array{Float64,1}:
 28.5833 
 -7.66667
  1.33333

@which Achol \ b

det(A) # this actually does LU!

36.0

@which det(A)

det(Achol) # cheap

36.0

@which det(Achol)

inv(A) # this does LU!

3×3 Array{Float64,2}:
  49.3611   -13.5556     2.11111 
 -13.5556     3.77778   -0.555556
   2.11111   -0.555556   0.111111

@which inv(A)

inv(Achol)

3×3 Array{Float64,2}:
  49.3611   -13.5556     2.11111 
 -13.5556     3.77778   -0.555556
   2.11111   -0.555556   0.111111

@which inv(Achol)

Example: positive semi-definite matrix.

srand(280)
A = randn(5, 3)
A = A * A'

5×5 Array{Float64,2}:
  2.06704   -3.10361    2.41766    -0.717719   0.845433 
 -3.10361    9.76269   -7.31094     2.14335    0.283818 
  2.41766   -7.31094    6.0473     -0.931321  -0.0610487
 -0.717719   2.14335   -0.931321    1.28376    0.115462 
  0.845433   0.283818  -0.0610487   0.115462   0.827396

Achol = cholfact(A)

Base.LinAlg.PosDefException(5)

 in _chol!(::Array{Float64,2}, ::Type{UpperTriangular}) at ./linalg/cholesky.jl:55
 in cholfact!(::Hermitian{Float64,Array{Float64,2}}, ::Type{Val{false}}) at ./linalg/cholesky.jl:185
 in cholfact(::Array{Float64,2}, ::Symbol) at ./linalg/cholesky.jl:282
 in cholfact(::Array{Float64,2}) at ./linalg/cholesky.jl:281

Achol = cholfact(A, :L, Val{true})

Base.LinAlg.CholeskyPivoted{Float64,Array{Float64,2}}([3.12453 -3.10361 … -0.717719 0.845433; -0.993306 1.03941 … 2.14335 0.283818; … ; -2.33985 0.0899256 … 5.96046e-8 0.115462; 0.0908354 0.900181 … -1.19908e-8 0.0],'L',[2,1,4,3,5],4,0.0,1)

Achol[:L]

5×5 LowerTriangular{Float64,Array{Float64,2}}:
  3.12453      ⋅          ⋅           ⋅           ⋅ 
 -0.993306    1.03941     ⋅           ⋅           ⋅ 
  0.685974   -0.0349591  0.901097     ⋅           ⋅ 
 -2.33985     0.0899256  0.751198    5.96046e-8   ⋅ 
  0.0908354   0.900181   0.0939082  -1.19908e-8  0.0

Achol[:U]

5×5 UpperTriangular{Float64,Array{Float64,2}}:
 3.12453  -0.993306   0.685974   -2.33985      0.0908354 
  ⋅        1.03941   -0.0349591   0.0899256    0.900181  
  ⋅         ⋅         0.901097    0.751198     0.0939082 
  ⋅         ⋅          ⋅          5.96046e-8  -1.19908e-8
  ⋅         ⋅          ⋅           ⋅           0.0

Achol[:p]

5-element Array{Int64,1}:
 2
 1
 4
 3
 5

Applications¶

No inversion mentality: Whenever we see matrix inverse, we should think in terms of solving linear equations. If the matrix is positive (semi)definite, use Cholesky decomposition.
Example: multivariate normal density $\text{MVN}(\mu, \Sigma)$, $\Sigma$ is p.d., is

$$ - \frac{n}{2} \log (2\pi) - \frac{1}{2} \log \det \Sigma - \frac{1}{2} (\mathbf{y} - \mu)^T \Sigma^{-1} (\mathbf{y} - \mu). $$

Method 1: (a) compute explicit inverse $\Sigma^{-1}$ ($2n^3$ flops), (b) compute quadratic form ($2n^2 + 2n$ flops), (c) compute determinant ($2n^3/3$ flops).
Method 2: (a) Cholesky decomposition $\Sigma = \mathbf{L} \mathbf{L}^T$ ($n^3/3$ flops), (b) Solve $\mathbf{L} \mathbf{x} = \mathbf{y} - \mu$ by forward substitutions ($n^2$ flops), (c) compute quadratic form $\mathbf{x}^T \mathbf{x}$ ($2n$ flops), and (d) compute determinant from Cholesky factor ($n$ flops).

Which method is better?

function logpdf_mvn_1(y::Vector, Σ::Matrix)
    n = length(y)
    - (n//2) * log(2π) - (1//2) * logdet(Σ) - (1//2) * dot(y, inv(Σ) * y)
end

function logpdf_mvn_2(y::Vector, Σ::Matrix)
    n = length(y)
    Σchol = cholfact(Σ)
    - (n//2) * log(2π) - (1//2) * logdet(Σchol) - (1//2) * sumabs2(Σchol[:L] \ y)
end

logpdf_mvn_2 (generic function with 1 method)

using BenchmarkTools, Distributions

srand(280) # seed

n   = 1000
Σ   = randn(n, n); Σ = Σ' * Σ + I # covariance matrix
mvn = MvNormal(Σ) # N(0, Σ)
y   = rand(mvn) # one sample

@show logpdf_mvn_1(y, Σ)
@show logpdf_mvn_2(y, Σ)

logpdf_mvn_1(y,Σ) = -4396.173284372774
logpdf_mvn_2(y,Σ) = -4396.173284372773

-4396.173284372773

@benchmark logpdf_mvn_1(y, Σ)

BenchmarkTools.Trial: 
  memory estimate:  23.41 MiB
  allocs estimate:  21
  --------------
  minimum time:     69.049 ms (3.29% GC)
  median time:      82.971 ms (3.04% GC)
  mean time:        86.550 ms (3.11% GC)
  maximum time:     116.571 ms (3.62% GC)
  --------------
  samples:          58
  evals/sample:     1

@benchmark logpdf_mvn_2(y, Σ)

BenchmarkTools.Trial: 
  memory estimate:  15.27 MiB
  allocs estimate:  17
  --------------
  minimum time:     12.320 ms (0.00% GC)
  median time:      14.644 ms (15.00% GC)
  mean time:        15.031 ms (12.50% GC)
  maximum time:     23.285 ms (14.75% GC)
  --------------
  samples:          332
  evals/sample:     1

Cholesky decomposition is one approach to solve linear regression.
- Compute $\mathbf{X}^T \mathbf{X}$: $np^2$ flops
- Compute $\mathbf{X}^T \mathbf{y}$: $2np$ flops
- Cholesky decomposition of $\mathbf{X}^T \mathbf{X}$: $\frac{1}{3} p^3$ flops
- Solve normal equation $\mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y}$: $2p^2$ flops
- If need standard errors, another $(4/3)p^3$ flops

Total computational cost is $np^2 + (1/3) p^3$ (without s.e.) or $np^2 + (5/3) p^3$ (with s.e.) flops.

Cholesky Decomposition¶

Cholesky decomposition¶

Pivoting¶

Implementation¶

Applications¶

Further reading¶