Summary of linear regression

Methods for solving linear regression $\hat \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$:

Method Flops Remarks Software Stability
Sweep $np^2 + p^3$ $(X^TX)^{-1}$ available SAS less stable
Cholesky $np^2 + p^3/3$ less stable
QR by Householder $2np^2 - (2/3)p^3$ R stable
QR by MGS $2np^2$ $Q_1$ available stable
QR by SVD $4n^2p + 8np^2 + 9p^3$ $X = UDV^T$ most stable

Remarks:

  1. When $n \gg p$, sweep and Cholesky are twice faster than QR and need less space.
  2. Sweep and Cholesky are based on the Gram matrix $\mathbf{X}^T \mathbf{X}$, which can be dynamically updated with incoming data. They can handle huge $n$, moderate $p$ data sets that cannot fit into memory.
  3. QR methods are more stable and produce numerically more accurate solution.
  4. Although sweep is slower than Cholesky, it yields standard errors and so on.
  5. MGS appears slower than Householder, but it yields $\mathbf{Q}_1$.
In [1]:
using SweepOperator, BenchmarkTools

function linreg_cholesky(y::Vector, X::Matrix)
    cholfact!(X' * X) \ (X' * y)
end

function linreg_qr(y::Vector, X::Matrix)
    X \ y
end

function linreg_sweep(y::Vector, X::Matrix)
    p = size(X, 2)
    tableau = [X y]' * [X y]
    sweep!(tableau, 1:p)
    return tableau[1:p, end]
end
Out[1]:
linreg_sweep (generic function with 1 method)
In [2]:
srand(280) # seed

n, p = 10, 3
X = randn(n, p)
y = randn(n)

@show linreg_cholesky(y, X)
@show linreg_qr(y, X)
@show linreg_sweep(y, X);
linreg_cholesky(y,X) = [0.390365,0.262759,0.149047]
linreg_qr(y,X) = [0.390365,0.262759,0.149047]
linreg_sweep(y,X) = [0.390365,0.262759,0.149047]
In [3]:
n, p = 1000, 300
X = randn(n, p)
y = randn(n)

@benchmark linreg_cholesky(y, X)
Out[3]:
BenchmarkTools.Trial: 
  memory estimate:  708.34 KiB
  allocs estimate:  10
  --------------
  minimum time:     2.251 ms (0.00% GC)
  median time:      2.436 ms (0.00% GC)
  mean time:        2.483 ms (1.35% GC)
  maximum time:     5.270 ms (35.50% GC)
  --------------
  samples:          1999
  evals/sample:     1
In [4]:
@benchmark linreg_sweep(y, X)
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  6.03 MiB
  allocs estimate:  622
  --------------
  minimum time:     8.401 ms (0.00% GC)
  median time:      8.951 ms (0.00% GC)
  mean time:        9.053 ms (3.37% GC)
  maximum time:     10.875 ms (0.00% GC)
  --------------
  samples:          550
  evals/sample:     1
In [5]:
@benchmark linreg_qr(y, X)
Out[5]:
BenchmarkTools.Trial: 
  memory estimate:  4.05 MiB
  allocs estimate:  2465
  --------------
  minimum time:     11.205 ms (0.00% GC)
  median time:      11.960 ms (0.00% GC)
  mean time:        14.063 ms (2.19% GC)
  maximum time:     43.242 ms (0.00% GC)
  --------------
  samples:          355
  evals/sample:     1

There is simply no such thing as a universal 'gold standard' when it comes to algorithms.