In [1]:
versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code

A list of "easy" linear systems

Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, or, if you prefer, computing the matrix inverse. $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... We should be alert to the many kinds of easy linear systems.

Don't blindly use A \ b or inv in Julia, or the solve function in R. Don't waste computing resources with a bad choice of algorithm!

Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops. Use the Diagonal type in Julia.

In [2]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(280)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
In [3]:
# which method does \ dispatch to? source: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b
Out[3]:
\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in LinearAlgebra at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/generic.jl:892
In [4]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  15.89 KiB
  allocs estimate:  3
  --------------
  minimum time:     761.706 μs (0.00% GC)
  median time:      833.682 μs (0.00% GC)
  mean time:        872.122 μs (0.22% GC)
  maximum time:     9.602 ms (91.45% GC)
  --------------
  samples:          5665
  evals/sample:     1
In [5]:
# O(n) computation by exploiting the diagonal structure
@benchmark Diagonal($A) \ $b
Out[5]:
BenchmarkTools.Trial: 
  memory estimate:  15.89 KiB
  allocs estimate:  3
  --------------
  minimum time:     2.752 μs (0.00% GC)
  median time:      3.731 μs (0.00% GC)
  mean time:        5.059 μs (28.25% GC)
  maximum time:     4.938 ms (99.86% GC)
  --------------
  samples:          10000
  evals/sample:     9

Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops.

In [6]:
Random.seed!(280) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)
Out[6]:
1000×1000 SymTridiagonal{Float64,Array{Float64,1}}:
 0.126238   0.618244    ⋅        …    ⋅          ⋅          ⋅      
 0.618244  -2.34688    1.10206        ⋅          ⋅          ⋅      
  ⋅         1.10206    1.91661        ⋅          ⋅          ⋅      
  ⋅          ⋅        -0.447244       ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅        …    ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅        …    ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
 ⋮                               ⋱                                 
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅        …    ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅        …    ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅            0.709662    ⋅          ⋅      
  ⋅          ⋅          ⋅            0.542815  -0.363838    ⋅      
  ⋅          ⋅          ⋅           -0.363838  -1.04034    0.321148
  ⋅          ⋅          ⋅             ⋅         0.321148  -0.276537
In [7]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b
Out[7]:
BenchmarkTools.Trial: 
  memory estimate:  7.65 MiB
  allocs estimate:  5
  --------------
  minimum time:     9.502 ms (0.00% GC)
  median time:      10.496 ms (0.00% GC)
  mean time:        10.635 ms (4.82% GC)
  maximum time:     47.393 ms (80.36% GC)
  --------------
  samples:          470
  evals/sample:     1
In [8]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
Out[8]:
BenchmarkTools.Trial: 
  memory estimate:  23.86 KiB
  allocs estimate:  5
  --------------
  minimum time:     13.397 μs (0.00% GC)
  median time:      15.377 μs (0.00% GC)
  mean time:        22.577 μs (27.96% GC)
  maximum time:     42.611 ms (99.94% GC)
  --------------
  samples:          10000
  evals/sample:     1

Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve linear system.

In [9]:
Random.seed!(280)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b
Out[9]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     707.663 μs (0.00% GC)
  median time:      768.855 μs (0.00% GC)
  mean time:        775.331 μs (0.00% GC)
  maximum time:     1.453 ms (0.00% GC)
  --------------
  samples:          6362
  evals/sample:     1
In [10]:
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b
Out[10]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     313.525 μs (0.00% GC)
  median time:      346.465 μs (0.00% GC)
  mean time:        375.613 μs (0.00% GC)
  maximum time:     1.111 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Block diagonal matrix

Block diagonal: Suppose $n = \sum_b n_b$. Solving a linear system takes $(\sum_b n_b)^3$ flops without using the block diagonal structure vs $\sum_b n_b^3$ flops using it; see the sketch at the end of this subsection.

Julia has a blockdiag function that generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package?

In [11]:
using SparseArrays

Random.seed!(280)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
Out[11]:
1000×1000 SparseMatrixCSC{Float64,Int64} with 969 stored entries:
  [31  ,    1]  =  2.07834
  [53  ,    1]  =  -1.11883
  [58  ,    1]  =  -0.66448
  [14  ,    4]  =  1.11793
  [96  ,    5]  =  1.22813
  [81  ,    8]  =  -0.919643
  [48  ,    9]  =  1.0185
  [49  ,    9]  =  -0.996332
  [15  ,   10]  =  1.30841
  [28  ,   10]  =  -0.818757
  [39  ,   11]  =  1.08248
  [82  ,   11]  =  -0.0102294
  ⋮
  [935 ,  987]  =  0.677319
  [956 ,  987]  =  -0.900804
  [967 ,  987]  =  -0.438788
  [971 ,  991]  =  0.176756
  [929 ,  992]  =  -1.17384
  [974 ,  993]  =  1.59235
  [967 ,  994]  =  0.542169
  [994 ,  995]  =  0.627832
  [998 ,  997]  =  0.60382
  [935 ,  998]  =  0.342675
  [947 ,  998]  =  0.482228
  [975 , 1000]  =  0.991598
In [12]:
using UnicodePlots
spy(A)
Out[12]:
[spy plot: sparsity pattern of the 1000×1000 block diagonal matrix, nz = 969]
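To see the flop comparison concretely, here is a minimal sketch (not part of the original notes) that solves a block diagonal system block by block versus as one big dense system; the blocks and right-hand sides below are made up for illustration.

using LinearAlgebra, Random

Random.seed!(280)
B, ni = 10, 100
blocks = [randn(ni, ni) + 10I for _ in 1:B] # well-conditioned dense blocks (assumption)
rhs    = [randn(ni) for _ in 1:B]

# exploit structure: B solves of size ni, roughly B * (2/3) ni^3 flops
x_blocks = reduce(vcat, [blocks[i] \ rhs[i] for i in 1:B])

# ignore structure: one solve of size B * ni, roughly (2/3) (B * ni)^3 flops
Afull  = cat(blocks...; dims = (1, 2)) # dense block diagonal matrix
x_full = Afull \ reduce(vcat, rhs)

norm(x_blocks - x_full) # should be ≈ 0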

Kronecker product

Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming and doing costly computation on the potentially huge Kronecker $\mathbf{A} \otimes \mathbf{B}$.

Anyone interested in writing a package?

In [13]:
using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix
include group.jl for user defined matrix generators
verify download of index files...
used remote site is https://sparse.tamu.edu/?per_page=All
populating internal database...
Out[13]:
50×50 Array{Float64,2}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1 
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2 
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮                                           ⋱                            
 0.025641   0.0512821  0.0769231  0.102564      0.8125     0.795918   0.78
 0.025      0.05       0.075      0.1           0.833333   0.816327   0.8 
 0.0243902  0.0487805  0.0731707  0.097561   …  0.854167   0.836735   0.82
 0.0238095  0.047619   0.0714286  0.0952381     0.875      0.857143   0.84
 0.0232558  0.0465116  0.0697674  0.0930233     0.895833   0.877551   0.86
 0.0227273  0.0454545  0.0681818  0.0909091     0.916667   0.897959   0.88
 0.0222222  0.0444444  0.0666667  0.0888889     0.9375     0.918367   0.9 
 0.0217391  0.0434783  0.0652174  0.0869565  …  0.958333   0.938776   0.92
 0.0212766  0.0425532  0.0638298  0.0851064     0.979167   0.959184   0.94
 0.0208333  0.0416667  0.0625     0.0833333     1.0        0.979592   0.96
 0.0204082  0.0408163  0.0612245  0.0816327     0.979592   1.0        0.98
 0.02       0.04       0.06       0.08          0.96       0.98       1.0 
In [14]:
B = matrixdepot("oscillate", 100) # pd matrix
Out[14]:
100×100 Array{Float64,2}:
  0.88678       0.117729     -0.0552891    …   1.52453e-18  -3.11225e-18
  0.117729      0.732177      0.227559        -4.47253e-18   9.13044e-18
 -0.0552891     0.227559      0.401882         9.04011e-18  -1.84549e-17
 -0.000257948  -0.00903794    0.0464366       -1.07207e-17   2.18857e-17
 -0.00361503    0.0096735    -0.0214526        3.98827e-17  -8.14185e-17
  0.00042184   -0.00124191    0.0024795    …  -1.12045e-16   2.28734e-16
 -8.70147e-5    0.000242067  -0.00049947       2.82936e-16  -5.77599e-16
  6.41788e-6   -9.33946e-5    0.000342935     -4.14338e-16   8.45851e-16
 -4.65439e-5    0.000107093  -0.000195109      4.39702e-16  -8.9763e-16 
 -2.08953e-6    7.84307e-7    1.28448e-5      -8.06945e-17   1.64734e-16
  4.92703e-7   -1.17028e-6    1.39914e-6   …   1.90854e-16  -3.89625e-16
 -6.70515e-7    1.93176e-6   -3.73725e-6      -1.01747e-15   2.07716e-15
  2.43687e-7   -7.52096e-7    1.61857e-6       7.99215e-16  -1.63158e-15
  ⋮                                        ⋱                            
  4.43802e-18  -1.30198e-17   2.63164e-17     -0.00372067    0.00622875 
 -2.63663e-18   7.73509e-18  -1.56346e-17      0.00322562   -0.00443629 
  3.01875e-18  -8.85614e-18   1.79005e-17  …  -0.00241141    0.00639837 
 -2.7251e-18    7.99463e-18  -1.61592e-17      0.00302995   -0.00631032 
  3.23187e-18  -9.48136e-18   1.91642e-17     -0.0042048     0.00769745 
 -2.97621e-18   8.73132e-18  -1.76482e-17      0.0045267    -0.00739848 
  2.79334e-18  -8.19483e-18   1.65638e-17     -0.00460457    0.00736033 
 -7.34735e-19   2.1555e-18   -4.3568e-18   …   0.00382355   -0.00836409 
  6.96272e-19  -2.04266e-18   4.12873e-18     -0.0394007     0.0187525  
 -1.59219e-18   4.67103e-18  -9.44134e-18      0.304724     -0.0667661  
  1.52453e-18  -4.47253e-18   9.04011e-18      0.645854      0.106467   
 -3.11225e-18   9.13044e-18  -1.84549e-17      0.106467      0.175967   
In [15]:
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
In [16]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
In [17]:
# relative error
norm(x1 - x2) / norm(x1)
Out[17]:
1.358949337519373e-7
In [18]:
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron(A, B))) \ c
Out[18]:
BenchmarkTools.Trial: 
  memory estimate:  381.51 MiB
  allocs estimate:  11
  --------------
  minimum time:     630.446 ms (3.03% GC)
  median time:      711.411 ms (3.10% GC)
  mean time:        702.294 ms (5.36% GC)
  maximum time:     741.308 ms (2.74% GC)
  --------------
  samples:          8
  evals/sample:     1
In [19]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))))
Out[19]:
BenchmarkTools.Trial: 
  memory estimate:  176.59 KiB
  allocs estimate:  24
  --------------
  minimum time:     131.506 μs (0.00% GC)
  median time:      236.375 μs (0.00% GC)
  mean time:        273.470 μs (9.59% GC)
  maximum time:     6.839 ms (69.43% GC)
  --------------
  samples:          10000
  evals/sample:     1

Sparse matrix

Sparsity: use sparse matrix decompositions or iterative methods.

  • The easiest recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (Matrix package), MKL (sparse BLAS), ... as much as possible.
In [20]:
using MatrixDepot

Random.seed!(280)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# randomly generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
Out[20]:
0.001994776158751544
In [21]:
using UnicodePlots
spy(A)
Out[21]:
[spy plot: sparsity pattern of the 7701×7701 Wathen matrix, nz = 118301]

Matrix-vector multiplication

In [22]:
# dense matrix-vector multiplication
@benchmark $Afull * $b
Out[22]:
BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     22.048 ms (0.00% GC)
  median time:      25.870 ms (0.00% GC)
  mean time:        25.786 ms (0.00% GC)
  maximum time:     29.732 ms (0.00% GC)
  --------------
  samples:          194
  evals/sample:     1
In [23]:
# sparse matrix-vector multiplication
@benchmark $A * $b
Out[23]:
BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     99.190 μs (0.00% GC)
  median time:      135.390 μs (0.00% GC)
  mean time:        150.205 μs (5.97% GC)
  maximum time:     4.153 ms (96.20% GC)
  --------------
  samples:          10000
  evals/sample:     1

Solve linear equation

In [24]:
# solve via dense Cholesky
xchol = cholesky(Afull) \ b
@benchmark cholesky($Afull) \ $b
Out[24]:
BenchmarkTools.Trial: 
  memory estimate:  452.52 MiB
  allocs estimate:  8
  --------------
  minimum time:     1.576 s (0.67% GC)
  median time:      1.631 s (1.28% GC)
  mean time:        1.624 s (1.85% GC)
  maximum time:     1.658 s (1.27% GC)
  --------------
  samples:          4
  evals/sample:     1
In [25]:
# solve via sparse Cholesky
xcholsp = cholesky(A) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($A) \ $b
norm(xchol - xcholsp) = 3.7385578057004605e-15
Out[25]:
BenchmarkTools.Trial: 
  memory estimate:  13.45 MiB
  allocs estimate:  55
  --------------
  minimum time:     15.046 ms (0.00% GC)
  median time:      17.306 ms (4.78% GC)
  mean time:        17.089 ms (2.95% GC)
  maximum time:     20.649 ms (4.04% GC)
  --------------
  samples:          293
  evals/sample:     1
In [26]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 2.1854385431265016e-7
Out[26]:
BenchmarkTools.Trial: 
  memory estimate:  302.20 KiB
  allocs estimate:  23
  --------------
  minimum time:     29.459 ms (0.00% GC)
  median time:      30.978 ms (0.00% GC)
  mean time:        31.294 ms (0.12% GC)
  maximum time:     41.731 ms (0.00% GC)
  --------------
  samples:          160
  evals/sample:     1

Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}

  • Keep HW2 Q2 (multivariate density) and the PageRank problem in mind.
  • The WoodburyMatrices.jl package can be useful.
In [27]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(280)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Symmetric(Matrix(W)) # stored as a Matrix{Float64}
Out[27]:
1000×1000 Symmetric{Float64,Array{Float64,2}}:
  1.8571     0.513107    0.872146   …   0.764278    -0.241331    0.54921 
  0.513107   4.57505    -0.636972      -1.86465     -1.92237    -1.72569 
  0.872146  -0.636972    4.81387        1.99357      1.99337     3.66327 
 -0.516414  -0.996711   -0.0919924      0.262832     0.612402    0.621834
  0.193686   1.68244    -0.770028      -0.723437    -1.4868     -1.32247 
  1.6567     0.0634435  -0.901968   …  -0.241872    -0.0356772  -0.39826 
  0.553996  -0.274515    2.21265        0.219437     2.20382     2.60902 
  0.402356   1.89288    -1.13032       -0.771441    -1.96862    -1.93483 
 -1.07744   -1.63881     1.78016        0.96551      1.7292      1.91326 
 -2.21617   -2.90695    -2.55971       -0.47867      0.855389   -0.933916
  1.29975    0.779828    4.12459    …   1.87358      0.737112    2.84136 
 -0.80833    1.44882     1.67581       -0.139063    -0.107873    0.818132
 -2.32469   -4.83109    -2.31796       -0.0346402    2.65564     0.591075
  ⋮                                 ⋱                                    
  0.334995   0.0914829   1.60026        0.0937123    1.40804     1.92919 
  0.390722   2.36157    -1.58383       -1.92435     -1.3291     -1.88114 
  1.19218    0.845478    1.9362     …  -0.0890571    1.36046     2.01013 
  0.915526   0.889395   -0.606443      -0.428052    -0.817555   -1.2017  
 -0.778472   2.1444      1.50696        0.00689644  -1.28104    -0.141234
  0.275366  -0.25866     0.416593       0.481534     0.0874531   0.316543
 -0.289749  -1.16146     0.550825       0.698152     0.789054    0.949917
  0.439213  -0.379216    1.0951     …   0.626399     0.624574    0.8568  
 -0.592534  -0.235847   -1.11586       -0.601497    -0.0651787  -0.573737
  0.764278  -1.86465     1.99357        3.21548      0.932773    1.97505 
 -0.241331  -1.92237     1.99337        0.932773     3.43991     2.8539  
  0.54921   -1.72569     3.66327        1.97505      2.8539      4.85234 
In [28]:
# compare storage (bytes)
Base.summarysize(W), Base.summarysize(Wfull)
Out[28]:
(48200, 8000056)

Solve linear equation

In [29]:
# solve via Cholesky
@benchmark cholesky($Wfull) \ $b
Out[29]:
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  7
  --------------
  minimum time:     6.344 ms (0.00% GC)
  median time:      6.739 ms (0.00% GC)
  mean time:        7.086 ms (8.22% GC)
  maximum time:     9.931 ms (17.51% GC)
  --------------
  samples:          705
  evals/sample:     1
In [30]:
# solve using Woodbury formula
@benchmark $W \ $b
Out[30]:
BenchmarkTools.Trial: 
  memory estimate:  75.45 KiB
  allocs estimate:  24
  --------------
  minimum time:     21.200 μs (0.00% GC)
  median time:      43.413 μs (0.00% GC)
  mean time:        52.796 μs (16.47% GC)
  maximum time:     3.121 ms (97.47% GC)
  --------------
  samples:          10000
  evals/sample:     1

Matrix-vector multiplication

In [31]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b
Out[31]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     33.573 μs (0.00% GC)
  median time:      37.166 μs (0.00% GC)
  mean time:        45.578 μs (0.00% GC)
  maximum time:     385.600 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [32]:
# multiplication using Woodbury structure
@benchmark $W * $b
Out[32]:
BenchmarkTools.Trial: 
  memory estimate:  24.06 KiB
  allocs estimate:  5
  --------------
  minimum time:     4.087 μs (0.00% GC)
  median time:      4.960 μs (0.00% GC)
  mean time:        7.489 μs (28.11% GC)
  maximum time:     391.177 μs (96.95% GC)
  --------------
  samples:          10000
  evals/sample:     7

Determinant

In [33]:
# determinant without using Woodbury structure
@benchmark det($Wfull)
Out[33]:
BenchmarkTools.Trial: 
  memory estimate:  8.13 MiB
  allocs estimate:  7
  --------------
  minimum time:     12.483 ms (0.00% GC)
  median time:      13.279 ms (0.00% GC)
  mean time:        13.585 ms (3.69% GC)
  maximum time:     18.074 ms (0.00% GC)
  --------------
  samples:          368
  evals/sample:     1
In [34]:
# determinant using Woodbury structure
@benchmark det($W)
Out[34]:
BenchmarkTools.Trial: 
  memory estimate:  138.05 KiB
  allocs estimate:  28
  --------------
  minimum time:     28.345 μs (0.00% GC)
  median time:      88.347 μs (0.00% GC)
  mean time:        109.560 μs (19.29% GC)
  maximum time:     4.632 ms (96.85% GC)
  --------------
  samples:          10000
  evals/sample:     1

Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ Anyone interested in writing a package?
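A minimal sketch of using this blockwise inverse, assuming a diagonal pd $\mathbf{A}$ so that solves with $\mathbf{A}$ are cheap (the setup below is made up for illustration):

using LinearAlgebra, Random

Random.seed!(280)
n, r = 1000, 5
A  = Diagonal(rand(n) .+ 1) # an "easy" pd matrix
V  = randn(r, n)            # full row rank (with probability 1)
b1, b2 = randn(n), randn(r)

# solve [A V'; V 0] * [y1; y2] = [b1; b2] via the blockwise inverse
AinvVt = A \ transpose(V)            # cheap because A is diagonal
S  = cholesky(Symmetric(V * AinvVt)) # r-by-r Schur complement V A⁻¹ Vᵀ
y2 = S \ (V * (A \ b1) - b2)
y1 = A \ b1 - AinvVt * y2

# check against solving the full bordered system
M = [Matrix(A) transpose(V); V zeros(r, r)]
norm(M * [y1; y2] - [b1; b2]) # should be ≈ 0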

Orthogonal matrix

Orthogonal $\mathbf{A}$: $O(n^2)$ flops at most. Why? Because $\mathbf{A}^{-1} = \mathbf{A}^T$, solving $\mathbf{A} \mathbf{x} = \mathbf{b}$ reduces to the matrix-vector multiply $\mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even fewer flops.
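For instance (a quick sketch; the random orthogonal matrix below comes from a QR decomposition):

using LinearAlgebra, Random

Random.seed!(280)
n = 1000
Q = Matrix(qr(randn(n, n)).Q) # a dense orthogonal matrix
b = randn(n)

x1 = Q \ b            # generic dense solve: O(n^3) flops
x2 = transpose(Q) * b # use Q⁻¹ = Qᵀ: one matvec, O(n^2) flops
norm(x1 - x2)         # should be ≈ 0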

Toeplitz matrix

Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), and the Trench algorithm (matrix inverse). These matrices occur in autoregressive models and econometrics.
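A minimal sketch of the Durbin algorithm (following Golub and Van Loan, Algorithm 4.7.1); it solves $\mathbf{T} \mathbf{y} = -\mathbf{r}$ in $O(n^2)$ flops for pd Toeplitz $\mathbf{T}$ with unit diagonal, where $\mathbf{T}_{ij} = r_{|i-j|}$ and $r_0 = 1$. The AR(1) test case is made up for illustration.

using LinearAlgebra

function durbin(r::AbstractVector)
    n = length(r)
    y = zeros(eltype(r), n)
    y[1] = -r[1]
    α, β = -r[1], one(eltype(r))
    for k in 1:n-1
        β *= 1 - α^2
        α  = -(r[k+1] + dot(reverse(r[1:k]), y[1:k])) / β
        y[1:k] .= y[1:k] .+ α .* reverse(y[1:k])
        y[k+1] = α
    end
    return y
end

# quick check with AR(1) autocorrelations r_k = ρ^k
n, ρ = 5, 0.5
r = ρ .^ (1:n)
T = [ρ^abs(i - j) for i in 1:n, j in 1:n]
norm(T * durbin(r) + r) # should be ≈ 0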

Circulant matrix

Circulant systems: a Toeplitz matrix with wraparound, $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}, $$ amenable to FFT-type algorithms such as the DCT (discrete cosine transform) and DST (discrete sine transform).
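Since circulant matrices are diagonalized by the discrete Fourier transform, $C(\mathbf{z}) \mathbf{x} = \mathbf{b}$ can be solved in $O(n \log n)$ flops. A minimal sketch assuming the FFTW.jl package, with $\mathbf{z}$ the first column of $C(\mathbf{z})$:

using FFTW, LinearAlgebra, Random

Random.seed!(280)
n = 1000
z = randn(n); z[1] += n # diagonal dominance keeps C(z) nonsingular
b = randn(n)

# eigenvalues of C(z) are fft(z); solve in the Fourier domain
x = real(ifft(fft(b) ./ fft(z)))

# check against a dense solve
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]
norm(C * x - b) # should be ≈ 0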

Vandermonde matrix

Vandermonde matrix: arises in interpolation and approximation problems, $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
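A minimal sketch of a Björck-Pereyra-type solve of the dual system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ (polynomial interpolation) in $O(n^2)$ flops via Newton's divided differences; the function name and test case are made up.

using LinearAlgebra

function vandermonde_dual_solve(x::AbstractVector, f::AbstractVector)
    n = length(x) - 1
    a = float.(f)
    # Newton's divided differences
    for k in 1:n, i in (n + 1):-1:(k + 1)
        a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
    end
    # convert Newton-form coefficients to monomial coefficients
    for k in n:-1:1, i in k:n
        a[i] -= x[k] * a[i+1]
    end
    return a
end

x = [0.0, 1.0, 2.0, 3.0]
f = [1.0, 3.0, 2.0, 5.0]
V = [x[j]^(i - 1) for i in 1:4, j in 1:4] # V as defined above
norm(transpose(V) * vandermonde_dual_solve(x, f) - f) # should be ≈ 0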

Cauchy-like matrix

Cauchy-like matrices: $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n)$ flops for LU and QR.

Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...