versioninfo()
Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$ with $\mathbf{A} \in \mathbb{R}^{n \times n}$, or, if you really need it, the matrix inverse. $\mathbf{A}$ can be huge. Keep massive data in mind: 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... We should be alert to easy linear systems, i.e., those with special structure we can exploit.
Don't blindly use `A \ b` or `inv` in Julia, or the `solve` function in R. Don't waste computing resources on a bad choice of algorithm!
Diagonal $\mathbf{A}$: $n$ flops. Use the `Diagonal` type in Julia.
using BenchmarkTools, LinearAlgebra, Random
# generate random data
Random.seed!(280)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
# should give link: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b
# the generic `\` checks `istril(A)` and `istriu(A)` (each O(n^2)), then calls `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
# O(n) computation, no extra array allocation
@benchmark Diagonal($A) \ $b
Bidiagonal, tridiagonal, or banded $\mathbf{A}$: band LU, band Cholesky, ... roughly $O(n)$ flops. Use the `Bidiagonal`, `Tridiagonal`, and `SymTridiagonal` types in Julia.
Random.seed!(280)
n = 1000
dv = randn(n)
ev = randn(n - 1)
b = randn(n) # rhs
# symmetric tridiagonal matrix
A = SymTridiagonal(dv, ev)
# convert to a full matrix
Afull = Matrix(A)
# generic dense solve: LU decomposition, (2/3) n^3 flops!
@benchmark $Afull \ $b
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
Triangular $\mathbf{A}$: $n^2$ flops to solve a linear system.
Random.seed!(280)
n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)
# generic `\` first checks istril(A), then does the triangular solve
@benchmark $A \ $b
# triangular solve directly; saves the cost of the istril() check
@benchmark LowerTriangular($A) \ $b
Block diagonal: Suppose $n = \sum_b n_b$. Solving the linear system costs $(\sum_b n_b)^3$ flops if we ignore the block diagonal structure vs $\sum_b n_b^3$ flops if we exploit it; see the sketch after the spy plot below.
Julia's SparseArrays package has a `blockdiag` function that generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package?
using SparseArrays
Random.seed!(280)
B = 10 # number of blocks
ni = 100
A = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
using UnicodePlots
spy(A)
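As a quick check of the flop counts above, here is a minimal sketch comparing a dense solve with a block-by-block solve; the well-conditioned dense blocks `blks` and right-hand side `bb` below are made up for illustration.
Random.seed!(280)
blks = [randn(ni, ni) + 10I for _ in 1:B]  # well-conditioned dense blocks
bb = randn(B * ni)  # rhs
Ablk = blockdiag([sparse(blk) for blk in blks]...)
# treating A as one (B*ni)-by-(B*ni) dense matrix: O((∑ n_b)^3) flops
x1 = Matrix(Ablk) \ bb
# solving each block separately: O(∑ n_b^3) flops
x2 = vcat([blks[k] \ bb[(k - 1) * ni + 1:k * ni] for k in 1:B]...)
norm(x1 - x2) / norm(x2)  # should be tiny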
Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming and doing costly computation on the potentially huge Kronecker $\mathbf{A} \otimes \mathbf{B}$.
Anyone interested in writing a package?
using MatrixDepot, LinearAlgebra
A = matrixdepot("lehmer", 50) # a pd matrix
B = matrixdepot("oscillate", 100) # pd matrix
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \
transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
# relative error
norm(x1 - x2) / norm(x1)
using BenchmarkTools
# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ $c
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \
    transpose(cholesky(Symmetric($B)) \ reshape($c, $p, $m))))
Sparsity: sparse matrix decomposition or iterative method. The easiest recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (`Matrix` package), MKL (sparse BLAS), ... as much as possible.
using MatrixDepot
Random.seed!(280)
# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# randomly generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
using UnicodePlots
spy(A)
# dense matrix-vector multiplication
@benchmark $Afull * $b
# sparse matrix-vector multiplication
@benchmark $A * $b
# solve via dense Cholesky
xchol = cholesky(Afull) \ b
@benchmark cholesky($Afull) \ $b
# solve via sparse Cholesky
xcholsp = cholesky(A) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($A) \ $b
# sparse solve via conjugate gradient
using IterativeSolvers
xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}
The WoodburyMatrices.jl package can be useful.
using BenchmarkTools, Random, WoodburyMatrices
Random.seed!(280)
n = 1000
r = 5
A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Symmetric(Matrix(W)) # stored as a Matrix{Float64}
# compare storage
Base.summarysize(W), Base.summarysize(Wfull)
# solve via Cholesky
@benchmark cholesky($Wfull) \ $b
# solve using Woodbury formula
@benchmark $W \ $b
# multiplication without using Woodbury structure
@benchmark $Wfull * $b
# multiplication using Woodbury structure
@benchmark $W * $b
# determinant without using Woodbury structure
@benchmark det($Wfull)
# determinant using Woodbury structure
@benchmark det($W)
Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ Anyone interested in writing a package? A sketch of the block-inverse solve follows.
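No specialized package is assumed here; this minimal sketch solves the bordered system using the block inverse above, so only the easy $\mathbf{A}$ and the small $r \times r$ matrix $\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T$ are ever factored. All names below are made up for illustration.
Random.seed!(280)
n, r = 1000, 5
A = Diagonal(rand(n) .+ 1)  # easy pd A
V = randn(r, n)             # full row rank
b, c = randn(n), randn(r)   # rhs of the bordered system
Ainvb = A \ b               # O(n) since A is diagonal
AinvVt = A \ transpose(V)   # n-by-r, i.e., r easy solves
S = V * AinvVt              # r-by-r matrix V A^{-1} V^T
y = S \ (V * Ainvb - c)
x = Ainvb - AinvVt * y
# check against solving the full (n+r)-by-(n+r) bordered system
M = [Matrix(A) transpose(V); V zeros(r, r)]
norm(M * [x; y] - [b; c]) / norm([b; c])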
Orthogonal $\mathbf{A}$: $O(n^2)$ flops at most. Why? Because $\mathbf{A}^{-1} = \mathbf{A}^T$, so the solve reduces to one matrix-vector multiplication, $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Special cases such as permutation matrices, Householder reflections, and Jacobi rotations take even fewer flops.
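To see this concretely, a small sketch; the orthogonal matrix `Q` here is generated via a QR decomposition of a random matrix.
Random.seed!(280)
n = 1000
Q = Matrix(qr(randn(n, n)).Q)  # a random orthogonal matrix
b = randn(n)
x = transpose(Q) * b           # the whole solve is one matrix-vector multiply
norm(Q * x - b) / norm(b)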
Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), and the Trench algorithm (inverse). These matrices occur in autoregressive models and econometrics.
The ToeplitzMatrices.jl package can be useful.
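For illustration, a hand-rolled sketch of the Durbin algorithm (Golub and Van Loan, Algorithm 4.7.1) for the Yule-Walker system $\mathbf{T} \mathbf{y} = -(r_1, \ldots, r_n)^T$, where $\mathbf{T}$ is symmetric pd Toeplitz with first row $(1, r_1, \ldots, r_{n-1})$; the AR(1)-style test data are made up.
function durbin(r::Vector{T}) where T <: Real
    n = length(r)
    y = zeros(T, n)
    y[1] = -r[1]
    β, α = one(T), -r[1]
    for k in 1:n-1
        β *= 1 - α^2
        α = -(r[k + 1] + dot(view(r, k:-1:1), view(y, 1:k))) / β
        y[1:k] .+= α .* y[k:-1:1]  # y[k:-1:1] makes a copy, so no aliasing
        y[k + 1] = α
    end
    y
end
# check against a dense O(n^3) solve: AR(1) correlations r_k = 0.5^k
n = 500
r = 0.5 .^ (1:n)
Tfull = [i == j ? 1.0 : 0.5^abs(i - j) for i in 1:n, j in 1:n]
norm(durbin(r) - (Tfull \ (-r))) / norm(Tfull \ (-r))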
Circulant systems: Toeplitz matrix with wraparound $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}, $$ solved by FFT-type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform).
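A sketch of the FFT route for a circulant system (assumes the FFTW package; the dense `C` is formed only to check the answer): the eigenvalues of $C(\mathbf{z})$ are `fft(z)`, so the solve is elementwise division in the Fourier domain, $O(n \log n)$ flops.
using FFTW
Random.seed!(280)
n = 1024
z = randn(n)  # first column of the circulant
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]  # dense circulant, for checking only
b = randn(n)
# C*x is circular convolution, so fft(C*x) = fft(z) .* fft(x)
x = real(ifft(fft(b) ./ fft(z)))  # assumes no eigenvalue fft(z) is near 0
norm(C * x - b) / norm(b)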
Vandermonde matrix: arises in interpolation and approximation problems, $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
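A hand-rolled sketch in the spirit of the Björck-Pereyra idea: solve $\mathbf{V}^T \mathbf{a} = \mathbf{f}$, i.e., polynomial interpolation, in $O(n^2)$ flops via Newton's divided differences. The function name and test data below are made up.
function vandermonde_tsolve(x::Vector{T}, f::Vector{T}) where T <: Real
    n = length(x)
    c = copy(f)
    # divided differences: c becomes the Newton-form coefficients, O(n^2)
    for k in 2:n, j in n:-1:k
        c[j] = (c[j] - c[j - 1]) / (x[j] - x[j - k + 1])
    end
    # expand the Newton form into monomial coefficients; a[i] is the coeff of t^(i-1)
    a = zeros(T, n)
    a[1] = c[n]
    for k in n-1:-1:1  # a(t) ← (t - x[k]) * a(t) + c[k]
        for i in n - k + 1:-1:2
            a[i] = a[i - 1] - x[k] * a[i]
        end
        a[1] = c[k] - x[k] * a[1]
    end
    a
end
# check against the dense O(n^3) Vandermonde solve
xs = collect(range(-1, 1, length = 10))
fs = exp.(xs)
V = [xj^i for i in 0:9, xj in xs]  # V as defined above
norm(transpose(V) * vandermonde_tsolve(xs, fs) - fs)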
Cauchy-like matrices: $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n)$ flops for LU and QR.
Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ... A sketch of an $O(n)$ semiseparable matrix-vector multiply follows.
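To make the $O(n)$ claim concrete, a sketch of a matrix-vector multiply with a generator-representable semiseparable matrix $\mathbf{S} = \text{tril}(\mathbf{u} \mathbf{v}^T) + \text{triu}(\mathbf{p} \mathbf{q}^T, 1)$, using running sums instead of forming $\mathbf{S}$; all names are made up for illustration.
Random.seed!(280)
n = 1000
u, v, p, q = randn(n), randn(n), randn(n), randn(n)
x = randn(n)
# (Sx)_i = u_i * sum_{j<=i} v_j x_j + p_i * sum_{j>i} q_j x_j
lo = cumsum(v .* x)                    # prefix sums of v_j x_j
hi = reverse(cumsum(reverse(q .* x)))  # hi[i] = sum_{j>=i} q_j x_j
hi = [hi[2:end]; 0.0]                  # shift to strictly above the diagonal
y = u .* lo .+ p .* hi                 # O(n) total
S = tril(u * transpose(v)) + triu(p * transpose(q), 1)  # dense check
norm(S * x - y) / norm(S * x)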