Consider solving $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$ (or computing a matrix inverse, if you really need one). $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... We should be alert to the many easy linear systems that arise in practice.
Don't blindly use `A \ b` or `inv` in Julia, or the `solve` function in R. Don't waste computing resources by bad choices of algorithms!
Diagonal $\mathbf{A}$: $n$ flops. Use the `Diagonal` type in Julia.
using BenchmarkTools
srand(280)
n = 1000
A = diagm(randn(n)) # a full matrix
b = randn(n)
@which A \ b
# check `istril(A)` and `istriu(A)`, then call `Diagonal(A) \ b`
@benchmark A \ b
# O(n) computation, no extra array allocation
@benchmark Diagonal(A) \ b
Bidiagonal, tridiagonal, or banded $\mathbf{A}$: band LU, band Cholesky, ... roughly $O(n)$ flops. Use the `Bidiagonal`, `Tridiagonal`, and `SymTridiagonal` types in Julia.
srand(280)
n = 1000
dv = randn(n)
ev = randn(n - 1)
b = randn(n)
# symmetric tridiagonal matrix
A = SymTridiagonal(dv, ev)
# convert to a full matrix
Afull = full(A)
# LU decomposition (2/3) n^3 flops!
@benchmark Afull \ b
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark A \ b
Triangular $\mathbf{A}$: $n^2$ flops to solve linear system.
srand(280)
n = 1000
A = tril(randn(n, n))
b = randn(n)
# check istril() then triangular solve
@benchmark A \ b
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular(A) \ b
Block diagonal $\mathbf{A}$: solve block by block. Use sparse matrix types to store and exploit the block diagonal structure.
srand(280)
B = 10 # number of blocks
ni = 100
A = blkdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
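Solving a block diagonal system block by block can be sketched as follows (a sketch in Julia ≥ 1.0 syntax, where `blkdiag` has become `SparseArrays.blockdiag`):

```julia
using LinearAlgebra, SparseArrays

B, ni = 10, 100
blocks = [randn(ni, ni) + 2ni * I for _ in 1:B]    # well-conditioned dense blocks
A = blockdiag([sparse(blk) for blk in blocks]...)  # sparse block diagonal matrix
b = randn(B * ni)

# solve block by block: B solves of size ni, instead of one solve of size B*ni
x = similar(b)
for k in 1:B
    idx = (k - 1) * ni + 1 : k * ni
    x[idx] = blocks[k] \ b[idx]
end
norm(A * x - b)  # ≈ 0
```

Each block solve costs $O(n_i^3)$, so the total is $O(B n_i^3)$ rather than $O(B^3 n_i^3)$ for a dense solve of the full system.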
Kronecker product: use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \end{eqnarray*} $$ to avoid forming, and computing with, the potentially huge Kronecker product $\mathbf{A} \otimes \mathbf{B}$.
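These identities are easy to verify numerically; a minimal sketch (Julia ≥ 1.0 syntax):

```julia
using LinearAlgebra

A = randn(4, 3); B = randn(3, 2); C = randn(2, 5)
y1 = kron(C', A) * vec(B)  # forms the 20×6 Kronecker product explicitly
y2 = vec(A * B * C)        # never forms the Kronecker product
norm(y1 - y2)  # ≈ 0
```

For $\mathbf{A} \in \mathbb{R}^{m \times p}$ and $\mathbf{C} \in \mathbb{R}^{q \times n}$, the right-hand side costs $O(mpq + mqn)$ flops versus $O(mnpq)$ for the explicit Kronecker multiply.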
Sparsity: sparse matrix decomposition or iterative methods. Use sparse matrix facilities (sparse arrays in Julia, the `Matrix` package in R), MKL (sparse BLAS), ... as much as possible.
srand(280)
n = 1000
# a sparse pd matrix, about 0.5% non-zero entries
A = sprand(n, n, 0.002)
A = A + A' + (2n) * I
b = randn(n)
Afull = full(A)
countnz(A) / length(A) # sparsity
# dense matrix-vector multiplication
@benchmark Afull * b
# sparse matrix-vector multiplication
@benchmark A * b
# dense Cholesky decomposition
@benchmark cholfact(Afull)
# sparse Cholesky decomposition
@benchmark cholfact(A)
# solve via dense Cholesky
xchol = cholfact(Afull) \ b
@benchmark cholfact(Afull) \ b
# solve via sparse Cholesky
xcholsp = cholfact(A) \ b
vecnorm(xchol - xcholsp)
@benchmark cholfact(A) \ b
# sparse solve via conjugate gradient
using IterativeSolvers
xcg, = cg(A, b)
vecnorm(xcg - xchol)
@benchmark cg(A, b)
Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula $$ (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1}. $$
The `WoodburyMatrices.jl` package can be useful.
using BenchmarkTools, WoodburyMatrices
srand(280)
n = 1000
r = 5
A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Symmetric(full(W))
# solve via Cholesky
@benchmark cholfact(Wfull) \ b
# solve using Woodbury formula
@benchmark W \ b
# multiplication without using Woodbury structure
@benchmark Wfull * b
# multiplication using Woodbury structure
@benchmark W * b
Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$
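This block inverse can be checked numerically on a small example; a sketch (Julia ≥ 1.0 syntax):

```julia
using LinearAlgebra

n, m = 6, 2
A = randn(n, n); A = A'A + n * I   # positive definite
V = randn(m, n)                    # full row rank (almost surely)
M = [A V'; V zeros(m, m)]          # the bordered matrix

Ai = inv(A)
S  = inv(V * Ai * V')              # inverse of the Schur-complement-type block
Minv = [Ai-Ai*V'*S*V*Ai  Ai*V'*S;
        S*V*Ai           -S]
norm(inv(M) - Minv)  # ≈ 0
```

If $\mathbf{A}^{-1} \mathbf{b}$ is cheap (e.g. $\mathbf{A}$ diagonal or banded), the formula reduces the bordered solve to a few easy solves plus an $r \times r$ dense solve.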
Orthogonal $\mathbf{A}$: at most $n^2$ flops, since $\mathbf{A}^{-1} = \mathbf{A}^T$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even fewer.
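For example, an orthogonal system is solved by a single matrix-vector multiply; a sketch (Julia ≥ 1.0 syntax):

```julia
using LinearAlgebra

n = 500
Q = Matrix(qr(randn(n, n)).Q)  # a dense orthogonal matrix
b = randn(n)
x = Q' * b                     # Q x = b  ⟹  x = Qᵀ b: n² flops, no factorization
norm(Q * x - b)  # ≈ 0
```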
Toeplitz systems: $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops. Durbin algorithm (Yule-Walker equation), Levinson algorithm (general $\mathbf{b}$), Trench algorithm (inverse). These matrices occur in auto-regressive models and econometrics.
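Durbin's algorithm can be sketched as follows (`durbin` is a hand-rolled helper following the classical recursion, not a library routine; Julia ≥ 1.0 syntax):

```julia
using LinearAlgebra

# Durbin's algorithm: solves the Yule-Walker system T y = -r in O(n^2) flops,
# where T is the symmetric pd Toeplitz matrix with entries T[i,j] = r_|i-j|,
# r_0 = 1, and the right-hand side is -(r_1, ..., r_n)
function durbin(r::AbstractVector)
    n = length(r)
    y = zeros(n)
    y[1] = -r[1]
    β, α = 1.0, -r[1]
    for k in 1:n-1
        β *= 1 - α^2
        α = -(r[k+1] + dot(r[1:k], y[k:-1:1])) / β
        y[1:k] = y[1:k] + α * y[k:-1:1]   # update previous solution in place
        y[k+1] = α
    end
    return y
end

r = [0.5, 0.2, 0.1]
T = [1.0 0.5 0.2; 0.5 1.0 0.5; 0.2 0.5 1.0]  # Toeplitz with first row (1, r_1, r_2)
y = durbin(r)
norm(T * y + r)  # ≈ 0
```

Each step enlarges the solved system by one, reusing the previous solution, which is what brings the cost down from $O(n^3)$ to $O(n^2)$.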
The `ToeplitzMatrices.jl` package can be useful.
Circulant systems: a Toeplitz matrix with wraparound $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}, $$ can be solved in $O(n \log n)$ flops by FFT-type algorithms: FFT, DCT (discrete cosine transform), and DST (discrete sine transform).
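A circulant solve via the DFT can be sketched as follows (a naive $O(n^2)$ DFT matrix is used here for self-containedness; in practice an FFT, e.g. from FFTW.jl, brings the cost to $O(n \log n)$; Julia ≥ 1.0 syntax):

```julia
using LinearAlgebra

n = 5
z = randn(n)
z[1] = 1 + sum(abs, z[2:end])   # make the circulant well conditioned
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]  # circulant, first column z
b = randn(n)

# the DFT diagonalizes every circulant: the eigenvalues are the DFT of the
# first column, so C x = b is solved by x = idft(dft(b) ./ dft(z))
ω = exp(-2π * im / n)
F = [ω^((j - 1) * (k - 1)) for j in 1:n, k in 1:n]  # DFT matrix (symmetric)
x = real(conj(F) * ((F * b) ./ (F * z)) / n)        # inverse DFT is conj(F)/n
norm(C * x - b)  # ≈ 0
```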
Vandermonde matrix: arises in interpolation and approximation problems, $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
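An $O(n^2)$ solve of the dual system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ (polynomial interpolation) in the style of the Björck-Pereyra algorithm can be sketched as follows (`vandermonde_solve` is a hypothetical helper for illustration, not a library routine):

```julia
# O(n^2) solve of Vᵀ a = f, i.e. find coefficients a with Σ_k a_k x_i^k = f_i
function vandermonde_solve(x::AbstractVector, f::AbstractVector)
    n = length(x)
    a = float(copy(f))
    for k in 1:n-1, i in n:-1:k+1           # Newton divided differences
        a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
    end
    for k in n-1:-1:1, i in k:n-1           # Newton form → monomial coefficients
        a[i] -= x[k] * a[i+1]
    end
    return a
end

# interpolate p(t) = 1 + 2t + 3t² through (0,1), (1,6), (2,17)
vandermonde_solve([0.0, 1.0, 2.0], [1.0, 6.0, 17.0])  # → [1.0, 2.0, 3.0]
```

The two sweeps replace Gaussian elimination on the full $O(n^2)$-storage matrix with $O(n)$ storage and $O(n^2)$ flops.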
Cauchy-like matrices: $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n)$ flops for LU and QR.
Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...
versioninfo()