Iterative Methods for Solving Linear Equations

Introduction

So far we have considered direct methods for solving linear equations.

  • Direct methods (flops fixed a priori) vs iterative methods:
    • Direct methods (GE/LU, Cholesky, QR, SVD): good for dense, small- to moderate-sized $\mathbf{A}$
    • Iterative methods (Jacobi, Gauss-Seidel, SOR, conjugate gradient, GMRES): good for large, sparse, or structured linear systems, parallel computing, warm starts

PageRank problem

  • $\mathbf{A} \in \{0,1\}^{n \times n}$ the connectivity matrix of webpages with entries $$ \begin{eqnarray*} a_{ij} = \begin{cases} 1 & \text{if page $i$ links to page $j$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*} $$ $n \approx 10^9$ in May 2017.

  • $r_i = \sum_j a_{ij}$ is the out-degree of page $i$.

  • Larry Page imagines a random surfer wandering on the internet according to the following rules:

    • From a page $i$ with $r_i>0$
      • with probability $p$, (s)he randomly chooses a link on page $i$ (uniformly) and follows that link to the next page
      • with probability $1-p$, (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page
    • From a page $i$ with $r_i=0$ (a dangling page), (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page

The process defines a Markov chain on the space of $n$ pages. The stationary distribution of this Markov chain gives the rank (probability) of each page.

  • The stationary distribution is the left eigenvector of the transition matrix $\mathbf{P}$ corresponding to the eigenvalue 1 (its largest eigenvalue). Equivalently, it can be cast as a linear equation: $$ (\mathbf{I} - \mathbf{P}^T) \mathbf{x} = \mathbf{0}. $$ (A toy sketch of this construction follows this list.)

  • Gene Golub called it the largest matrix computation in the world.

  • GE/LU would take $2 \times (10^9)^3/3/10^{12} \approx 6.66 \times 10^{14}$ seconds $\approx 20$ million years on a teraflop supercomputer!

  • Iterative methods come to the rescue.
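
As a toy illustration of this construction (referenced in the stationary-distribution bullet above), the sketch below builds the transition matrix for a made-up 4-page graph with damping parameter $p = 0.85$ and approximates the stationary distribution by power iteration; none of these numbers come from the real web graph.

# Toy PageRank sketch; the 4-page graph and p = 0.85 are made up for illustration.
n = 4
A = [0 1 1 0;   # page 1 links to pages 2 and 3
     0 0 1 0;   # page 2 links to page 3
     1 0 0 0;   # page 3 links back to page 1
     0 0 0 0]   # page 4 is a dangling page
p = 0.85
r = [sum(A[i, :]) for i in 1:n]    # out-degrees r_i

# Transition matrix: follow a link w.p. p, teleport w.p. 1 - p;
# from a dangling page, teleport uniformly.
P = [r[i] > 0 ? p * A[i, j] / r[i] + (1 - p) / n : 1 / n for i in 1:n, j in 1:n]

# The stationary distribution solves (I - P^T) x = 0; here we approximate it
# by power iteration x <- P^T x, starting from the uniform distribution.
function power_iteration(P, iters)
    x = fill(1 / size(P, 1), size(P, 1))
    for t in 1:iters
        x = P' * x
    end
    return x
end

power_iteration(P, 100)    # PageRank scores of the 4 pages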

Jacobi method

Solve $\mathbf{A} \mathbf{x} = \mathbf{b}$.

  • Jacobi iteration: $$ x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}. $$

  • With splitting: $\mathbf{A} = \mathbf{L} + \mathbf{D} + \mathbf{U}$, Jacobi iteration can be written as $$ \mathbf{D} \mathbf{x}^{(t+1)} = - (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{b}, $$ i.e., $$ \mathbf{x}^{(t+1)} = - \mathbf{D}^{-1} (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b} = - \mathbf{D}^{-1} \mathbf{A} \mathbf{x}^{(t)} + \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b}. $$

  • One round costs $2n^2$ flops with an unstructured $\mathbf{A}$. It beats GE/LU if it converges in $o(n)$ iterations. The savings are huge for sparse or structured $\mathbf{A}$; by structured, we mean that the matrix-vector multiplication $\mathbf{A} \mathbf{v}$ is fast ($O(n)$ or $O(n \log n)$).
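
As a concrete illustration of the componentwise formula above, here is a minimal dense Jacobi sketch. It is illustrative only (not the IterativeSolvers.jl jacobi used in the numerical examples below), assumes $a_{ii} \neq 0$, and makes no attempt to exploit sparsity in the matrix-vector products.

# Minimal Jacobi sketch; illustrative only, not the IterativeSolvers.jl implementation.
function jacobi_sketch(A, b; maxiter = 100, tol = 1e-8)
    n = length(b)
    x = zeros(n)            # x^(0) = 0
    xnew = similar(x)
    for t in 1:maxiter
        for i in 1:n
            s = b[i]
            for j in 1:n
                j == i && continue
                s -= A[i, j] * x[j]    # uses only the previous iterate x^(t)
            end
            xnew[i] = s / A[i, i]
        end
        # stop when successive iterates are close
        sqrt(sum(abs2, xnew - x)) < tol && return copy(xnew)
        x, xnew = xnew, x
    end
    return x
end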

Gauss-Seidel method

  • Gauss-Seidel (GS) iteration: $$ x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}. $$

  • With splitting, $(\mathbf{D} + \mathbf{L}) \mathbf{x}^{(t+1)} = - \mathbf{U} \mathbf{x}^{(t)} + \mathbf{b}$, i.e., $$ \mathbf{x}^{(t+1)} = - (\mathbf{D} + \mathbf{L})^{-1} \mathbf{U} \mathbf{x}^{(t)} + (\mathbf{D} + \mathbf{L})^{-1} \mathbf{b}. $$

  • GS converges for any starting value $\mathbf{x}^{(0)}$ when $\mathbf{A}$ is symmetric and positive definite.

  • The convergence rate of Gauss-Seidel is governed by the spectral radius of $(\mathbf{D} + \mathbf{L})^{-1} \mathbf{U}$.
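
A minimal Gauss-Seidel sweep, again only a sketch: the one change from the Jacobi sketch above is that the updated components $x_1, \ldots, x_{i-1}$ are used immediately, so the iterate can be overwritten in place.

# One in-place Gauss-Seidel sweep; illustrative sketch only.
function gauss_seidel_sweep!(x, A, b)
    n = length(b)
    for i in 1:n
        s = b[i]
        for j in 1:n
            j == i && continue
            s -= A[i, j] * x[j]    # for j < i, x[j] already holds x_j^(t+1)
        end
        x[i] = s / A[i, i]
    end
    return x
end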

Successive over-relaxation (SOR)

  • SOR: $$ x_i^{(t+1)} = \omega \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)} \right) / a_{ii} + (1-\omega) x_i^{(t)}, $$ i.e., $$ \mathbf{x}^{(t+1)} = (\mathbf{D} + \omega \mathbf{L})^{-1} [(1-\omega) \mathbf{D} - \omega \mathbf{U}] \mathbf{x}^{(t)} + \omega (\mathbf{D} + \omega \mathbf{L})^{-1} \mathbf{b}. $$

  • Need to pick $\omega \in (0,2)$ beforehand ($\omega > 1$ is over-relaxation, $\omega < 1$ is under-relaxation, and $\omega = 1$ recovers Gauss-Seidel), with the goal of improving the convergence rate.
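
The sketch below illustrates how $\omega$ affects the spectral radius of the SOR iteration matrix $(\mathbf{D} + \omega \mathbf{L})^{-1} [(1-\omega) \mathbf{D} - \omega \mathbf{U}]$, which governs the convergence rate, using a small made-up SPD tridiagonal matrix. It assumes a recent Julia with the LinearAlgebra standard library (the notebook in the numerical examples below uses Julia 0.6 syntax).

using LinearAlgebra    # assumes Julia >= 1.0

# Small made-up SPD tridiagonal test matrix.
A = [ 4.0 -1.0  0.0;
     -1.0  4.0 -1.0;
      0.0 -1.0  4.0]
D = Diagonal(diag(A))
L = tril(A, -1)    # strictly lower triangular part
U = triu(A, 1)     # strictly upper triangular part

spectral_radius(M) = maximum(abs.(eigvals(Matrix(M))))

for ω in (0.5, 1.0, 1.5)    # ω = 1 recovers Gauss-Seidel
    B = (D + ω * L) \ ((1 - ω) * D - ω * U)
    println("ω = $ω:  spectral radius = ", spectral_radius(B))
end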

Conjugate gradient method

  • Conjugate gradient and its variants are the top-notch iterative methods for solving huge, structured linear systems.

  • A UCLA invention!

  • For symmetric positive definite $\mathbf{A}$, solving $\mathbf{A} \mathbf{x} = \mathbf{b}$ is equivalent to minimizing the quadratic function $\frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T \mathbf{x}$, whose gradient is $\mathbf{A} \mathbf{x} - \mathbf{b}$ (a minimal sketch follows this list).

  • To do later, when we study optimization.
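
For reference, here is a minimal unpreconditioned conjugate gradient sketch based on the quadratic-minimization view (the residual $\mathbf{b} - \mathbf{A} \mathbf{x}$ is the negative gradient). It assumes $\mathbf{A}$ is symmetric positive definite and is illustrative only; the numerical examples below use cg from IterativeSolvers.jl.

using LinearAlgebra    # for dot; assumes a recent Julia

# Minimal unpreconditioned CG sketch for symmetric pd A; illustrative only.
function cg_sketch(A, b; maxiter = length(b), tol = 1e-8)
    x = zeros(length(b))
    r = b - A * x              # residual = negative gradient of (1/2) x'Ax - b'x
    p = copy(r)                # first search direction
    rs = dot(r, r)
    for t in 1:maxiter
        Ap = A * p
        α = rs / dot(p, Ap)    # exact minimization of the quadratic along p
        x += α * p
        r -= α * Ap
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p = r + (rs_new / rs) * p    # next direction, A-conjugate to previous ones
        rs = rs_new
    end
    return x
end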

Kershaw's results for a fusion problem.

Method                                     Number of Iterations
Gauss-Seidel                               208,000
Block SOR methods                          765
Incomplete Cholesky conjugate gradient     25

Numerical examples

The IterativeSolvers.jl package implements many commonly used iterative solvers.

In [1]:
using BenchmarkTools, IterativeSolvers

srand(280)

n = 4000
# a sparse pd matrix, about 0.5% non-zero entries
A = sprand(n, n, 0.002)
A = A + A' + (2n) * I
@show typeof(A)
b = randn(n)
# dense matrix representation of A
Afull = full(A)
@show typeof(Afull)
# actual sparsity level
countnz(A) / length(A)
typeof(A) = SparseMatrixCSC{Float64,Int64}
typeof(Afull) = Array{Float64,2}
Out[1]:
0.00425125
In [2]:
# dense matrix-vector multiplication
@benchmark Afull * b
Out[2]:
BenchmarkTools.Trial: 
  memory estimate:  31.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     5.618 ms (0.00% GC)
  median time:      7.455 ms (0.00% GC)
  mean time:        7.712 ms (0.00% GC)
  maximum time:     14.812 ms (0.00% GC)
  --------------
  samples:          646
  evals/sample:     1
In [3]:
# sparse matrix-vector multiplication
@benchmark A * b
Out[3]:
BenchmarkTools.Trial: 
  memory estimate:  31.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     66.647 μs (0.00% GC)
  median time:      69.379 μs (0.00% GC)
  mean time:        74.066 μs (2.24% GC)
  maximum time:     1.427 ms (92.48% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [4]:
# dense solve via Cholesky
xchol = cholfact(Afull) \ b
@benchmark cholfact(Afull) \ b
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  122.10 MiB
  allocs estimate:  11
  --------------
  minimum time:     277.813 ms (2.90% GC)
  median time:      279.444 ms (2.87% GC)
  mean time:        284.541 ms (3.83% GC)
  maximum time:     335.722 ms (17.85% GC)
  --------------
  samples:          18
  evals/sample:     1
In [5]:
# Jacobi solution is fairly close to Cholesky solution
xjacobi, = jacobi(A, b)
@show vecnorm(xjacobi - xchol)
vecnorm(xjacobi - xchol) = 0.009481824791564454
Out[5]:
0.009481824791564454
In [6]:
@benchmark jacobi(A, b)
Out[6]:
BenchmarkTools.Trial: 
  memory estimate:  94.36 KiB
  allocs estimate:  13
  --------------
  minimum time:     1.034 ms (0.00% GC)
  median time:      1.061 ms (0.00% GC)
  mean time:        1.100 ms (0.51% GC)
  maximum time:     3.343 ms (58.19% GC)
  --------------
  samples:          4537
  evals/sample:     1
In [7]:
# Gauss-Seidel solution is fairly close to Cholesky solution
xgs, = gauss_seidel(A, b)
vecnorm(xgs - xchol)
Out[7]:
0.009481824791564454
In [8]:
@benchmark gauss_seidel(A, b)
Out[8]:
BenchmarkTools.Trial: 
  memory estimate:  63.11 KiB
  allocs estimate:  13
  --------------
  minimum time:     1.116 ms (0.00% GC)
  median time:      1.215 ms (0.00% GC)
  mean time:        1.232 ms (0.26% GC)
  maximum time:     3.518 ms (65.23% GC)
  --------------
  samples:          4050
  evals/sample:     1
In [9]:
# symmetric SOR with ω=0.75
xsor, = ssor(A, b, 0.75)
vecnorm(xsor - xchol)
Out[9]:
0.009481824791561674
In [10]:
@benchmark sor(A, b, 0.75)
Out[10]:
BenchmarkTools.Trial: 
  memory estimate:  94.81 KiB
  allocs estimate:  38
  --------------
  minimum time:     1.123 ms (0.00% GC)
  median time:      1.165 ms (0.00% GC)
  mean time:        1.210 ms (0.39% GC)
  maximum time:     2.644 ms (55.17% GC)
  --------------
  samples:          4121
  evals/sample:     1
In [11]:
# conjugate gradient
xcg, = cg(A, b)
vecnorm(xcg - xchol)
Out[11]:
0.009481824791658372
In [12]:
@benchmark cg(A, b)
Out[12]:
BenchmarkTools.Trial: 
  memory estimate:  126.86 KiB
  allocs estimate:  31
  --------------
  minimum time:     229.348 μs (0.00% GC)
  median time:      237.998 μs (0.00% GC)
  mean time:        254.097 μs (3.40% GC)
  maximum time:     2.293 ms (81.10% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [13]:
versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)