So far we have considered direct methods for solving linear equations.
A motivating example is Google's PageRank problem. Let $\mathbf{A} \in \{0,1\}^{n \times n}$ be the connectivity matrix of webpages with entries $$ \begin{eqnarray*} a_{ij} = \begin{cases} 1 & \text{if page $i$ links to page $j$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*} $$ The number of indexed webpages was $n \approx 10^9$ in May 2017.
$r_i = \sum_j a_{ij}$ is the out-degree of page $i$.
Larry Page imagined a random surfer wandering the internet according to the following rule: from the current page, with probability $p$ the surfer follows one of its outgoing links chosen uniformly at random, and with probability $1-p$ jumps to a page chosen uniformly at random among all $n$ pages.
This process defines a Markov chain on the space of $n$ pages. The stationary distribution of this Markov chain gives the rank (visit probability) of each page.
The stationary distribution is the left eigenvector of the transition matrix $\mathbf{P}$ corresponding to its largest eigenvalue 1, i.e., $\mathbf{x}^T \mathbf{P} = \mathbf{x}^T$ with $\sum_i x_i = 1$. Equivalently, it can be cast as a linear equation, $(\mathbf{I} - \mathbf{P}^T) \mathbf{x} = \mathbf{0}$ subject to the normalization constraint.
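As a toy illustration of this setup, the sketch below builds the random-surfer transition matrix from a small connectivity matrix and approximates its stationary distribution by taking many steps of the chain. The 4-page link pattern and the teleportation probability $p = 0.85$ are made-up illustrative values, not part of the original example; for the real problem one would never form $\mathbf{P}$ explicitly.

```julia
# Toy sketch with made-up values: build the random-surfer transition matrix
# from a tiny connectivity matrix and approximate its stationary distribution.
# p = 0.85 is the conventional teleportation probability, chosen here only
# for illustration.
A = [0 1 1 0;
     1 0 1 1;
     0 0 0 1;
     1 0 1 0]                      # a_ij = 1 if page i links to page j
n = size(A, 1)
r = [sum(A[i, :]) for i in 1:n]    # out-degrees r_i (all positive in this toy example)
p = 0.85
# row-stochastic transition matrix: follow a random link w.p. p, teleport w.p. 1 - p
P = p * (A ./ r) .+ (1 - p) / n
x0 = fill(1 / n, n)                # start from the uniform distribution
x = (P^100)' * x0                  # 100 steps of the chain: x ← (P^T)^100 x0
x                                  # approximate stationary distribution (page ranks)
```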
Gene Golub called this the largest matrix computation in the world.
GE/LU would take $2 \times (10^9)^3/3/10^{12} \approx 6.7 \times 10^{14}$ seconds $\approx 20$ million years on a teraflop supercomputer!
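A quick sanity check of this back-of-the-envelope estimate in plain Julia:

```julia
# Sanity check of the estimate above: 2n^3/3 flops at 10^12 flops per second
n = 1e9
seconds = 2 * n^3 / 3 / 1e12            # ≈ 6.7 × 10^14 seconds
years = seconds / (3600 * 24 * 365)     # ≈ 2.1 × 10^7, i.e., about 20 million years
```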
Iterative methods come to the rescue.
Solve $\mathbf{A} \mathbf{x} = \mathbf{b}$.
Jacobi iteration: $$ x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}. $$
With the splitting $\mathbf{A} = \mathbf{L} + \mathbf{D} + \mathbf{U}$, where $\mathbf{L}$ is the strictly lower triangular part, $\mathbf{D}$ the diagonal, and $\mathbf{U}$ the strictly upper triangular part of $\mathbf{A}$, the Jacobi iteration can be written as $$ \mathbf{D} \mathbf{x}^{(t+1)} = - (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{b}, $$ i.e., $$ \mathbf{x}^{(t+1)} = - \mathbf{D}^{-1} (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b} = - \mathbf{D}^{-1} \mathbf{A} \mathbf{x}^{(t)} + \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b}. $$
One iteration costs $2n^2$ flops for an unstructured $\mathbf{A}$, so there is a gain over GE/LU only if the iteration converges in $o(n)$ steps. The savings are huge for sparse or structured $\mathbf{A}$; by structured, we mean that the matrix-vector multiplication $\mathbf{A} \mathbf{v}$ is fast ($O(n)$ or $O(n \log n)$).
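A minimal Jacobi sketch coded directly from the update above; this is not the IterativeSolvers.jl routine benchmarked later, and the function name, zero starting point, and fixed iteration count are illustrative choices.

```julia
# Minimal Jacobi sketch: x ← D^{-1} (b - (L + U) x), written as an explicit sweep.
# Illustrative only; not the packaged jacobi from IterativeSolvers.jl.
function jacobi_sketch(A, b; maxiter = 100)
    n = length(b)
    x = zeros(n)            # starting value x^(0) = 0
    xnew = similar(x)
    for t in 1:maxiter
        for i in 1:n
            s = b[i]
            for j in 1:n
                if j != i
                    s -= A[i, j] * x[j]   # uses only the previous iterate
                end
            end
            xnew[i] = s / A[i, i]
        end
        x, xnew = xnew, x   # swap buffers instead of reallocating each sweep
    end
    return x
end
```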
Gauss-Seidel (GS) iteration: $$ x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}. $$
With the same splitting, $(\mathbf{D} + \mathbf{L}) \mathbf{x}^{(t+1)} = - \mathbf{U} \mathbf{x}^{(t)} + \mathbf{b}$, i.e., $$ \mathbf{x}^{(t+1)} = - (\mathbf{D} + \mathbf{L})^{-1} \mathbf{U} \mathbf{x}^{(t)} + (\mathbf{D} + \mathbf{L})^{-1} \mathbf{b}. $$
GS converges for any starting value $\mathbf{x}^{(0)}$ when $\mathbf{A}$ is symmetric and positive definite.
The convergence rate of Gauss-Seidel is determined by the spectral radius of the iteration matrix $(\mathbf{D} + \mathbf{L})^{-1} \mathbf{U}$.
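A matching Gauss-Seidel sketch; the only change from the Jacobi sketch above is that the sweep overwrites x in place, so updated components are used immediately. Again, this is an illustrative loop, not the packaged gauss_seidel.

```julia
# Minimal Gauss-Seidel sketch: same sweep as Jacobi, but x is updated in place,
# so components x_j with j < i already hold their new values.
# Illustrative only; not the packaged gauss_seidel from IterativeSolvers.jl.
function gauss_seidel_sketch(A, b; maxiter = 100)
    n = length(b)
    x = zeros(n)            # starting value x^(0) = 0
    for t in 1:maxiter
        for i in 1:n
            s = b[i]
            for j in 1:n
                if j != i
                    s -= A[i, j] * x[j]   # new value for j < i, old value for j > i
                end
            end
            x[i] = s / A[i, i]
        end
    end
    return x
end
```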
SOR (successive over-relaxation): $$ x_i^{(t+1)} = \omega \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)} \right) / a_{ii} + (1-\omega) x_i^{(t)}, $$ i.e., $$ \mathbf{x}^{(t+1)} = (\mathbf{D} + \omega \mathbf{L})^{-1} [(1-\omega) \mathbf{D} - \omega \mathbf{U}] \mathbf{x}^{(t)} + (\mathbf{D} + \omega \mathbf{L})^{-1} \omega \mathbf{b}. $$
The relaxation parameter $\omega$ must be picked beforehand, with the goal of improving the convergence rate; $\omega > 1$ gives over-relaxation, $\omega < 1$ under-relaxation, and convergence requires $\omega \in (0, 2)$.
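A corresponding SOR sketch: each Gauss-Seidel update is blended with the previous value through the relaxation parameter ω. As before, this is an illustrative loop, not the packaged sor/ssor routines benchmarked below.

```julia
# Minimal SOR sketch: x_i ← ω * (Gauss-Seidel update) + (1 - ω) * x_i.
# Illustrative only; not the packaged sor/ssor from IterativeSolvers.jl.
function sor_sketch(A, b, ω; maxiter = 100)
    n = length(b)
    x = zeros(n)            # starting value x^(0) = 0
    for t in 1:maxiter
        for i in 1:n
            s = b[i]
            for j in 1:n
                if j != i
                    s -= A[i, j] * x[j]
                end
            end
            x[i] = ω * s / A[i, i] + (1 - ω) * x[i]
        end
    end
    return x
end
```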
Conjugate gradient and its variants are the top-notch iterative methods for solving huge, structured linear systems.
For symmetric positive definite $\mathbf{A}$, solving $\mathbf{A} \mathbf{x} = \mathbf{b}$ is equivalent to minimizing the quadratic function $\frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T \mathbf{x}$.
We will develop this connection later, when we study optimization.
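For reference, here is a bare-bones conjugate gradient sketch for symmetric positive definite $\mathbf{A}$, written from the standard textbook recursion; it is illustrative only (the function name and stopping rule are illustrative choices), and the cg routine from IterativeSolvers.jl below is what the notes actually use.

```julia
# Bare-bones conjugate gradient sketch for symmetric positive definite A,
# i.e., minimizing (1/2) x'Ax - b'x.  Illustrative only.
function cg_sketch(A, b; maxiter = length(b), tol = 1e-8)
    x = zeros(length(b))
    r = b - A * x              # residual, equal to the negative gradient
    p = copy(r)                # initial search direction
    rsold = r' * r
    for t in 1:maxiter
        Ap = A * p
        α = rsold / (p' * Ap)  # exact line search along p
        x += α * p
        r -= α * Ap
        rsnew = r' * r
        sqrt(rsnew) < tol && break
        p = r + (rsnew / rsold) * p   # next A-conjugate search direction
        rsold = rsnew
    end
    return x
end
```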
Kershaw's results for a fusion problem show how dramatically the choice of iterative method affects the iteration count:

Method | Number of Iterations
---|---
Gauss-Seidel | 208,000
Block SOR methods | 765
Incomplete Cholesky conjugate gradient | 25
The IterativeSolvers.jl package implements many commonly used iterative solvers.
using BenchmarkTools, IterativeSolvers
srand(280)  # seed the random number generator for reproducibility
n = 4000
# a sparse pd matrix, about 0.5% non-zero entries
A = sprand(n, n, 0.002)
A = A + A' + (2n) * I  # symmetrize and shift the diagonal so A is (strictly diagonally dominant, hence) pd
@show typeof(A)
b = randn(n)
Afull = full(A)
@show typeof(Afull)
# actual sparsity level
countnz(A) / length(A)
# dense matrix-vector multiplication
@benchmark Afull * b
# sparse matrix-vector multiplication
@benchmark A * b
# dense solve via Cholesky
xchol = cholfact(Afull) \ b
@benchmark cholfact(Afull) \ b
# Jacobi solution is fairly close to Cholesky solution
xjacobi, = jacobi(A, b)
@show vecnorm(xjacobi - xchol)
@benchmark jacobi(A, b)
# Gauss-Seidel solution is fairly close to Cholesky solution
xgs, = gauss_seidel(A, b)
vecnorm(xgs - xchol)
@benchmark gauss_seidel(A, b)
# symmetric SOR with ω=0.75
xsor, = ssor(A, b, 0.75)
vecnorm(xsor - xchol)
@benchmark ssor(A, b, 0.75)
# conjugate gradient
xcg, = cg(A, b)
vecnorm(xcg - xchol)
@benchmark cg(A, b)