Iterative Methods for Solving Linear Equations

Introduction

So far we have considered direct methods for solving linear equations.

  • Direct methods (flops fixed a priori) vs iterative methods:
    • Direct methods (GE/LU, Cholesky, QR, SVD): good for dense, small- to moderate-sized $\mathbf{A}$
    • Iterative methods (Jacobi, Gauss-Seidel, SOR, conjugate gradient, GMRES): good for large, sparse, or structured linear systems, parallel computing, warm starts

PageRank problem

  • $\mathbf{A} \in \{0,1\}^{n \times n}$ the connectivity matrix of webpages with entries $$ \begin{eqnarray*} a_{ij} = \begin{cases} 1 & \text{if page $i$ links to page $j$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*} $$ $n \approx 10^9$ in May 2017.

  • $r_i = \sum_j a_{ij}$ is the out-degree of page $i$.

  • Larry Page imagines a random surfer wandering on the internet according to the following rules:

    • From a page $i$ with $r_i>0$
      • with probability $p$, (s)he randomly chooses a link on page $i$ (uniformly) and follows that link to the next page
      • with probability $1-p$, (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page
    • From a page $i$ with $r_i=0$ (a dangling page), (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page

The process defines a Markov chain on the space of $n$ pages. The stationary distribution of this Markov chain gives the rank (probability) of each page.

  • The stationary distribution is the left eigenvector of the transition matrix $\mathbf{P}$ corresponding to the eigenvalue 1 (its largest eigenvalue). Equivalently, it can be cast as a linear equation: $$ (\mathbf{I} - \mathbf{P}^T) \mathbf{x} = \mathbf{0}. $$ (A toy sketch of this construction follows this list.)

  • Gene Golub called it the largest matrix computation in the world.

  • GE/LU would take $2 \times (10^9)^3/3/10^{12} \approx 6.66 \times 10^{14}$ seconds $\approx 20$ million years on a teraflop supercomputer!

  • Iterative methods come to the rescue.
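
As a toy illustration of this construction (referenced in the stationary-distribution bullet above), the sketch below builds the transition matrix for a made-up 4-page graph with damping parameter $p = 0.85$ and approximates the stationary distribution by power iteration; none of these numbers come from the real web graph.

# Toy PageRank sketch; the 4-page graph and p = 0.85 are made up for illustration.
n = 4
A = [0 1 1 0;   # page 1 links to pages 2 and 3
     0 0 1 0;   # page 2 links to page 3
     1 0 0 0;   # page 3 links back to page 1
     0 0 0 0]   # page 4 is a dangling page
p = 0.85
r = [sum(A[i, :]) for i in 1:n]    # out-degrees r_i

# Transition matrix: follow a link w.p. p, teleport w.p. 1 - p;
# from a dangling page, teleport uniformly.
P = [r[i] > 0 ? p * A[i, j] / r[i] + (1 - p) / n : 1 / n for i in 1:n, j in 1:n]

# The stationary distribution solves (I - P^T) x = 0; here we approximate it
# by power iteration x <- P^T x, starting from the uniform distribution.
function power_iteration(P, iters)
    x = fill(1 / size(P, 1), size(P, 1))
    for t in 1:iters
        x = P' * x
    end
    return x
end

power_iteration(P, 100)    # PageRank scores of the 4 pages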

Jacobi method

Solve $\mathbf{A} \mathbf{x} = \mathbf{b}$.

  • Jacobi iteration: $$ x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}. $$

  • With splitting: $\mathbf{A} = \mathbf{L} + \mathbf{D} + \mathbf{U}$, Jacobi iteration can be written as $$ \mathbf{D} \mathbf{x}^{(t+1)} = - (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{b}, $$ i.e., $$ \mathbf{x}^{(t+1)} = - \mathbf{D}^{-1} (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b} = - \mathbf{D}^{-1} \mathbf{A} \mathbf{x}^{(t)} + \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b}. $$

  • One round costs $2n^2$ flops with an unstructured $\mathbf{A}$. It beats GE/LU if it converges in $o(n)$ iterations. The savings are huge for sparse or structured $\mathbf{A}$; by structured, we mean that the matrix-vector multiplication $\mathbf{A} \mathbf{v}$ is fast ($O(n)$ or $O(n \log n)$).
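
As a concrete illustration of the componentwise formula above, here is a minimal dense Jacobi sketch. It is illustrative only (not the IterativeSolvers.jl jacobi used in the numerical examples below), assumes $a_{ii} \neq 0$, and makes no attempt to exploit sparsity in the matrix-vector products.

# Minimal Jacobi sketch; illustrative only, not the IterativeSolvers.jl implementation.
function jacobi_sketch(A, b; maxiter = 100, tol = 1e-8)
    n = length(b)
    x = zeros(n)            # x^(0) = 0
    xnew = similar(x)
    for t in 1:maxiter
        for i in 1:n
            s = b[i]
            for j in 1:n
                j == i && continue
                s -= A[i, j] * x[j]    # uses only the previous iterate x^(t)
            end
            xnew[i] = s / A[i, i]
        end
        # stop when successive iterates are close
        sqrt(sum(abs2, xnew - x)) < tol && return copy(xnew)
        x, xnew = xnew, x
    end
    return x
end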

Gauss-Seidel method

  • Gauss-Seidel (GS) iteration: $$ x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}. $$

  • With splitting, $(\mathbf{D} + \mathbf{L}) \mathbf{x}^{(t+1)} = - \mathbf{U} \mathbf{x}^{(t)} + \mathbf{b}$, i.e., $$ \mathbf{x}^{(t+1)} = - (\mathbf{D} + \mathbf{L})^{-1} \mathbf{U} \mathbf{x}^{(t)} + (\mathbf{D} + \mathbf{L})^{-1} \mathbf{b}. $$

  • GS converges for any starting value $\mathbf{x}^{(0)}$ when $\mathbf{A}$ is symmetric and positive definite.

  • The convergence rate of Gauss-Seidel is governed by the spectral radius of $(\mathbf{D} + \mathbf{L})^{-1} \mathbf{U}$.
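
A minimal Gauss-Seidel sweep, again only a sketch: the one change from the Jacobi sketch above is that the updated components $x_1, \ldots, x_{i-1}$ are used immediately, so the iterate can be overwritten in place.

# One in-place Gauss-Seidel sweep; illustrative sketch only.
function gauss_seidel_sweep!(x, A, b)
    n = length(b)
    for i in 1:n
        s = b[i]
        for j in 1:n
            j == i && continue
            s -= A[i, j] * x[j]    # for j < i, x[j] already holds x_j^(t+1)
        end
        x[i] = s / A[i, i]
    end
    return x
end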

Successive over-relaxation (SOR)

  • SOR: $$ x_i^{(t+1)} = \omega \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)} \right) / a_{ii} + (1-\omega) x_i^{(t)}, $$ i.e., $$ \mathbf{x}^{(t+1)} = (\mathbf{D} + \omega \mathbf{L})^{-1} [(1-\omega) \mathbf{D} - \omega \mathbf{U}] \mathbf{x}^{(t)} + \omega (\mathbf{D} + \omega \mathbf{L})^{-1} \mathbf{b}. $$

  • Need to pick $\omega \in (0,2)$ beforehand ($\omega > 1$ is over-relaxation, $\omega < 1$ is under-relaxation, and $\omega = 1$ recovers Gauss-Seidel), with the goal of improving the convergence rate.
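
The sketch below illustrates how $\omega$ affects the spectral radius of the SOR iteration matrix $(\mathbf{D} + \omega \mathbf{L})^{-1} [(1-\omega) \mathbf{D} - \omega \mathbf{U}]$, which governs the convergence rate, using a small made-up SPD tridiagonal matrix. It assumes a recent Julia with the LinearAlgebra standard library (the notebook in the numerical examples below uses Julia 0.6 syntax).

using LinearAlgebra    # assumes Julia >= 1.0

# Small made-up SPD tridiagonal test matrix.
A = [ 4.0 -1.0  0.0;
     -1.0  4.0 -1.0;
      0.0 -1.0  4.0]
D = Diagonal(diag(A))
L = tril(A, -1)    # strictly lower triangular part
U = triu(A, 1)     # strictly upper triangular part

spectral_radius(M) = maximum(abs.(eigvals(Matrix(M))))

for ω in (0.5, 1.0, 1.5)    # ω = 1 recovers Gauss-Seidel
    B = (D + ω * L) \ ((1 - ω) * D - ω * U)
    println("ω = $ω:  spectral radius = ", spectral_radius(B))
end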

Conjugate gradient method

  • Conjugate gradient and its variants are the top-notch iterative methods for solving huge, structured linear systems.

  • A UCLA invention!

  • For symmetric positive definite $\mathbf{A}$, solving $\mathbf{A} \mathbf{x} = \mathbf{b}$ is equivalent to minimizing the quadratic function $\frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T \mathbf{x}$, whose gradient is $\mathbf{A} \mathbf{x} - \mathbf{b}$ (a minimal sketch follows this list).

  • To do later, when we study optimization.
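
For reference, here is a minimal unpreconditioned conjugate gradient sketch based on the quadratic-minimization view (the residual $\mathbf{b} - \mathbf{A} \mathbf{x}$ is the negative gradient). It assumes $\mathbf{A}$ is symmetric positive definite and is illustrative only; the numerical examples below use cg from IterativeSolvers.jl.

using LinearAlgebra    # for dot; assumes a recent Julia

# Minimal unpreconditioned CG sketch for symmetric pd A; illustrative only.
function cg_sketch(A, b; maxiter = length(b), tol = 1e-8)
    x = zeros(length(b))
    r = b - A * x              # residual = negative gradient of (1/2) x'Ax - b'x
    p = copy(r)                # first search direction
    rs = dot(r, r)
    for t in 1:maxiter
        Ap = A * p
        α = rs / dot(p, Ap)    # exact minimization of the quadratic along p
        x += α * p
        r -= α * Ap
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p = r + (rs_new / rs) * p    # next direction, A-conjugate to previous ones
        rs = rs_new
    end
    return x
end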

Kershaw's results for a fusion problem.

Method                                     Number of Iterations
Gauss-Seidel                               208,000
Block SOR methods                          765
Incomplete Cholesky conjugate gradient     25

Numerical examples

The IterativeSolvers.jl package implements many commonly used iterative solvers.

In [1]:
using BenchmarkTools, IterativeSolvers

srand(280)

n = 4000
# a sparse pd matrix, about 0.5% non-zero entries
A = sprand(n, n, 0.002)
A = A + A' + (2n) * I
@show typeof(A)
b = randn(n)
# dense matrix representation of A
Afull = full(A)
@show typeof(Afull)
# actual sparsity level
countnz(A) / length(A)
typeof(A) = SparseMatrixCSC{Float64,Int64}
typeof(Afull) = Array{Float64,2}
Out[1]:
0.00425125
In [2]:
# dense matrix-vector multiplication
@benchmark Afull * b
Out[2]:
BenchmarkTools.Trial: 
  memory estimate:  31.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     5.618 ms (0.00% GC)
  median time:      7.455 ms (0.00% GC)
  mean time:        7.712 ms (0.00% GC)
  maximum time:     14.812 ms (0.00% GC)
  --------------
  samples:          646
  evals/sample:     1
In [3]:
# sparse matrix-vector multiplication
@benchmark A * b
Out[3]:
BenchmarkTools.Trial: 
  memory estimate:  31.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     66.647 μs (0.00% GC)
  median time:      69.379 μs (0.00% GC)
  mean time:        74.066 μs (2.24% GC)
  maximum time:     1.427 ms (92.48% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [4]:
# dense solve via Cholesky
xchol = cholfact(Afull) \ b
@benchmark cholfact(Afull) \ b
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  122.10 MiB
  allocs estimate:  11
  --------------
  minimum time:     277.813 ms (2.90% GC)
  median time:      279.444 ms (2.87% GC)
  mean time:        284.541 ms (3.83% GC)
  maximum time:     335.722 ms (17.85% GC)
  --------------
  samples:          18
  evals/sample:     1
In [5]:
# Jacobi solution is fairly close to Cholesky solution
xjacobi, = jacobi(A, b)
@show vecnorm(xjacobi - xchol)
vecnorm(xjacobi - xchol) = 0.009481824791564454
Out[5]:
0.009481824791564454
In [6]:
@benchmark jacobi(A, b)
Out[6]:
BenchmarkTools.Trial: 
  memory estimate:  94.36 KiB
  allocs estimate:  13
  --------------
  minimum time:     1.034 ms (0.00% GC)
  median time:      1.061 ms (0.00% GC)
  mean time:        1.100 ms (0.51% GC)
  maximum time:     3.343 ms (58.19% GC)
  --------------
  samples:          4537
  evals/sample:     1
In [7]:
# Gauss-Seidel solution is fairly close to Cholesky solution
xgs, = gauss_seidel(A, b)
vecnorm(xgs - xchol)
Out[7]:
0.009481824791564454
In [8]:
@benchmark gauss_seidel(A, b)
Out[8]:
BenchmarkTools.Trial: 
  memory estimate:  63.11 KiB
  allocs estimate:  13
  --------------
  minimum time:     1.116 ms (0.00% GC)
  median time:      1.215 ms (0.00% GC)
  mean time:        1.232 ms (0.26% GC)
  maximum time:     3.518 ms (65.23% GC)
  --------------
  samples:          4050
  evals/sample:     1
In [9]:
# symmetric SOR with ω=0.75
xsor, = ssor(A, b, 0.75)
vecnorm(xsor - xchol)
Out[9]:
0.009481824791561674
In [10]:
@benchmark sor(A, b, 0.75)
Out[10]:
BenchmarkTools.Trial: 
  memory estimate:  94.81 KiB
  allocs estimate:  38
  --------------
  minimum time:     1.123 ms (0.00% GC)
  median time:      1.165 ms (0.00% GC)
  mean time:        1.210 ms (0.39% GC)
  maximum time:     2.644 ms (55.17% GC)
  --------------
  samples:          4121
  evals/sample:     1
In [11]:
# conjugate gradient
xcg, = cg(A, b)
vecnorm(xcg - xchol)
Out[11]:
0.009481824791658372
In [12]:
@benchmark cg(A, b)
Out[12]:
BenchmarkTools.Trial: 
  memory estimate:  126.86 KiB
  allocs estimate:  31
  --------------
  minimum time:     229.348 μs (0.00% GC)
  median time:      237.998 μs (0.00% GC)
  mean time:        254.097 μs (3.40% GC)
  maximum time:     2.293 ms (81.10% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [13]:
versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)