versioninfo()
Assume $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular and consider the system of linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$. The solution is $\mathbf{x} = \mathbf{A}^{-1} \mathbf{b}$. We want to know how the solution changes with a small perturbation of the input $\mathbf{b}$ (or $\mathbf{A}$).
Let $\mathbf{b}_{\text{new}} = \mathbf{b} + \Delta \mathbf{b}$. Then $\mathbf{x}_{\text{new}} = \mathbf{A}^{-1}(\mathbf{b} + \Delta \mathbf{b}) = \mathbf{x} + \Delta \mathbf{x}$ with $\Delta \mathbf{x} = \mathbf{A}^{-1} \Delta \mathbf{b}$. Thus $\|\Delta \mathbf{x}\| \le \|\mathbf{A}^{-1}\| \|\Delta \mathbf{b}\|$. Because $\mathbf{b} = \mathbf{A} \mathbf{x}$, we also have $\frac{1}{\|\mathbf{x}\|} \le \|\mathbf{A}\| \frac{1}{\|\mathbf{b}\|}$. Combining the two inequalities gives
$$\frac{\|\Delta \mathbf{x}\|}{\|\mathbf{x}\|} \le \|\mathbf{A}\| \|\mathbf{A}^{-1}\| \frac{\|\Delta \mathbf{b}\|}{\|\mathbf{b}\|}.$$
$\kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\|$ is called the condition number for the linear system. It depends on the matrix norm being used.
A large condition number signals an ill-conditioned problem: small relative perturbations in the input can produce large relative changes in the solution.
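To make this concrete, here is a small sketch (the matrix and perturbation below are made up for illustration) of how a nearly singular $\mathbf{A}$ amplifies a tiny perturbation in $\mathbf{b}$:

```julia
using LinearAlgebra

# illustrative (made-up) nearly singular system
A = [1.0 1.0; 1.0 1.000001]
b = [2.0, 2.000001]
x = A \ b                     # solution ≈ [1, 1]

Δb = [1.0e-6, 0.0]            # tiny perturbation of b
xnew = A \ (b + Δb)           # solution swings to ≈ [2, 0]

# amplification of the relative error, bounded by cond(A)
relx = norm(xnew - x) / norm(x)
relb = norm(Δb) / norm(b)
relx / relb, cond(A)
```

A relative perturbation of order $10^{-7}$ in $\mathbf{b}$ produces an $O(1)$ relative change in $\mathbf{x}$; the amplification factor stays below $\kappa(\mathbf{A}) \approx 4 \times 10^6$, as the bound above guarantees.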
Some facts:
\begin{eqnarray*}
\kappa(\mathbf{A}) &=& \kappa(\mathbf{A}^{-1}) \\
\kappa(c\mathbf{A}) &=& \kappa(\mathbf{A}) \\
\kappa(\mathbf{A}) &\ge& 1 \\
%\kappa_1(\mathbf{A}) &=& \kappa_\infty (\mathbf{A}^T) \\
\kappa_2 (\mathbf{A}) &=& \kappa_2 (\mathbf{A}^T) = \frac{\sigma_1(\mathbf{A})}{\sigma_n(\mathbf{A})} \\
\kappa_2(\mathbf{A}^T \mathbf{A}) &=& \frac{\lambda_1(\mathbf{A}^T \mathbf{A})}{\lambda_n(\mathbf{A}^T \mathbf{A})} = \kappa_2^2(\mathbf{A}) \ge \kappa_2(\mathbf{A}).
\end{eqnarray*}
The last fact says that the condition number of $\mathbf{A}^T \mathbf{A}$ can be much larger than that of $\mathbf{A}$.
The smallest singular value $\sigma_n$ measures the distance to trouble: in the 2-norm, it is exactly the distance from $\mathbf{A}$ to the nearest singular matrix.
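Both facts can be checked numerically; the following sketch uses a small made-up matrix:

```julia
using LinearAlgebra

# small made-up example matrix
A = [3.0 1.0; 1.0 2.0; 0.0 1.0]
σ = svdvals(A)

# κ₂(AᵀA) equals κ₂(A)²
κA = σ[1] / σ[end]
cond(A'A) ≈ κA^2

# σ_n is the 2-norm distance to the nearest rank-deficient matrix:
# zeroing out the smallest singular value gives that nearest matrix
U, S, V = svd(A)
Anear = U * Diagonal([S[1], 0.0]) * V'
opnorm(A - Anear) ≈ σ[end]
```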
The condition number for the least squares problem is more complicated. Roughly speaking, the sensitivity of the least squares solution scales with $\kappa_2(\mathbf{X})$ when the residual is small but with $\kappa_2^2(\mathbf{X})$ when the residual is large; solving via the normal equations always contends with $\kappa_2(\mathbf{X}^T \mathbf{X}) = \kappa_2^2(\mathbf{X})$.
Consider the simple case
\begin{eqnarray*}
\mathbf{X} = \begin{pmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{pmatrix}.
\end{eqnarray*}
Forming the normal equations yields the singular Gramian matrix
\begin{eqnarray*}
\mathbf{X}^T \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\end{eqnarray*}
if executed with a precision of 6 decimal digits, even though $\mathbf{X}$ has full column rank.
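This catastrophe is easy to reproduce on a computer. Here is a sketch in Float32 (about 7 significant digits), with $\delta = 10^{-4}$ playing the role of $10^{-3}$ in the 6-digit example above:

```julia
using LinearAlgebra

δ = 1f-4                      # δ² = 1f-8 is below eps(Float32) ≈ 1.2f-7
X = Float32[1 1; δ 0; 0 δ]
rank(X)                       # 2: X itself has full column rank

G = X'X                       # 1 + δ² rounds to 1 in Float32
G == Float32[1 1; 1 1]        # true: the computed Gramian is exactly singular
rank(G)                       # 1
```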
The Longley (1967) macroeconomic data set is a famous test example for numerical software from the early days of statistical computing.
using DelimitedFiles
# Base.download("http://hua-zhou.github.io/teaching/biostatm280-2019spring/slides/14-cond/longley.txt", "./longley.txt")
longley = readdlm("longley.txt")
using StatsPlots
gr()
corrplot(longley,
label = ["Employ" "Price" "GNP" "Jobless" "Military" "PopSize" "Year"])
# response: Employment
y = longley[:, 1]
# predictor matrix
X = [ones(length(y)) longley[:, 2:end]]
using LinearAlgebra
# Julia function for obtaining condition number
cond(X)
# the smallest singular value (the distance to singularity) is very small
xsvals = svdvals(X)
# condition number of the design matrix
xcond = maximum(xsvals) / minimum(xsvals)
# X is full rank from SVD
xrksvd = rank(X)
# least squares from QR
X \ y
# Gram matrix
G = X'X
# rank of Gram matrix from SVD
# rank deficient! We lost precision when forming Gram matrix
rank(G)
svdvals(G)
# rank of Gram matrix from pivoted Cholesky
# full rank!
gchol = cholesky(Symmetric(G), Val(true))
rank(gchol)
# least squares solution from Cholesky matches those from QR
gchol \ (X'y)
Xsp = Float32.(X)
ysp = Float32.(y)
# least squares by QR: dramatically different from Float64 solution
Xsp \ ysp
# least squares by Cholesky: failed
G = Xsp'Xsp
gchol = cholesky(Symmetric(G), Val(true), check=false)
gchol \ (Xsp'ysp)
rank(Xsp)
# rank of Gram matrix by Cholesky
rank(gchol)
# rank of Gram matrix by SVD
rank(G)
using StatsBase
# center and standardize matrix along dimension 1
Xcs = zscore(X[:, 2:end], 1)
Xcs = [ones(length(y)) Xcs]
cond(Xcs)
rank(Xcs)
rank(Xcs'Xcs)
Chapter 6 of Numerical Analysis for Statisticians by Kenneth Lange (2010).
Section 2.6 of Matrix Computations by Gene Golub and Charles Van Loan (2013).