Assume $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular and consider the system of linear equations $$ \mathbf{A} \mathbf{x} = \mathbf{b}. $$ The solution is $$ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. $$ We want to know how the solution changes with a small perturbation of the input $\mathbf{b}$ (or $\mathbf{A}$).
Let $$ \mathbf{b}_{\text{new}} = \mathbf{b} + \Delta \mathbf{b}. $$ Then $$ \mathbf{x}_{\text{new}} = \mathbf{A}^{-1} (\mathbf{b} + \Delta \mathbf{b}) = \mathbf{x} + \Delta \mathbf{x}, \quad \text{where } \Delta \mathbf{x} = \mathbf{A}^{-1} \Delta \mathbf{b}. $$ Thus $$ \|\Delta \mathbf{x}\| = \|\mathbf{A}^{-1} \Delta \mathbf{b}\| \le \|\mathbf{A}^{-1}\| \|\Delta \mathbf{b}\|. $$ Because $\mathbf{b} = \mathbf{A} \mathbf{x}$, taking norms gives $\|\mathbf{b}\| \le \|\mathbf{A}\| \|\mathbf{x}\|$, i.e., $$ \frac{1}{\|\mathbf{x}\|} \le \|\mathbf{A}\| \frac{1}{\|\mathbf{b}\|}. $$ Multiplying the two inequalities yields $$ \frac{ \|\Delta \mathbf{x}\|}{\|\mathbf{x}\|} \le \|\mathbf{A}\|\|\mathbf{A}^{-1}\| \frac{\|\Delta \mathbf{b}\|}{\|\mathbf{b}\|}. $$
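This bound is easy to check numerically. Here is a minimal sketch (modern Julia syntax, using the 2-norm throughout; the matrix and perturbation size are arbitrary illustrations, not from the original notes):

```julia
using LinearAlgebra  # norm, cond

A = [1.0 2.0; 3.0 4.0]
b = [1.0, 1.0]
x = A \ b
Δb = 1e-6 * randn(2)          # small random perturbation of b
Δx = A \ (b + Δb) - x
# relative change in x is bounded by κ(A) times relative change in b
norm(Δx) / norm(x) ≤ cond(A) * norm(Δb) / norm(b)
```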
$\kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\|$ is called the condition number for inversion. It depends on the matrix norm being used.
A large condition number means "bad": small relative perturbations in the input can be amplified into large relative changes in the solution.
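To see how bad "bad" can get, here is a sketch with the classic ill-conditioned Hilbert matrix (the size $n = 8$ and the perturbation magnitude $10^{-10}$ are arbitrary choices for illustration):

```julia
using LinearAlgebra

n = 8
A = [1 / (i + j - 1) for i in 1:n, j in 1:n]  # 8×8 Hilbert matrix, κ₂(A) ≈ 1.5e10
b = A * ones(n)                               # exact solution is ones(n)
x = A \ b
F = svd(A)
Δb = 1e-10 * F.U[:, end]   # perturb b along the most sensitive direction
Δx = A \ (b + Δb) - x
# amplification of the relative error: (‖Δx‖/‖x‖) / (‖Δb‖/‖b‖), close to κ₂(A)
(norm(Δx) / norm(x)) / (norm(Δb) / norm(b))
```

A perturbation of size $10^{-10}$ in $\mathbf{b}$ changes the solution by order one.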
Some facts:
$$
\begin{eqnarray*}
\kappa(\mathbf{A}) &=& \kappa(\mathbf{A}^{-1}) \\
\kappa(c\mathbf{A}) &=& \kappa(\mathbf{A}), \quad c \ne 0 \\
\kappa(\mathbf{A}) &\ge& 1 \\
\kappa_2 (\mathbf{A}) &=& \kappa_2 (\mathbf{A}^T) = \frac{\sigma_1(\mathbf{A})}{\sigma_n(\mathbf{A})} \\
\kappa_2(\mathbf{A}^T \mathbf{A}) &=& \frac{\lambda_1(\mathbf{A}^T \mathbf{A})}{\lambda_n(\mathbf{A}^T \mathbf{A})} = \kappa_2^2(\mathbf{A}) \ge \kappa_2(\mathbf{A}).
\end{eqnarray*}
$$
The last fact says that the condition number of $\mathbf{A}^T \mathbf{A}$ can be much larger than that of $\mathbf{A}$.
The smallest singular value $\sigma_n$ indicates the distance to the trouble: in the 2-norm, $\sigma_n(\mathbf{A})$ is the distance from $\mathbf{A}$ to the nearest singular matrix.
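A quick numerical check of the last fact above, the squaring of the condition number when forming the Gram matrix, on an arbitrary small example:

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0; 0.0 1.0]   # any tall full-rank matrix works
σ = svdvals(A)
κ = σ[1] / σ[end]                 # κ₂(A) = σ₁/σₙ
κ_gram = cond(A' * A)             # κ₂(AᵀA), computed directly
# κ_gram ≈ κ^2: the Gram matrix is much worse conditioned than A
```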
The condition number for the least squares problem is more complicated. Roughly speaking, the sensitivity of the least squares solution is governed by $\kappa_2(\mathbf{X})$ when the residual is small, but by $\kappa_2^2(\mathbf{X})$ when the residual is large (see Golub and Van Loan, 2013).
Consider the simple case $$ \begin{eqnarray*} \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{pmatrix}. \end{eqnarray*} $$ Forming the normal equation yields a singular Gram matrix $$ \begin{eqnarray*} \mathbf{X}^T \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \end{eqnarray*} $$ if executed with 6-digit precision, since $1 + 10^{-6}$ rounds to $1$.
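The same effect can be reproduced in Float32 arithmetic (about 7 significant digits). In this sketch I use entries of $10^{-4}$ so that their squares fall below Float32's unit roundoff, mirroring the $10^{-3}$ / 6-digit example above; the variable names are my own:

```julia
using LinearAlgebra

X32 = Float32[1 1; 1e-4 0; 0 1e-4]
G32 = X32' * X32   # 1 + 1e-8 rounds to 1 in Float32, so G32 = [1 1; 1 1]
rank(X32)          # 2: X itself is still numerically full rank
rank(G32)          # 1: the Gram matrix is exactly singular
```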
The Longley (1967) macroeconomic data set is a famous test example for numerical software from the early days of statistical computing.
longley = readdlm(download("http://hua-zhou.github.io/teaching" *
"/biostatm280-2017spring/hw/longley.txt"))
using StatPlots
#pyplot()
#plotly()
gr()
corrplot(longley,
label = ["Employ" "Price" "GNP" "Jobless" "Military" "PopSize" "Year"])
# response: Employment
y = longley[:, 1]
# predictor matrix
X = [ones(length(y)) longley[:, 2:end]]
# Julia function for obtaining condition number
cond(X)
# what's the implementation?
@which cond(X)
# we see the smallest singular value (aka trouble number) is very small
xsvals = svdvals(X)
# condition number of the design matrix
xcond = maximum(xsvals) / minimum(xsvals)
# X is full rank from SVD
xrksvd = rank(X)
@which rank(X)
# least squares from QR
X \ y
# Gram matrix
G = X' * X
# rank of Gram matrix from SVD
# rank deficient!
rank(G)
svdvals(G)
# rank of Gram matrix from cholesky
# full!
gchol = cholfact(G, :L, Val{true})
rank(gchol)
# least squares solution from Cholesky matches those from QR
gchol \ (X' * y)
# same least squares problem in single precision (Float32, ~7 digits)
Xsp = convert(Matrix{Float32}, X)
ysp = convert(Vector{Float32}, y)
# least squares by QR
Xsp \ ysp
# least squares by Cholesky
G = Xsp' * Xsp
gchol = cholfact(G, :U, Val{true})
gchol \ (Xsp' * ysp)
rank(Xsp)
# rank of Gram matrix by Cholesky
rank(gchol)
# rank of Gram matrix by SVD
rank(G)
using StatsBase
# center and standardize matrix along dimension 1
Xcs = zscore(X[:, 2:end], 1)
Xcs = [ones(length(y)) Xcs]
cond(Xcs)
rank(Xcs)
rank(Xcs' * Xcs)
Chapter 6 of Numerical Analysis for Statisticians by Kenneth Lange (2010).
Section 2.6 of Matrix Computations by Gene Golub and Charles Van Loan (2013).