Condition Number

  • Assume $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular and consider the system of linear equations $$ \mathbf{A} \mathbf{x} = \mathbf{b}. $$ The solution is $$ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. $$ We want to know how the solution changes with a small perturbation of the input $\mathbf{b}$ (or $\mathbf{A}$).

  • Let $$ \mathbf{b}_{\text{new}} = \mathbf{b} + \Delta \mathbf{b}. $$ Then $$ \mathbf{x}_{\text{new}} = \mathbf{A}^{-1} (\mathbf{b} + \Delta \mathbf{b}) = \mathbf{x} + \Delta \mathbf{x}. $$ Thus $$ \|\Delta \mathbf{x}\| = \|\mathbf{A}^{-1} \Delta \mathbf{b}\| \le \|\mathbf{A}^{-1}\| \|\Delta \mathbf{b}\|. $$ Because $\mathbf{b} = \mathbf{A} \mathbf{x}$, we also have $\|\mathbf{b}\| \le \|\mathbf{A}\| \|\mathbf{x}\|$, i.e., $$ \frac{1}{\|\mathbf{x}\|} \le \|\mathbf{A}\| \frac{1}{\|\mathbf{b}\|}. $$ Multiplying the two inequalities yields $$ \frac{ \|\Delta \mathbf{x}\|}{\|\mathbf{x}\|} \le \|\mathbf{A}\|\|\mathbf{A}^{-1}\| \frac{\|\Delta \mathbf{b}\|}{\|\mathbf{b}\|}. $$
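A quick numerical check of this bound (a sketch, not part of the original notes; the matrix and perturbation are made up for illustration):

```julia
# Sketch: verify the bound ‖Δx‖/‖x‖ ≤ κ(A) ‖Δb‖/‖b‖ on a nearly
# singular system.
using LinearAlgebra   # needed on Julia ≥ 0.7; built in on older versions

A = [1.0 1.0; 1.0 1.000001]    # nearly singular, so κ(A) is huge
b = [2.0, 2.000001]
x = A \ b

Δb = 1e-9 * randn(2)           # tiny perturbation of the right-hand side
Δx = A \ (b + Δb) - x

relx = norm(Δx) / norm(x)
relb = norm(Δb) / norm(b)
println((relx, cond(A) * relb))  # the first number stays below the second
```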

  • $\kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\|$ is called the condition number for inversion. It depends on the matrix norm being used.

    • $\kappa_p$ means condition number defined from matrix-$p$ norm.
  • A large condition number means the problem is ill-conditioned: small relative perturbations in the input can produce large relative changes in the solution.

  • Some facts:
    $$ \begin{eqnarray*} \kappa(\mathbf{A}) &=& \kappa(\mathbf{A}^{-1}) \\ \kappa(c\mathbf{A}) &=& \kappa(\mathbf{A}), \quad c \ne 0 \\ \kappa(\mathbf{A}) &\ge& 1 \\ \kappa_2 (\mathbf{A}) &=& \kappa_2 (\mathbf{A}^T) = \frac{\sigma_1(\mathbf{A})}{\sigma_n(\mathbf{A})} \\ \kappa_2(\mathbf{A}^T \mathbf{A}) &=& \frac{\lambda_1(\mathbf{A}^T \mathbf{A})}{\lambda_n(\mathbf{A}^T \mathbf{A})} = \kappa_2^2(\mathbf{A}) \ge \kappa_2(\mathbf{A}). \end{eqnarray*} $$ The last fact says that the condition number of $\mathbf{A}^T \mathbf{A}$ can be much larger than that of $\mathbf{A}$.
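The last fact is easy to check numerically (a sketch, not part of the original notes; the random matrix is made up for illustration):

```julia
# Sketch: forming the Gram matrix squares the condition number,
# κ₂(AᵀA) = κ₂(A)².
using LinearAlgebra   # needed on Julia ≥ 0.7

A = randn(10, 4)
κA  = cond(A)         # σ₁(A) / σ₄(A)
κAA = cond(A' * A)    # λ₁ / λ₄ of the Gram matrix
println((κA^2, κAA))  # the two numbers agree up to rounding error
```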

  • The smallest singular value $\sigma_n$ indicates the distance to trouble: it equals the 2-norm distance from $\mathbf{A}$ to the nearest singular matrix.
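This can be demonstrated directly (a sketch, not from the original notes): subtracting $\sigma_n \mathbf{u}_n \mathbf{v}_n^T$ from $\mathbf{A}$ produces an exactly singular matrix at 2-norm distance $\sigma_n$.

```julia
# Sketch: σₙ is the 2-norm distance from A to the nearest singular matrix.
using LinearAlgebra   # needed on Julia ≥ 0.7

A = randn(5, 5)
F = svd(A)
σn = F.S[end]
B = A - σn * F.U[:, end] * F.V[:, end]'  # rank-deficient by construction
println(opnorm(A - B))      # equals σₙ
println(svdvals(B)[end])    # ≈ 0, so B is numerically singular
```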

  • The condition number for the least squares problem is more complicated. Roughly speaking,

    • the method based on the normal equations (Cholesky, sweep) has a condition number depending on $[\kappa_2(\mathbf{X})]^2$,
    • QR for a small-residual least squares problem has a condition number depending on $\kappa_2(\mathbf{X})$.
  • Consider the simple case $$ \begin{eqnarray*} \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{pmatrix}. \end{eqnarray*} $$ Forming the normal equations with 6 significant digits of precision yields the singular Gram matrix $$ \begin{eqnarray*} \mathbf{X}^T \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \end{eqnarray*} $$ because $1 + 10^{-6}$ rounds to $1$.
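A sketch reproducing this toy example, with Float32 arithmetic standing in for 6-decimal-digit precision (the value of $\delta$ is chosen so that $1 + \delta^2$ rounds to $1$):

```julia
# Sketch: with δ² below machine epsilon, the computed Gram matrix is
# exactly singular even though X has full column rank.
using LinearAlgebra      # needed on Julia ≥ 0.7

δ = 1f-4                 # δ² = 10⁻⁸ < eps(Float32) ≈ 1.2e-7
X = Float32[1 1; δ 0; 0 δ]
G = X' * X               # rounds to [1 1; 1 1]
println(rank(X))         # 2: X has full column rank
println(rank(G))         # 1: the Gram matrix lost it
```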

Implementation

  • Julia function cond computes $\kappa_p$ for $p=1$, 2 (default), or $\infty$.

  • Julia function condskeel computes the Skeel condition number.
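Basic usage of the two functions (a sketch; the function names are from the Julia standard library, which on Julia ≥ 0.7 means the LinearAlgebra module, while the example matrix is made up):

```julia
# Usage sketch for the condition-number functions.
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
println(cond(A))        # κ₂ (the default)
println(cond(A, 1))     # κ₁
println(cond(A, Inf))   # κ∞
println(condskeel(A))   # Skeel condition number
```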

Longley example

The Longley (1967) macroeconomic data set is a famous test problem for numerical software from the early days of statistical computing.

In [1]:
longley = readdlm(download("http://hua-zhou.github.io/teaching" * 
        "/biostatm280-2017spring/hw/longley.txt"))
Out[1]:
16×7 Array{Float64,2}:
 60323.0   83.0  234289.0  2356.0  1590.0  107608.0  1947.0
 61122.0   88.5  259426.0  2325.0  1456.0  108632.0  1948.0
 60171.0   88.2  258054.0  3682.0  1616.0  109773.0  1949.0
 61187.0   89.5  284599.0  3351.0  1650.0  110929.0  1950.0
 63221.0   96.2  328975.0  2099.0  3099.0  112075.0  1951.0
 63639.0   98.1  346999.0  1932.0  3594.0  113270.0  1952.0
 64989.0   99.0  365385.0  1870.0  3547.0  115094.0  1953.0
 63761.0  100.0  363112.0  3578.0  3350.0  116219.0  1954.0
 66019.0  101.2  397469.0  2904.0  3048.0  117388.0  1955.0
 67857.0  104.6  419180.0  2822.0  2857.0  118734.0  1956.0
 68169.0  108.4  442769.0  2936.0  2798.0  120445.0  1957.0
 66513.0  110.8  444546.0  4681.0  2637.0  121950.0  1958.0
 68655.0  112.6  482704.0  3813.0  2552.0  123366.0  1959.0
 69564.0  114.2  502601.0  3931.0  2514.0  125368.0  1960.0
 69331.0  115.7  518173.0  4806.0  2572.0  127852.0  1961.0
 70551.0  116.9  554894.0  4007.0  2827.0  130081.0  1962.0
In [2]:
using StatPlots
#pyplot()
#plotly()
gr()

corrplot(longley, 
    label = ["Employ" "Price" "GNP" "Jobless" "Military" "PopSize" "Year"])
Out[2]:
[Correlation plot matrix of the seven Longley variables: Employ, Price, GNP, Jobless, Military, PopSize, Year]
In [3]:
# response: Employment
y = longley[:, 1]
# predictor matrix
X = [ones(length(y)) longley[:, 2:end]]
Out[3]:
16×7 Array{Float64,2}:
 1.0   83.0  234289.0  2356.0  1590.0  107608.0  1947.0
 1.0   88.5  259426.0  2325.0  1456.0  108632.0  1948.0
 1.0   88.2  258054.0  3682.0  1616.0  109773.0  1949.0
 1.0   89.5  284599.0  3351.0  1650.0  110929.0  1950.0
 1.0   96.2  328975.0  2099.0  3099.0  112075.0  1951.0
 1.0   98.1  346999.0  1932.0  3594.0  113270.0  1952.0
 1.0   99.0  365385.0  1870.0  3547.0  115094.0  1953.0
 1.0  100.0  363112.0  3578.0  3350.0  116219.0  1954.0
 1.0  101.2  397469.0  2904.0  3048.0  117388.0  1955.0
 1.0  104.6  419180.0  2822.0  2857.0  118734.0  1956.0
 1.0  108.4  442769.0  2936.0  2798.0  120445.0  1957.0
 1.0  110.8  444546.0  4681.0  2637.0  121950.0  1958.0
 1.0  112.6  482704.0  3813.0  2552.0  123366.0  1959.0
 1.0  114.2  502601.0  3931.0  2514.0  125368.0  1960.0
 1.0  115.7  518173.0  4806.0  2572.0  127852.0  1961.0
 1.0  116.9  554894.0  4007.0  2827.0  130081.0  1962.0
In [4]:
# Julia function for obtaining condition number
cond(X)
Out[4]:
4.859257015454868e9
In [5]:
# what's the implementation?
@which cond(X)
Out[5]:
cond(A::AbstractArray{T<:Any,2}) at linalg/dense.jl:528
In [6]:
# we see the smallest singular value (aka trouble number) is very small
xsvals = svdvals(X)
Out[6]:
7-element Array{Float64,1}:
     1.66367e6  
 83899.6        
  3407.2        
  1582.64       
    41.6936     
     3.64809    
     0.000342371
In [7]:
# condition number of the design matrix
xcond = maximum(xsvals) / minimum(xsvals)
Out[7]:
4.859257015454868e9
In [8]:
# X is full rank from SVD
xrksvd = rank(X)
Out[8]:
7
In [9]:
@which rank(X)
Out[9]:
rank(A::AbstractArray{T<:Any,2}) at linalg/generic.jl:308
In [10]:
# least squares from QR
X \ y
Out[10]:
7-element Array{Float64,1}:
   -3.48226e6
   15.0619   
   -0.0358192
   -2.02023  
   -1.03323  
   -0.0511041
 1829.15     
In [11]:
# Gram matrix
G = X' * X
Out[11]:
7×7 Array{Float64,2}:
    16.0        1626.9        6.20318e6   …  1.87878e6  31272.0       
  1626.9           1.67172e5  6.46701e8      1.9214e8       3.18054e6 
     6.20318e6     6.46701e8  2.55315e12     7.3868e11      1.21312e10
 51093.0           5.28908e6  2.06505e10     6.06649e9      9.99059e7 
 41707.0           4.29317e6  1.66329e10     4.92386e9      8.15371e7 
     1.87878e6     1.9214e8   7.3868e11   …  2.2134e11      3.67258e9 
 31272.0           3.18054e6  1.21312e10     3.67258e9      6.11215e7 
In [12]:
# rank of Gram matrix from SVD
# rank deficient!
rank(G)
Out[12]:
6
In [13]:
svdvals(G)
Out[13]:
7-element Array{Float64,1}:
    2.76779e12
    7.03914e9 
    1.1609e7  
    2.50476e6 
 1738.36      
   13.3086    
    1.16099e-7
In [14]:
# rank of Gram matrix from cholesky
# full!
gchol = cholfact(G, :L, Val{true})
rank(gchol)
Out[14]:
7
In [15]:
# least squares solution from Cholesky matches those from QR
gchol \ (X' * y)
Out[15]:
7-element Array{Float64,1}:
   -3.48226e6
   15.0619   
   -0.0358192
   -2.02023  
   -1.03323  
   -0.0511041
 1829.15     
  • Now let us re-run the above example in single precision. Since $\kappa_2(\mathbf{X}) \approx 4.86 \times 10^9$ exceeds $1/\epsilon \approx 8.4 \times 10^6$ for single precision, we expect to lose essentially all accuracy.
In [16]:
Xsp = convert(Matrix{Float32}, X)
ysp = convert(Vector{Float32}, y)

# least squares by QR
Xsp \ ysp
Out[16]:
7-element Array{Float32,1}:
   0.0237241
 -52.9953   
   0.0710731
  -0.42347  
  -0.572563 
  -0.414199 
  48.4177   
In [27]:
# least squares by Cholesky
G = Xsp' * Xsp
gchol = cholfact(G, :U, Val{true})
gchol \ (Xsp' * ysp)
Base.LinAlg.RankDeficientException(1)

 in chkfullrank at ./linalg/cholesky.jl:449 [inlined]
 in A_ldiv_B!(::Base.LinAlg.CholeskyPivoted{Float32,Array{Float32,2}}, ::Array{Float32,1}) at ./linalg/cholesky.jl:391
 in \(::Base.LinAlg.CholeskyPivoted{Float32,Array{Float32,2}}, ::Array{Float32,1}) at ./linalg/factorization.jl:39
In [28]:
rank(Xsp)
Out[28]:
6
In [29]:
# rank of Gram matrix by Cholesky
rank(gchol)
Out[29]:
6
In [30]:
# rank of Gram matrix by SVD
rank(G)
Out[30]:
4
  • Standardizing the predictors can improve the conditioning dramatically.
In [31]:
using StatsBase

# center and standardize the matrix along dimension 1
Xcs = zscore(X[:, 2:end], 1)
Xcs = [ones(length(y)) Xcs]
Out[31]:
16×7 Array{Float64,2}:
 1.0  -1.7311     -1.54343    -0.896035  -1.46093    -1.41114     -1.57532 
 1.0  -1.22144    -1.29053    -0.929209  -1.65348    -1.26393     -1.36527 
 1.0  -1.24924    -1.30434     0.52296   -1.42357    -1.0999      -1.15523 
 1.0  -1.12878    -1.03727     0.168746  -1.37471    -0.933713    -0.945189
 1.0  -0.50792    -0.590809   -1.17106    0.707427   -0.768965    -0.735147
 1.0  -0.331857   -0.409472   -1.34977    1.41872    -0.597174    -0.525105
 1.0  -0.248458   -0.224493   -1.41612    1.35118    -0.334958    -0.315063
 1.0  -0.155793   -0.247361    0.411666   1.0681     -0.173229    -0.105021
 1.0  -0.0445951   0.0983004  -0.309603   0.634143   -0.00517531   0.105021
 1.0   0.270466    0.316732   -0.397353   0.359686    0.188324     0.315063
 1.0   0.622593    0.554058   -0.275358   0.274906    0.434295     0.525105
 1.0   0.84499     0.571936    1.59202    0.0435575   0.650652     0.735147
 1.0   1.01179     0.955839    0.663147  -0.0785831   0.854214     0.945189
 1.0   1.16005     1.15602     0.789423  -0.133187    1.14202      1.15523 
 1.0   1.29905     1.31269     1.72579   -0.0498441   1.49912      1.36527 
 1.0   1.41025     1.68213     0.870753   0.316578    1.81955      1.57532 
In [32]:
cond(Xcs)
Out[32]:
110.54415344231359
In [33]:
rank(Xcs)
Out[33]:
7
In [34]:
rank(Xcs' * Xcs)
Out[34]:
7

Further reading