ADMM

In this section, we solve the same constrained lasso problem using the alternating direction method of multipliers (ADMM). ADMM is advantageous because it scales to larger problems and is not restricted to linear constraints. See Gaines and Zhou (2016) for details.
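
For a generic constraint set $\mathcal{C}$, one standard ADMM splitting introduces a copy $\boldsymbol{z}$ of the coefficient vector and alternates between a lasso-type update, a projection onto $\mathcal{C}$, and a dual update. As a sketch (the package's exact iteration may differ; see Gaines and Zhou (2016)), with $\tau > 0$ the ADMM step size, distinct from the lasso tuning parameter $\rho$:

$$\begin{aligned}
\boldsymbol{\beta}^{(t+1)} &= \arg\min_{\boldsymbol{\beta}} \; \frac{1}{2} \|\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\beta}\|_2^2 + \rho \|\boldsymbol{\beta}\|_1 + \frac{\tau}{2} \|\boldsymbol{\beta} - \boldsymbol{z}^{(t)} + \boldsymbol{u}^{(t)}\|_2^2 \\
\boldsymbol{z}^{(t+1)} &= P_{\mathcal{C}}\left(\boldsymbol{\beta}^{(t+1)} + \boldsymbol{u}^{(t)}\right) \\
\boldsymbol{u}^{(t+1)} &= \boldsymbol{u}^{(t)} + \boldsymbol{\beta}^{(t+1)} - \boldsymbol{z}^{(t+1)}
\end{aligned}$$

The only constraint-specific ingredient is the projection $P_{\mathcal{C}}$, which is what allows the algorithm to handle constraint sets beyond linear ones.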

In order to use the algorithm, the user needs to supply a projection function onto the constraint set. Some simple examples are discussed below.
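
The projection is supplied through the proj keyword argument as an ordinary Julia function mapping a vector to its projection onto the constraint set. Both projections used on this page are one-line anonymous functions (the names below are illustrative only):

proj_sumzero = x -> x - mean(x)        # projection onto {x : sum(x) = 0}
proj_nonneg  = x -> clamp.(x, 0, Inf)  # projection onto the non-negative orthant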

Sum-to-zero constraint

We demonstrate with a sum-to-zero constraint example

$$\underset{\boldsymbol{\beta}}{\text{minimize}} \; \frac{1}{2} \|\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\beta}\|_2^2 + \rho \|\boldsymbol{\beta}\|_1 \quad \text{subject to} \quad \sum_{j=1}^p \beta_j = 0.$$

For this constraint, the appropriate projection function for $\boldsymbol{x} = (x_1, \cdots, x_p)^T$ is mean-centering:

$$P(\boldsymbol{x}) = \boldsymbol{x} - \bar{x} \boldsymbol{1}_p, \quad \text{where } \bar{x} = \frac{1}{p} \sum_{i=1}^p x_i.$$

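As a quick sanity check (not part of the original example), centering a vector at its mean indeed lands in the constraint set:

x = randn(5)
z = x - mean(x)   # apply the projection
sum(z)            # ≈ 0 up to floating-point roundoff
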
First, let's define a true parameter β such that sum(β) = 0.

using ConstrainedLasso, Base.Test
n, p = 50, 100
β = zeros(p);
β[1:round(Int, p / 4)] = 0                          # first quarter:  0
β[(round(Int, p / 4) + 1):round(Int, p / 2)] = 1    # second quarter: 1
β[(round(Int, p / 2) + 1):round(Int, 3p / 4)] = 0   # third quarter:  0
β[(round(Int, 3p / 4) + 1):p] = -1                  # last quarter:  -1, so sum(β) == 0
β
100-element Array{Float64,1}:
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  ⋮  
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0

Next we generate data based on the true parameter β.

srand(41)
X = randn(n, p)
50×100 Array{Float64,2}:
  1.21212    -0.153889    0.141533  …  -0.458125    0.0951976  -2.14019   
  0.345895    1.30676     1.60944      -0.409901    0.323719    0.989333  
 -1.27859    -1.18894     0.512064      1.80509     1.62606    -1.44251   
  0.230616    2.54741    -0.523533      2.73358     1.07999     0.432834  
 -1.17103    -0.39082     0.441921     -0.179239   -0.158189   -0.640611  
  1.67135     0.0829011   0.964089  …  -0.720038    1.99359    -0.671572  
 -0.614717    2.16204    -0.0602       -0.324456   -0.616887    1.11243   
 -0.810535    0.974719   -0.045405      0.881578    1.29611     0.696869  
 -1.10879    -1.32489    -1.18272       0.579381   -0.971269   -0.687591  
 -0.219752   -0.447897   -0.974186     -0.880804   -0.480702   -1.36887   
  0.0952544  -0.126203   -0.273737  …  -0.264421    0.565684   -0.798719  
  1.4126      0.295896   -0.213161     -1.46343    -1.27144    -0.0589753 
 -0.418407   -0.479389    0.324243      1.96976     0.867659   -1.2999    
  ⋮                                 ⋱                                     
  0.504861   -1.03911    -0.357771      0.815027    0.919037    1.07463   
 -0.820358   -0.955319    0.097768      0.553219    1.56424     0.10535   
  1.39684     1.93183     0.706641  …  -0.0222014   0.987281   -0.0646814 
 -1.55206     0.446778    1.48206      -1.42384    -1.04209     0.0460478 
  0.928527    0.933087   -0.641975     -1.16347    -0.313851   -1.20434   
  0.380879   -0.144713    1.54374      -0.605637    0.408246    0.632131  
 -1.30233    -2.31664     1.51324       0.765034   -0.515553    0.984551  
  1.36747     1.34059    -0.114778  …   0.846682   -0.565511   -0.539113  
 -2.82496    -0.0447351   0.426242     -0.353497   -0.14583    -0.00304009
 -0.847741    1.49306     1.15522       0.637659    1.70818     0.641035  
 -0.22286    -0.43932    -0.373259      0.788337    0.223785   -0.343495  
  1.32145     0.104516   -0.993017     -0.272744   -0.133748    0.968627
y = X * β + randn(n)
50-element Array{Float64,1}:
  -9.90585 
  -5.40562 
   5.24289 
  -6.29951 
  -4.9586  
  -6.1342  
  -7.90981 
   2.51009 
  -5.79548 
   1.61355 
  -0.722766
  10.4522  
   4.03935 
   ⋮       
   0.397781
  -2.6661  
   5.36896 
  -3.56537 
  -2.402   
   0.11478 
  -5.39248 
   4.38391 
   0.706801
 -10.1066  
  -1.12558 
  14.2473

Now we estimate the coefficients at a fixed tuning parameter value using the ADMM algorithm.

ρ = 2.0
β̂admm = lsq_constrsparsereg_admm(X, y, ρ; proj = x -> x - mean(x))
β̂admm
100-element Array{Float64,1}:
  0.0     
  0.174344
  0.0     
 -0.421288
  0.0     
  0.0     
  0.0     
  0.0     
  0.324233
 -0.15384 
  0.0     
  0.0     
  0.0     
  ⋮       
 -0.397027
 -0.32079 
  0.0     
  0.0     
 -0.868508
 -0.992272
 -0.571755
  0.0     
 -1.16568 
  0.0     
  0.0     
  0.0
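
As an added sanity check, the ADMM estimate should satisfy the sum-to-zero constraint up to the solver's convergence tolerance:

sum(β̂admm)   # ≈ 0 up to the ADMM tolerance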

Now let's compare the estimated coefficients with those obtained using quadratic programming.

ρ = 2.0 
beq = [0.0]
Aeq = ones(1, p)
β̂, = lsq_constrsparsereg(X, y, ρ; Aeq = Aeq, beq = beq) 
hcat(β̂admm, β̂)
100×2 Array{Float64,2}:
  0.0        1.51451e-8 
  0.174344   0.178717   
  0.0       -1.02944e-9 
 -0.421288  -0.414043   
  0.0       -3.33221e-10
  0.0       -4.14188e-10
  0.0        3.28018e-11
  0.0        8.38037e-11
  0.324233   0.335375   
 -0.15384   -0.157908   
  0.0       -2.44739e-9 
  0.0       -8.79003e-10
  0.0       -8.17787e-10
  ⋮                     
 -0.397027  -0.391635   
 -0.32079   -0.33352    
  0.0       -5.51459e-9 
  0.0        1.09637e-9 
 -0.868508  -0.867639   
 -0.992272  -0.999583   
 -0.571755  -0.577743   
  0.0       -8.93601e-10
 -1.16568   -1.16862    
  0.0        2.29367e-9 
  0.0       -1.52035e-9 
  0.0        2.71896e-9
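
To quantify the agreement, one can compute the largest element-wise discrepancy between the two estimates (an added check, not in the original output):

maximum(abs.(β̂admm - vec(β̂)))   # vec in case the QP solution comes back as a one-column matrix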

Non-negativity constraint

Here we look at the non-negativity constraint

$$\underset{\boldsymbol{\beta}}{\text{minimize}} \; \frac{1}{2} \|\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\beta}\|_2^2 + \rho \|\boldsymbol{\beta}\|_1 \quad \text{subject to} \quad \beta_j \geq 0, \; j = 1, \ldots, p.$$

For this constraint, the appropriate projection function for $\boldsymbol{x} = (x_1, \cdots, x_p)^T$ is element-wise truncation at zero:

$$P(\boldsymbol{x}) = (\max(x_1, 0), \cdots, \max(x_p, 0))^T.$$

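As a quick check of this projection (not part of the original example), clamp. truncates negative entries at zero and leaves non-negative entries unchanged:

clamp.([-1.5, 0.0, 2.3], 0, Inf)   # returns [0.0, 0.0, 2.3]
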
Now let's generate X and y.

n, p = 50, 100   
β = zeros(p)
β[1:10] = 1:10
srand(41)
X = randn(n, p)
50×100 Array{Float64,2}:
  1.21212    -0.153889    0.141533  …  -0.458125    0.0951976  -2.14019   
  0.345895    1.30676     1.60944      -0.409901    0.323719    0.989333  
 -1.27859    -1.18894     0.512064      1.80509     1.62606    -1.44251   
  0.230616    2.54741    -0.523533      2.73358     1.07999     0.432834  
 -1.17103    -0.39082     0.441921     -0.179239   -0.158189   -0.640611  
  1.67135     0.0829011   0.964089  …  -0.720038    1.99359    -0.671572  
 -0.614717    2.16204    -0.0602       -0.324456   -0.616887    1.11243   
 -0.810535    0.974719   -0.045405      0.881578    1.29611     0.696869  
 -1.10879    -1.32489    -1.18272       0.579381   -0.971269   -0.687591  
 -0.219752   -0.447897   -0.974186     -0.880804   -0.480702   -1.36887   
  0.0952544  -0.126203   -0.273737  …  -0.264421    0.565684   -0.798719  
  1.4126      0.295896   -0.213161     -1.46343    -1.27144    -0.0589753 
 -0.418407   -0.479389    0.324243      1.96976     0.867659   -1.2999    
  ⋮                                 ⋱                                     
  0.504861   -1.03911    -0.357771      0.815027    0.919037    1.07463   
 -0.820358   -0.955319    0.097768      0.553219    1.56424     0.10535   
  1.39684     1.93183     0.706641  …  -0.0222014   0.987281   -0.0646814 
 -1.55206     0.446778    1.48206      -1.42384    -1.04209     0.0460478 
  0.928527    0.933087   -0.641975     -1.16347    -0.313851   -1.20434   
  0.380879   -0.144713    1.54374      -0.605637    0.408246    0.632131  
 -1.30233    -2.31664     1.51324       0.765034   -0.515553    0.984551  
  1.36747     1.34059    -0.114778  …   0.846682   -0.565511   -0.539113  
 -2.82496    -0.0447351   0.426242     -0.353497   -0.14583    -0.00304009
 -0.847741    1.49306     1.15522       0.637659    1.70818     0.641035  
 -0.22286    -0.43932    -0.373259      0.788337    0.223785   -0.343495  
  1.32145     0.104516   -0.993017     -0.272744   -0.133748    0.968627
y = X * β + randn(n)
50-element Array{Float64,1}:
  12.6173  
  40.3776  
   2.2169  
  27.4631  
  38.592   
   7.82023 
  22.7367  
   7.88475 
  -7.47037 
   0.621035
  -4.91899 
 -14.9363  
   8.26901 
   ⋮       
   7.83882 
  -9.30699 
 -29.7205  
  15.2482  
 -19.1784  
  14.9865  
   2.32728 
  -9.11988 
 -15.3472  
  22.9679  
  -0.997964
  42.6068

This time, we feed a sequence of tuning parameter values into the lsq_constrsparsereg_admm function.

ρ = linspace(1.0, 70.0, 10)
β̂admm = lsq_constrsparsereg_admm(X, y, ρ; proj = x -> clamp.(x, 0, Inf))
β̂admm
100×10 Array{Float64,2}:
  0.653843    0.490395    0.277812    …  0.0      0.0      0.0      0.0    
  2.21572     2.07051     1.96482        1.41607  1.26061  1.10519  0.94974
  2.77145     2.52934     2.36727        1.77875  1.64644  1.51417  1.38186
  4.13883     3.90272     3.70409        2.99234  2.82268  2.65297  2.4833 
  4.84074     4.56198     4.29935        3.31819  3.06715  2.81613  2.5651 
  5.8787      5.69235     5.49622     …  4.46134  4.18054  3.89969  3.61888
  6.7637      6.55971     6.34682        5.68324  5.54216  5.40096  5.25987
  8.51065     8.03277     7.8351         6.87417  6.61797  6.36169  6.10549
  8.7451      8.33258     8.02948        6.94937  6.68847  6.42749  6.16659
  9.77876     9.76788     9.6326         9.12581  9.02118  8.91651  8.81187
  1.65481e-6  2.65614e-5  5.75666e-5  …  0.0      0.0      0.0      0.0    
  0.0         0.0         0.0            0.0      0.0      0.0      0.0    
  0.0         0.0         0.0            0.0      0.0      0.0      0.0    
  ⋮                                   ⋱                                    
  0.170999    1.06392e-5  0.0            0.0      0.0      0.0      0.0    
  0.0         0.0         0.0            0.0      0.0      0.0      0.0    
  0.0         0.0         0.0         …  0.0      0.0      0.0      0.0    
 -5.9382e-7   0.0         0.0            0.0      0.0      0.0      0.0    
  0.0         2.57367e-5  3.98409e-5     0.0      0.0      0.0      0.0    
  0.0         0.0         0.0            0.0      0.0      0.0      0.0    
  0.0         1.69146e-5  0.0            0.0      0.0      0.0      0.0    
  0.0         0.0         0.0         …  0.0      0.0      0.0      0.0    
  0.0         0.0         3.45786e-5     0.0      0.0      0.0      0.0    
  0.0141037   0.0         0.0            0.0      0.0      0.0      0.0    
  0.0         2.39758e-5  3.62464e-5     0.0      0.0      0.0      0.0    
  0.0503145   0.0553623   0.0339917      0.0      0.0      0.0      0.0
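
Note that ADMM enforces the constraint only up to its convergence tolerance, which is why tiny negative entries such as -5.9382e-7 appear above. A quick feasibility check (added for illustration):

minimum(β̂admm)   # should be ≥ 0 up to the ADMM tolerance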

Again we compare with the estimates from quadratic programming.

bineq = zeros(p)
Aineq = - eye(p)
β̂, = lsq_constrsparsereg(X, y, ρ; Aineq = Aineq, bineq = bineq) 
β̂
100×10 Array{Float64,2}:
  0.652004      0.489105      0.277863     …   2.00626e-9    8.58265e-9 
  2.2179        2.07061       1.9648           1.105         0.949582   
  2.77457       2.52826       2.36721          1.51391       1.38162    
  4.13788       3.9009        3.70426          2.65333       2.48359    
  4.84588       4.56136       4.29938          2.81596       2.56492    
  5.87904       5.69159       5.49619      …   3.89993       3.61905    
  6.75881       6.55723       6.34731          5.40177       5.26053    
  8.51202       8.03199       7.83528          6.36216       6.10583    
  8.74756       8.32936       8.02991          6.42804       6.16702    
  9.77692       9.76675       9.63278          8.91676       8.81211    
 -2.48584e-10  -2.61442e-10  -2.8615e-10   …  -9.64613e-11  -7.2806e-10 
 -1.30999e-10  -3.24647e-10  -3.6769e-10      -1.54375e-10  -8.53117e-10
 -1.68134e-10   3.14662e-10   6.39206e-10      6.38157e-10   4.18343e-9 
  ⋮                                        ⋱                            
  0.172384      5.34512e-9    2.48272e-9       1.24417e-9    7.18092e-9 
  8.77676e-9    2.44647e-9    2.84651e-9       2.01956e-9    1.09825e-8 
  6.75785e-9    2.44624e-10   4.79165e-10  …   2.79034e-10   1.6273e-9  
 -7.91733e-11   4.07951e-10   6.78723e-10      8.17503e-10   5.9523e-9  
 -1.49319e-10  -1.68014e-10  -6.97805e-11      2.31588e-10   1.52119e-9 
  4.53248e-10   5.12964e-10   1.57423e-9       3.93055e-9    3.06738e-8 
 -1.56445e-10   1.21443e-9    1.45145e-9       1.89669e-9    1.22688e-8 
  1.43742e-9    1.20569e-9    1.0833e-9    …   9.42889e-10   6.01223e-9 
  1.90626e-10  -1.41852e-10   3.12113e-11      1.8946e-10    1.17163e-9 
  0.0162573     9.37889e-10   8.29365e-10      7.05427e-10   4.91352e-9 
 -1.94284e-10   1.46358e-10   4.19363e-10      6.81323e-10   4.51758e-9 
  0.0522645     0.055839      0.0336149        3.45354e-9    1.57129e-8

As expected, the estimated coefficients are quite close.
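
To make "quite close" concrete, one can compute the largest discrepancy over the entire solution path (an added check):

maximum(abs.(β̂admm - β̂))   # both are 100×10 arrays here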

Follow the link to access the .ipynb file of this page.