Brain Tumor Data
Here we estimate a generalized lasso model (sparse fused lasso) via the constrained lasso.
In this example, we use a version of the comparative genomic hybridization (CGH) data from Bredel et al. (2005) that was modified and studied by Tibshirani and Wang (2008).
The dataset here contains CGH measurements from two glioblastoma multiforme (GBM) brain tumors. Tibshirani and Wang (2008) proposed using the sparse fused lasso to approximate the CGH signal by a sparse, piecewise-constant function in order to identify the regions with nonzero values, since positive (negative) CGH values correspond to possible gains (losses) in copy number. The sparse fused lasso (Tibshirani et al., 2005) is given by

$$
\underset{\boldsymbol{\beta}}{\text{minimize}} \quad \frac{1}{2} \|\boldsymbol{y} - \boldsymbol{\beta}\|_2^2 + \rho_1 \sum_{j=1}^{p} |\beta_j| + \rho_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|. \hspace{3em} (1)
$$
The sparse fused lasso is a special case of the generalized lasso with a suitably chosen penalty matrix $\boldsymbol{D}$. Therefore, problem $(1)$ is equivalent to the generalized lasso problem

$$
\underset{\boldsymbol{\beta}}{\text{minimize}} \quad \frac{1}{2} \|\boldsymbol{y} - \boldsymbol{\beta}\|_2^2 + \rho \|\boldsymbol{D}\boldsymbol{\beta}\|_1,
$$

where $\boldsymbol{D}$ encodes the sparsity and fusion penalties. In this example we take $\boldsymbol{D}$ to be the $(p-1) \times p$ first-difference matrix

$$
\boldsymbol{D} = \begin{pmatrix}
1 & -1 &        &        &    \\
  &  1 & -1     &        &    \\
  &    & \ddots & \ddots &    \\
  &    &        & 1      & -1
\end{pmatrix},
$$

which corresponds to the fusion penalty $\sum_{j=2}^{p} |\beta_j - \beta_{j-1}|$.
As discussed in Gaines and Zhou (2016), the generalized lasso can be reformulated and solved as a constrained lasso problem (for details of the reformulation, see Section 2 of [3]). Here we demonstrate solving the generalized lasso problem as a constrained lasso using `genlasso.jl` in the ConstrainedLasso package. Note that `genlasso.jl` will be slow on large problems because it involves a singular value decomposition (SVD).
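To give a sense of why such a reformulation is possible, here is a rough sketch for the special case in which $\boldsymbol{D}$ has full column rank (this is only an illustration, not the general SVD-based construction in [3]). Setting $\boldsymbol{\alpha} = \boldsymbol{D}\boldsymbol{\beta}$ gives $\boldsymbol{\beta} = \boldsymbol{D}^{+}\boldsymbol{\alpha}$, since $\boldsymbol{D}^{+}\boldsymbol{D} = \boldsymbol{I}$, and $\boldsymbol{\alpha}$ must lie in the column space of $\boldsymbol{D}$. The generalized lasso then becomes the constrained lasso

$$
\underset{\boldsymbol{\alpha}}{\text{minimize}} \quad \frac{1}{2} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{D}^{+}\boldsymbol{\alpha}\|_2^2 + \rho \|\boldsymbol{\alpha}\|_1 \quad \text{subject to} \quad (\boldsymbol{I} - \boldsymbol{D}\boldsymbol{D}^{+})\boldsymbol{\alpha} = \boldsymbol{0},
$$

and the generalized lasso estimate is recovered as $\hat{\boldsymbol{\beta}} = \boldsymbol{D}^{+}\hat{\boldsymbol{\alpha}}$. The first-difference matrix used below does not have full column rank, so the general reformulation (and hence the SVD) is required.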
```julia
using ConstrainedLasso
```
We first load and organize the data. Here, `y` is the response vector. The design matrix `X` is the identity matrix, since the objective function in $(1)$ does not involve `X`.
```julia
y = readdlm(joinpath(Pkg.dir("ConstrainedLasso"), "docs/src/demo/misc/tumor.txt"))
```

```
990×1 Array{Float64,2}:
  0.333661
 -0.152838
  0.101485
 -0.0342123
  0.344761
  0.151108
  ⋮
 -0.728202
 -0.505593
 -0.147661
```

```julia
n = p = size(y, 1)
X = eye(n)
```

```
990×990 Array{Float64,2}:
 1.0  0.0  0.0  …  0.0  0.0
 0.0  1.0  0.0  …  0.0  0.0
 0.0  0.0  1.0  …  0.0  0.0
 ⋮              ⋱       ⋮
 0.0  0.0  0.0  …  1.0  0.0
 0.0  0.0  0.0  …  0.0  1.0
```
First we create the penalty matrix `D`, the $(p-1) \times p$ first-difference matrix shown above.
```julia
D = [eye(p-1) zeros(p-1, 1)] - [zeros(p-1, 1) eye(p-1)]
```

```
989×990 Array{Float64,2}:
 1.0  -1.0   0.0   0.0  …   0.0   0.0
 0.0   1.0  -1.0   0.0  …   0.0   0.0
 0.0   0.0   1.0  -1.0  …   0.0   0.0
 ⋮                      ⋱
 0.0   0.0   0.0   0.0  …   1.0  -1.0
```
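As a quick illustrative check (the small vector and the `*test` names below are made up for this sketch and are not part of the analysis), multiplying a coefficient vector by a first-difference matrix of this form returns the differences between neighboring coefficients, which is exactly what the fusion penalty $\|\boldsymbol{D}\boldsymbol{\beta}\|_1$ sums:

```julia
# Illustrative sketch only: a tiny first-difference matrix applied to a
# made-up piecewise-constant vector. Each entry of Dtest * βtest equals
# β_j - β_{j+1}, so ‖Dβ‖₁ is small when neighboring coefficients are equal.
ptest = 5
Dtest = [eye(ptest-1) zeros(ptest-1, 1)] - [zeros(ptest-1, 1) eye(ptest-1)]
βtest = [1.0, 1.0, 3.0, 3.0, 2.0]
Dtest * βtest    # returns [0.0, -2.0, 0.0, 1.0]
```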
```julia
β̂path, ρpath = genlasso(X, y; D = D)
```

```julia
β̂path
```

```
990×989 Array{Float64,2}:
 -0.0178926  -0.0046696   0.118695   …   0.333516   0.333644
 -0.0178926  -0.0046696   0.118695   …  -0.152548  -0.152803
 -0.0178926  -0.0046696   0.118695   …   0.101194   0.10145
  ⋮                                  ⋱
 -0.0178926  -0.0212612  -0.0504995  …  -0.505593  -0.505593
 -0.0178926  -0.0212612  -0.0504995  …  -0.147807  -0.147679
```

```julia
ρpath
```

```
989-element Array{Float64,1}:
 138.641
 135.983
 112.914
  86.7231
  38.6934
   ⋮
   0.000246343
   0.00017132
   0.000145322
   0.0
```
We plot the constrained lasso solution path below.
```julia
using Plots; pyplot();
ρnewpath = ρpath[1:end-1]        # exclude ρ = 0
βnewpath = β̂path[:, 1:end-1]
plot(log.(ρnewpath), βnewpath', label="", xaxis = ("log(ρ)"), yaxis = ("β̂(ρ)"), width=0.5)
title!("Brain Tumor Data: Solution Path via Constrained Lasso")
```

```julia
savefig("misc/tumor1.svg");
```
Now let's compare our estimates with those from the generalized lasso. The variables `lambda_path` and `beta_path_fused` hold the $\lambda$ values and the estimated $\beta$ coefficients, respectively, obtained from the `genlasso` package in `R`.
```julia
lambda_path = readdlm(joinpath(Pkg.dir("ConstrainedLasso"),"docs/src/demo/misc/lambda_path.txt"))
```

```
989×1 Array{Float64,2}:
 138.641
 135.983
 112.914
  86.7231
  38.6934
   ⋮
   0.000246343
   0.00017132
   0.000145321
   1.7577e-5
```

```julia
beta_path_fused = readdlm(joinpath(Pkg.dir("ConstrainedLasso"),"docs/src/demo/misc/beta_path_fused.txt"))[2:end, :]
```

```
990×989 Array{Float64,2}:
 -0.0178926  -0.0046696   0.118694   …   0.333516   0.333644
 -0.0178926  -0.0046696   0.118694   …  -0.152548  -0.152803
 -0.0178926  -0.0046696   0.118694   …   0.101194   0.10145
  ⋮                                  ⋱
 -0.0178926  -0.0212612  -0.0504995  …  -0.505593  -0.505593
 -0.0178926  -0.0212612  -0.0504995  …  -0.147807  -0.147679
```
The following figure plots the generalized lasso solution path.
```julia
plot(log.(lambda_path), beta_path_fused', label="", xaxis = ("log(λ)"), yaxis = ("β̂(λ)"), width=0.5)
title!("Brain Tumor Data: Generalized Lasso Solution Path")
```

```julia
savefig("misc/tumor2.svg")
```
Now we extract common values of $\rho$ and compare estimates at those values.
```julia
sameρ = intersect(round.(ρpath, 4), round.(lambda_path, 4))
sameρ_err = Float64[]
for i in eachindex(sameρ)
    curρ = sameρ[i]
    # locate the entry closest to curρ in each solution path
    idx1 = findmin(abs.(ρpath - curρ))[2]
    idx2 = findmin(abs.(lambda_path - curρ))[2]
    # largest absolute difference between the two coefficient estimates at this ρ
    push!(sameρ_err, maximum(abs.(β̂path[:, idx1] - beta_path_fused[:, idx2])))
end
sameρ_err
```

```
988-element Array{Any,1}:
 1.22121e-9
 2.47148e-9
 4.33354e-9
 4.3335e-9
 3.64469e-8
 ⋮
 4.86994e-7
 4.86994e-7
 4.89997e-7
 4.83994e-7
```
Below are the mean, median, and maximum of the errors between estimated coefficients at common $\rho$ values.
```julia
println([mean(sameρ_err); median(sameρ_err); maximum(sameρ_err)])
```

```
[4.77914e-7, 4.80866e-7, 5.00013e-7]
```
Follow the link to access the .ipynb file of this page.