In [5]:
versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code

Optimization Examples - Linear Programming

Linear programming (LP)

  • A general linear program takes the form \begin{eqnarray*} &\text{minimize}& \mathbf{c}^T \mathbf{x} \\ &\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b} \\ & & \mathbf{G} \mathbf{x} \preceq \mathbf{h}. \end{eqnarray*} A linear program is a convex optimization problem (why?).

  • The standard form of an LP is \begin{eqnarray*} &\text{minimize}& \mathbf{c}^T \mathbf{x} \\ &\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b} \\ & & \mathbf{x} \succeq \mathbf{0}. \end{eqnarray*} To transform a general linear program into the standard form, we introduce the slack variables $\mathbf{s} \succeq \mathbf{0}$ such that $\mathbf{G} \mathbf{x} + \mathbf{s} = \mathbf{h}$. Then we write $\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^-$, where $\mathbf{x}^+ \succeq \mathbf{0}$ and $\mathbf{x}^- \succeq \mathbf{0}$. This yields the problem \begin{eqnarray*} &\text{minimize}& \mathbf{c}^T (\mathbf{x}^+ - \mathbf{x}^-) \\ &\text{subject to}& \mathbf{A} (\mathbf{x}^+ - \mathbf{x}^-) = \mathbf{b} \\ & & \mathbf{G} (\mathbf{x}^+ - \mathbf{x}^-) + \mathbf{s} = \mathbf{h} \\ & & \mathbf{x}^+ \succeq \mathbf{0}, \mathbf{x}^- \succeq \mathbf{0}, \mathbf{s} \succeq \mathbf{0} \end{eqnarray*} in $\mathbf{x}^+$, $\mathbf{x}^-$, and $\mathbf{s}$.

    Slack variables are often used to transform complicated inequality constraints into simple non-negativity constraints; a Julia sketch of this transformation appears after this list.

  • The inequality form of an LP is \begin{eqnarray*} &\text{minimize}& \mathbf{c}^T \mathbf{x} \\ &\text{subject to}& \mathbf{G} \mathbf{x} \preceq \mathbf{h}. \end{eqnarray*}

  • Some software, e.g., solveLP in R, requires an LP to be written in either standard or inequality form. However, good software should do this conversion for you!

  • A piecewise-linear minimization problem \begin{eqnarray*} &\text{minimize}& \max_{i=1,\ldots,m} (\mathbf{a}_i^T \mathbf{x} + b_i) \end{eqnarray*} can be transformed to an LP \begin{eqnarray*} &\text{minimize}& t \\ &\text{subject to}& \mathbf{a}_i^T \mathbf{x} + b_i \le t, \quad i = 1,\ldots,m, \end{eqnarray*} in $\mathbf{x}$ and $t$. Apparently $$ \text{minimize} \max_{i=1,\ldots,m} |\mathbf{a}_i^T \mathbf{x} + b_i| $$ and $$ \text{minimize} \max_{i=1,\ldots,m} (\mathbf{a}_i^T \mathbf{x} + b_i)_+ $$ are also LPs.

  • Any convex optimization problem \begin{eqnarray*} &\text{minimize}& f_0(\mathbf{x}) \\ &\text{subject to}& f_i(\mathbf{x}) \le 0, \quad i=1,\ldots,m \\ && \mathbf{a}_i^T \mathbf{x} = b_i, \quad i=1,\ldots,p, \end{eqnarray*} where $f_0,\ldots,f_m$ are convex functions, can be transformed to the epigraph form \begin{eqnarray*} &\text{minimize}& t \\ &\text{subject to}& f_0(\mathbf{x}) - t \le 0 \\ & & f_i(\mathbf{x}) \le 0, \quad i=1,\ldots,m \\ & & \mathbf{a}_i^T \mathbf{x} = b_i, \quad i=1,\ldots,p \end{eqnarray*} in variables $\mathbf{x}$ and $t$. That is why people often say that linear programming is universal.

  • A linear-fractional program \begin{eqnarray*} &\text{minimize}& \frac{\mathbf{c}^T \mathbf{x} + d}{\mathbf{e}^T \mathbf{x} + f} \\ &\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b} \\ & & \mathbf{G} \mathbf{x} \preceq \mathbf{h} \\ & & \mathbf{e}^T \mathbf{x} + f > 0 \end{eqnarray*} can be transformed to an LP \begin{eqnarray*} &\text{minimize}& \mathbf{c}^T \mathbf{y} + d z \\ &\text{subject to}& \mathbf{G} \mathbf{y} - z \mathbf{h} \preceq \mathbf{0} \\ & & \mathbf{A} \mathbf{y} - z \mathbf{b} = \mathbf{0} \\ & & \mathbf{e}^T \mathbf{y} + f z = 1 \\ & & z \ge 0 \end{eqnarray*} in $\mathbf{y}$ and $z$, via the transformation of variables \begin{eqnarray*} \mathbf{y} = \frac{\mathbf{x}}{\mathbf{e}^T \mathbf{x} + f}, \quad z = \frac{1}{\mathbf{e}^T \mathbf{x} + f}. \end{eqnarray*} See Section 4.3.2 of Boyd and Vandenberghe (2004) for the proof.
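
As a quick illustration of the slack-variable and sign-splitting tricks above, here is a minimal Julia sketch that assembles standard-form data from general-form data. The dimensions and the random c, A, b, G, h are toy assumptions for illustration only.

using LinearAlgebra

# a hypothetical general-form LP: minimize c'x s.t. Ax = b, Gx ⪯ h
n, p, q = 3, 2, 4                  # number of variables, equalities, inequalities
c = randn(n)
A = randn(p, n); b = randn(p)
G = randn(q, n); h = randn(q)

# standard-form variable z = [x⁺; x⁻; s] with z ⪰ 0
c_std = [c; -c; zeros(q)]          # objective c'(x⁺ - x⁻)
A_std = vcat(hcat(A, -A, zeros(p, q)),       # A(x⁺ - x⁻)     = b
             hcat(G, -G, Matrix(I, q, q)))   # G(x⁺ - x⁻) + s = h
b_std = [b; h]
# minimizing c_std'z subject to A_std * z = b_std, z ⪰ 0 is the standard-form LP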

LP example: compressed sensing

  • Compressed sensing (Candes and Tao, 2006; Donoho, 2006) addresses a fundamental question: how can we compress and transmit a complex signal (e.g., musical clips, mega-pixel images) so that the original signal can be recovered when it is decoded?

  • Suppose a signal $\mathbf{x} \in \mathbb{R}^n$ is sparse with $s$ non-zeros. We under-sample the signal by multiplying it by a (flat) measurement matrix, $\mathbf{y} = \mathbf{A} \mathbf{x}$, where $\mathbf{A} \in \mathbb{R}^{m\times n}$ has iid normal entries. Candes, Romberg and Tao (2006) show that the solution to \begin{eqnarray*} &\text{minimize}& \|\mathbf{x}\|_1 \\ &\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{y} \end{eqnarray*} exactly recovers the true signal under certain conditions on $\mathbf{A}$ when $n \gg s$ and $m \approx s \ln(n/s)$. Why is sparsity a reasonable assumption? Virtually all real-world images have low information content.

  • The $\ell_1$ minimization problem is clearly an LP: writing $\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^-$, it becomes \begin{eqnarray*} &\text{minimize}& \mathbf{1}^T (\mathbf{x}^+ + \mathbf{x}^-) \\ &\text{subject to}& \mathbf{A} (\mathbf{x}^+ - \mathbf{x}^-) = \mathbf{y} \\ & & \mathbf{x}^+ \succeq \mathbf{0}, \mathbf{x}^- \succeq \mathbf{0}. \end{eqnarray*}

  • Let's try a numerical example.

Generate a sparse signal and subsample it

In [6]:
using Gadfly, Random

# random seed
Random.seed!(280)
# Size of signal
n = 1024
# Sparsity (# nonzeros) in the signal
s = 20
# Number of samples (undersample by a factor of 8) 
m = 128

# Generate and display the signal
x0 = zeros(n)
x0[rand(1:n, s)] = randn(s)
# Generate the random sampling matrix
A = randn(m, n) / m
# Subsample by multiplexing
y = A * x0

# plot the true signal
plot(x=1:n, y=x0, Geom.line, Guide.title("True signal x_0"))
Out[6]:
[Gadfly plot: line plot of the true sparse signal, titled "True signal x_0"]

Solve LP by calling Gurobi directly (not recommended)

Gurobi model formulation:
\begin{eqnarray*} &\text{minimize}& \frac{1}{2} \mathbf{x}^T \mathbf{H} \mathbf{x} + \mathbf{f}^T \mathbf{x} \\ &\text{subject to}& \mathbf{A} \mathbf{x} \preceq \mathbf{b} \\ & & \mathbf{A}_{\text{eq}} \mathbf{x} = \mathbf{b}_{\text{eq}} \\ & & \text{lb} \preceq \mathbf{x} \preceq \text{ub}. \end{eqnarray*} For our compressed sensing LP, $\mathbf{H} = \mathbf{0}$ and $\mathbf{f} = \mathbf{1}_{2n}$. Refer to the Gurobi.jl documentation for setting up the model.

In [7]:
using Gurobi
env = Gurobi.Env()
setparams!(env, OutputFlag=1) # display log

# Construct the model
model = gurobi_model(env;
    name = "cs",
    f = ones(2 * n),
    Aeq = [A -A],
    beq = y,
    lb = zeros(2 * n))

# Run optimization
optimize(model)

# Show results
sol = get_solution(model)
xsol = sol[1:n] - sol[n + 1:end]

plot(x=1:n, y=x0, Geom.point)
plot(x=1:n, y=xsol, Geom.line, Guide.title("Reconstructed signal overlayed with x0"))
Academic license - for non-commercial use only
Optimize a model with 128 rows, 2048 columns and 262144 nonzeros
Coefficient statistics:
  Matrix range     [1e-08, 3e-02]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e-04, 1e-01]

Concurrent LP optimizer: dual simplex and barrier
Showing barrier log only...

Presolve time: 0.10s
Presolved: 128 rows, 2048 columns, 262144 nonzeros

Ordering time: 0.00s

Barrier statistics:
 AA' NZ     : 8.128e+03
 Factor NZ  : 8.256e+03 (roughly 1 MByte of memory)
 Factor Ops : 7.073e+05 (less than 1 second per iteration)
 Threads    : 3

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   2.51327671e+03  0.00000000e+00  1.40e-14 0.00e+00  2.08e+00     0s
   1   4.36716581e+02  2.93300751e+00  1.90e-14 2.22e-15  2.12e-01     0s

Barrier performed 1 iterations in 0.16 seconds
Barrier solve interrupted - model solved by another algorithm


Solved with dual simplex
Solved in 161 iterations and 0.16 seconds
Optimal objective  1.830728192e+01
Out[7]:
[Gadfly plot: reconstructed signal from the Gurobi solution, titled "Reconstructed signal overlayed with x0"]

Solve LP via the DCP (disciplined convex programming) interface Convex.jl

Check the Convex.jl documentation for a list of supported operations.

In [8]:
using Convex

# Use Mosek solver
using Mosek
solver = MosekSolver(LOG=1)

# # Use Gurobi solver
# using Gurobi
# solver = GurobiSolver(OutputFlag=1)

# # Use Cplex solver
# using CPLEX
# solver = CplexSolver(CPXPARAM_ScreenOutput=1)

## Use SCS solver
#using SCS
#solver = SCSSolver(verbose=1)
#set_default_solver(solver)

# Set up optimization problem
x = Variable(n)
problem = minimize(norm(x, 1))
problem.constraints += A * x == y

# Solve the problem
@time solve!(problem, solver)

# Display the solution
plot(x=1:n, y=x0, Geom.point)
plot(x=1:n, y=vec(x.value), Geom.line, Guide.title("Reconstructed signal overlayed with x0"))
Problem
  Name                   :                 
  Objective sense        : min             
  Type                   : LO (linear optimization problem)
  Constraints            : 2177            
  Cones                  : 0               
  Scalar variables       : 2049            
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 1024
Eliminator terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 2                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.02            
Lin. dep.  - number                 : 0               
Presolve terminated. Time: 0.04    
Problem
  Name                   :                 
  Objective sense        : min             
  Type                   : LO (linear optimization problem)
  Constraints            : 2177            
  Cones                  : 0               
  Scalar variables       : 2049            
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer  - threads                : 8               
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 1152
Optimizer  - Cones                  : 0
Optimizer  - Scalar variables       : 3073              conic                  : 0               
Optimizer  - Semi-definite variables: 0                 scalarized             : 0               
Factor     - setup time             : 0.03              dense det. time        : 0.00            
Factor     - ML order time          : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 1.42e+05          after factor           : 1.42e+05        
Factor     - dense dim.             : 1                 flops                  : 3.59e+07        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   5.1e+00  1.0e+00  1.2e+03  1.00e+00   1.448154688e+03   0.000000000e+00   1.0e+00  0.09  
1   2.0e+00  1.0e+00  7.9e+02  0.00e+00   1.868721000e+02   5.017718138e+00   3.7e+00  0.10  
2   8.9e-01  4.5e-01  3.6e+02  9.38e-01   9.836670682e+01   1.394257795e+01   1.7e+00  0.10  
3   2.8e-01  1.4e-01  1.1e+02  9.69e-01   4.300059794e+01   1.627916054e+01   5.3e-01  0.11  
4   9.9e-02  5.0e-02  4.0e+01  9.85e-01   2.693148355e+01   1.743259784e+01   1.9e-01  0.11  
5   2.0e-02  1.0e-02  8.0e+00  9.94e-01   2.000060021e+01   1.808359996e+01   3.8e-02  0.12  
6   1.6e-03  8.2e-04  6.5e-01  9.98e-01   1.843978822e+01   1.828480266e+01   3.0e-03  0.13  
7   1.2e-05  5.9e-06  4.7e-03  1.00e+00   1.830822487e+01   1.830710759e+01   2.2e-05  0.13  
8   5.9e-08  3.0e-08  2.4e-05  1.00e+00   1.830728673e+01   1.830728103e+01   1.1e-07  0.14  
9   3.5e-08  1.5e-10  1.7e-07  1.00e+00   1.830728195e+01   1.830728191e+01   5.8e-10  0.14  
Basis identification started.
Basis identification terminated. Time: 0.05
Optimizer terminated. Time: 0.20    


Interior-point solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL
  Primal.  obj: 1.8307281953e+01    nrm: 2e+01    Viol.  con: 2e-10    var: 0e+00  
  Dual.    obj: 1.8307281912e+01    nrm: 1e+01    Viol.  con: 3e-16    var: 6e-12  

Basic solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL
  Primal.  obj: 1.8307281917e+01    nrm: 2e+01    Viol.  con: 2e-14    var: 0e+00  
  Dual.    obj: 1.8307281912e+01    nrm: 2e+01    Viol.  con: 4e-16    var: 6e-12  
  0.235640 seconds (95.13 k allocations: 23.659 MiB, 3.46% gc time)
Out[8]:
[Gadfly plot: reconstructed signal from the Convex.jl solution, titled "Reconstructed signal overlayed with x0"]

LP example: quantile regression

  • In linear regression, we model the mean of the response variable as a function of covariates. In many situations, the error variance is not constant, the distribution of $y$ may be asymmetric, or we simply care about the quantile(s) of the response variable. Quantile regression offers a better modeling tool in these applications.
  • In $\tau$-quantile regression, we minimize the loss function \begin{eqnarray*} f(\beta) = \sum_{i=1}^n \rho_\tau (y_i - \mathbf{x}_i^T \beta), \end{eqnarray*} where $\rho_\tau(z) = z (\tau - 1_{\{z < 0\}})$. Writing $\mathbf{y} - \mathbf{X} \beta = \mathbf{r}^+ - \mathbf{r}^-$, this is equivalent to the LP \begin{eqnarray*} &\text{minimize}& \tau \mathbf{1}^T \mathbf{r}^+ + (1-\tau) \mathbf{1}^T \mathbf{r}^- \\ &\text{subject to}& \mathbf{r}^+ - \mathbf{r}^- = \mathbf{y} - \mathbf{X} \beta \\ & & \mathbf{r}^+ \succeq \mathbf{0}, \mathbf{r}^- \succeq \mathbf{0} \end{eqnarray*} in $\mathbf{r}^+$, $\mathbf{r}^-$, and $\beta$.
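
A minimal Convex.jl sketch of this LP, reusing the DCP setup from the compressed sensing example above; the toy X, y, and quantile level τ below are assumptions for illustration.

using Convex, Mosek

# toy data (assumed for illustration)
n, p = 100, 5
X = randn(n, p)
y = X * randn(p) + randn(n)
τ = 0.75                            # target quantile

β  = Variable(p)
rp = Variable(n)                    # r⁺
rm = Variable(n)                    # r⁻
problem = minimize(τ * sum(rp) + (1 - τ) * sum(rm),
                   rp - rm == y - X * β, rp >= 0, rm >= 0)
solve!(problem, MosekSolver(LOG=0))
β.value                             # fitted τ-quantile regression coefficients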

LP Example: $\ell_1$ regression

  • A popular method in robust statistics is least absolute deviation (LAD) regression, which minimizes the $\ell_1$ norm of the residual vector, $\|\mathbf{y} - \mathbf{X} \beta\|_1$. This is clearly equivalent to the LP \begin{eqnarray*} &\text{minimize}& \mathbf{1}^T (\mathbf{r}^+ + \mathbf{r}^-) \\ &\text{subject to}& \mathbf{r}^+ - \mathbf{r}^- = \mathbf{y} - \mathbf{X} \beta \\ & & \mathbf{r}^+ \succeq \mathbf{0}, \mathbf{r}^- \succeq \mathbf{0} \end{eqnarray*} in $\mathbf{r}^+$, $\mathbf{r}^-$, and $\beta$.

    $\ell_1$ regression = LAD = 1/2-quantile (median) regression.
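
In Convex.jl the explicit r⁺/r⁻ split is unnecessary, since the norm(·, 1) atom handles it internally. A minimal sketch with assumed toy data:

using Convex, Mosek

# toy data (assumed for illustration)
n, p = 100, 5
X = randn(n, p)
y = X * randn(p) + randn(n)

β = Variable(p)
problem = minimize(norm(y - X * β, 1))   # ℓ₁ (LAD) regression
solve!(problem, MosekSolver(LOG=0))
β.value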

LP Example: $\ell_\infty$ regression (Chebychev approximation)

  • Minimizing the worst possible residual $\|\mathbf{y} - \mathbf{X} \beta\|_\infty$ is equivalent to the LP \begin{eqnarray*} &\text{minimize}& t \\ &\text{subject to}& -t \le y_i - \mathbf{x}_i^T \beta \le t, \quad i = 1,\dots,n \end{eqnarray*} in variables $\beta$ and $t$.
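
Similarly, a minimal Convex.jl sketch (toy data assumed) using the norm(·, Inf) atom, which Convex.jl reformulates along the lines of the LP above:

using Convex, Mosek

# toy data (assumed for illustration)
n, p = 100, 5
X = randn(n, p)
y = X * randn(p) + randn(n)

β = Variable(p)
problem = minimize(norm(y - X * β, Inf))   # minimax (Chebychev) residual fit
solve!(problem, MosekSolver(LOG=0))
β.value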

LP Example: Dantzig selector

  • Candes and Tao (2007) propose a variable selection method called the Dantzig selector that solves \begin{eqnarray*} &\text{minimize}& \|\mathbf{X}^T (\mathbf{y} - \mathbf{X} \beta)\|_\infty \\ &\text{subject to}& \sum_{j=2}^p |\beta_j| \le t, \end{eqnarray*} which can be transformed to an LP. Indeed they named the method after George Dantzig, who invented the simplex method for efficiently solving LPs in the 1950s.
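
A minimal Convex.jl sketch of this formulation; the toy data, the intercept in the first column, and the bound t are assumptions for illustration.

using Convex, Mosek

# toy data (assumed for illustration); first column is the intercept
n, p = 100, 10
X = [ones(n) randn(n, p - 1)]
y = X * randn(p) + randn(n)
t = 5.0                                   # tuning constant

Xt = Matrix(X')                           # materialize Xᵀ for use in the objective
β = Variable(p)
problem = minimize(norm(Xt * (y - X * β), Inf),
                   norm(β[2:p], 1) <= t)  # leave the intercept unpenalized
solve!(problem, MosekSolver(LOG=0))
β.value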

LP Example: 1-norm SVM

  • In two-class classification problems, we are given training data $(\mathbf{x}_i, y_i)$, $i=1,\ldots,n$, where $\mathbf{x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \{-1, 1\}$ are class labels. Zhu, Rosset, Tibshirani, and Hastie (2004) propose the 1-norm support vector machine (SVM) that achieves the dual purpose of classification and feature selection. Denote the solution of the optimization problem \begin{eqnarray*} &\text{minimize}& \sum_{i=1}^n \left[ 1 - y_i \left( \beta_0 + \sum_{j=1}^p x_{ij} \beta_j \right) \right]_+ \\ &\text{subject to}& \|\beta\|_1 = \sum_{j=1}^p |\beta_j| \le t \end{eqnarray*} by $\hat \beta_0(t)$ and $\hat \beta(t)$. The 1-norm SVM classifies a future feature vector $\mathbf{x}$ by the sign of the fitted model \begin{eqnarray*} \hat f(\mathbf{x}) = \hat \beta_0 + \mathbf{x}^T \hat \beta. \end{eqnarray*}
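
A minimal Convex.jl sketch of this problem using the pos atom for the hinge loss; the toy two-class data and the ℓ₁ budget t are assumptions for illustration.

using Convex, Mosek

# toy two-class data (assumed for illustration)
n, p = 200, 10
X = randn(n, p)
y = sign.(X[:, 1] - X[:, 2] + 0.5 * randn(n))   # labels in {-1, 1}
t = 3.0                                         # ℓ₁ budget on β (β₀ unpenalized)

Xa = [ones(n) X]                  # prepend an intercept column
βa = Variable(p + 1)              # βa[1] plays the role of β₀
margin = (y .* Xa) * βa           # yᵢ(β₀ + xᵢᵀβ), built from a constant matrix
problem = minimize(sum(pos(ones(n) - margin)),   # hinge loss Σᵢ [1 - yᵢ f(xᵢ)]₊
                   norm(βa[2:(p + 1)], 1) <= t)
solve!(problem, MosekSolver(LOG=0))
yhat = sign.(Xa * vec(βa.value))                 # fitted class labels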

Many more applications of LP: Airport scheduling (Copenhagen airport uses Gurobi), airline flight scheduling, NFL scheduling, match.com, $\LaTeX$, ...

Apparently any loss/penalty or loss/constraint combination of the form $$ \{\ell_1, \ell_\infty, \text{quantile}\} \times \{\ell_1, \ell_\infty, \text{quantile}\}, $$ possibly with affine (equality and/or inequality) constraints, can be formulated as an LP.