Optimization Examples: Geometric Programming (GP)

  • A function $f: \mathbb{R}^n \mapsto \mathbb{R}$ with $\text{dom} f = \mathbb{R}_{++}^n$ defined as \begin{eqnarray*} f(\mathbf{x}) = c x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}, \end{eqnarray*} where $c>0$ and $a_i \in \mathbb{R}$, is called a monomial.

  • A sum of monomials \begin{eqnarray*} f(\mathbf{x}) = \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}, \end{eqnarray*} where $c_k > 0$, is called a posynomial.

  • Posynomials are closed under addition, multiplication, and nonnegative scaling.
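    For example, $2 x_1^{3} x_2^{-0.5}$ is a monomial, while $x_1 + x_2$ and $0.3 x_1^2 x_3 + 2 x_1 x_2^{-1}$ are posynomials; $x_1 - x_2$ is neither, since all coefficients $c_k$ must be positive.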

  • A geometric program is of the form \begin{eqnarray*} &\text{minimize}& f_0(\mathbf{x}) \\ &\text{subject to}& f_i(\mathbf{x}) \le 1, \quad i=1,\ldots,m \\ & & h_i(\mathbf{x}) = 1, \quad i=1,\ldots,p \end{eqnarray*} where $f_0, \ldots, f_m$ are posynomials and $h_1, \ldots, h_p$ are monomials. The constraint $\mathbf{x} \succ \mathbf{0}$ is implicit.

    Is GP a convex optimization problem?

  • With the change of variables $y_i = \ln x_i$, a monomial \begin{eqnarray*} f(\mathbf{x}) = c x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \end{eqnarray*} can be written as \begin{eqnarray*} f(\mathbf{x}) = f(e^{y_1}, \ldots, e^{y_n}) = c (e^{y_1})^{a_1} \cdots (e^{y_n})^{a_n} = e^{\mathbf{a}^T \mathbf{y} + b}, \end{eqnarray*} where $b = \ln c$. Similarly, we can write a posynomial as \begin{eqnarray*} f(\mathbf{x}) &=& \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}} \\ &=& \sum_{k=1}^K e^{\mathbf{a}_k^T \mathbf{y} + b_k}, \end{eqnarray*} where $\mathbf{a}_k = (a_{1k}, \ldots, a_{nk})$ and $b_k = \ln c_k$.
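    A quick numerical check of this identity, as a minimal sketch in Python (the arrays below are arbitrary illustrative values):

    ```python
    import numpy as np

    # Evaluate the posynomial f(x) = sum_k c_k * prod_j x_j^{a_{jk}} two ways:
    # directly, and as sum_k exp(a_k' y + b_k) with y = log(x), b_k = log(c_k).
    rng = np.random.default_rng(0)
    n, K = 3, 4
    A = rng.standard_normal((n, K))      # column k holds the exponent vector a_k
    c = rng.uniform(0.5, 2.0, K)         # positive coefficients c_k
    x = rng.uniform(0.1, 3.0, n)         # a point in the positive orthant

    direct = np.sum(c * np.prod(x[:, None] ** A, axis=0))
    y, b = np.log(x), np.log(c)
    via_exp = np.sum(np.exp(A.T @ y + b))
    print(direct, via_exp)               # the two values agree
    ```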

  • The original GP can be expressed in terms of the new variable $\mathbf{y}$: \begin{eqnarray*} &\text{minimize}& \sum_{k=1}^{K_0} e^{\mathbf{a}_{0k}^T \mathbf{y} + b_{0k}} \\ &\text{subject to}& \sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}} \le 1, \quad i = 1,\ldots,m \\ & & e^{\mathbf{g}_i^T \mathbf{y} + h_i} = 1, \quad i=1,\ldots,p, \end{eqnarray*} where $\mathbf{a}_{ik}, \mathbf{g}_i \in \mathbb{R}^n$. Taking the log of the objective and constraint functions, we obtain the geometric program in convex form \begin{eqnarray*} &\text{minimize}& \ln \left(\sum_{k=1}^{K_0} e^{\mathbf{a}_{0k}^T \mathbf{y} + b_{0k}}\right) \\ &\text{subject to}& \ln \left(\sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}}\right) \le 0, \quad i = 1,\ldots,m \\ & & \mathbf{g}_i^T \mathbf{y} + h_i = 0, \quad i=1,\ldots,p. \end{eqnarray*} This problem is convex: the objective and inequality constraint functions are log-sum-exp functions composed with affine maps of $\mathbf{y}$, hence convex, and the equality constraints are affine.

  • Mosek is capable of solving GPs. cvx has a GP mode that recognizes and transforms GP problems.
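    In Python, the cvxpy package (distinct from MATLAB's cvx) provides an analogous disciplined geometric programming mode. A minimal sketch of a toy GP, a box-design problem, assuming cvxpy with an exponential-cone-capable solver is installed; the numeric limits are arbitrary:

    ```python
    import cvxpy as cp

    # Maximize the volume h*w*d of a box subject to limits on wall area
    # 2(h*w + h*d), floor area w*d, and the aspect ratios. Maximizing a
    # monomial is a GP (equivalently, minimize its reciprocal).
    h = cp.Variable(pos=True)
    w = cp.Variable(pos=True)
    d = cp.Variable(pos=True)

    constraints = [
        2 * (h * w + h * d) <= 100,   # wall area limit (posynomial <= constant)
        w * d <= 80,                  # floor area limit
        h / w >= 0.5, h / w <= 2.0,   # aspect ratio bounds (monomial constraints)
        d / w >= 0.5, d / w <= 2.0,
    ]
    problem = cp.Problem(cp.Maximize(h * w * d), constraints)
    problem.solve(gp=True)            # gp=True applies the log-log change of variables
    print(problem.value, h.value, w.value, d.value)
    ```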

  • Example. Logistic regression as GP. Given data $(\mathbf{x}_i, y_i)$, $i=1,\ldots,n$, where $y_i \in \{0, 1\}$ and $\mathbf{x}_i \in \mathbb{R}^p$, the likelihood of the logistic regression model is \begin{eqnarray*} & & \prod_{i=1}^n p_i^{y_i} (1 - p_i)^{1 - y_i} \\ &=& \prod_{i=1}^n \left( \frac{e^{\mathbf{x}_i^T \beta}}{1 + e^{\mathbf{x}_i^T \beta}} \right)^{y_i} \left( \frac{1}{1 + e^{\mathbf{x}_i^T \beta}} \right)^{1 - y_i} \\ &=& \prod_{i:y_i=1} e^{\mathbf{x}_i^T \beta} \prod_{i=1}^n \left( \frac{1}{1 + e^{\mathbf{x}_i^T \beta}} \right). \end{eqnarray*} Maximizing the likelihood is therefore equivalent to \begin{eqnarray*} &\text{minimize}& \prod_{i:y_i=1} e^{-\mathbf{x}_i^T \beta} \prod_{i=1}^n \left( 1 + e^{\mathbf{x}_i^T \beta} \right). \end{eqnarray*} Let $z_j = e^{\beta_j}$, $j=1,\ldots,p$. The objective becomes \begin{eqnarray*} \prod_{i:y_i=1} \prod_{j=1}^p z_j^{-x_{ij}} \prod_{i=1}^n \left( 1 + \prod_{j=1}^p z_j^{x_{ij}} \right). \end{eqnarray*} This leads to the GP \begin{eqnarray*} &\text{minimize}& \prod_{i:y_i=1} s_i \prod_{i=1}^n t_i \\ &\text{subject to}& \prod_{j=1}^p z_j^{-x_{ij}} \le s_i, \quad i \text{ with } y_i = 1 \\ & & 1 + \prod_{j=1}^p z_j^{x_{ij}} \le t_i, \quad i = 1, \ldots, n, \end{eqnarray*} in variables $\mathbf{s} \in \mathbb{R}^{m}$, $\mathbf{t} \in \mathbb{R}^n$, and $\mathbf{z} \in \mathbb{R}^p$, where $m$ is the number of observations with $y_i=1$. Dividing each constraint by $s_i$ or $t_i$ puts it in the standard posynomial-$\le 1$ form.
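    In the convex form ($\beta_j = \ln z_j$), this GP is just minimization of the familiar logistic negative log-likelihood $\sum_{i:y_i=1} -\mathbf{x}_i^T \beta + \sum_{i=1}^n \ln(1 + e^{\mathbf{x}_i^T \beta})$. A minimal sketch on simulated data (all names and values below are illustrative), using a generic smooth optimizer rather than a GP solver:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Logistic negative log-likelihood: the log of the GP objective above
    # after substituting beta_j = log(z_j).
    rng = np.random.default_rng(0)
    n, p = 200, 5
    X = rng.standard_normal((n, p))
    beta_true = rng.standard_normal(p)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

    def negloglik(beta):
        eta = X @ beta
        return -y @ eta + np.logaddexp(0.0, eta).sum()   # log(1 + e^eta), computed stably

    fit = minimize(negloglik, np.zeros(p), method="BFGS")
    print(fit.x)   # MLE of beta; z = exp(fit.x) are the GP variables
    ```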

    How to incorporate the lasso penalty? Write $\beta_j = \beta_j^+ - \beta_j^-$ with $\beta_j^+, \beta_j^- \ge 0$ and let $z_j^+ = e^{\beta_j^+}$, $z_j^- = e^{\beta_j^-}$, so that $z_j = z_j^+ / z_j^-$ and the sign constraints become the monomial constraints $(z_j^+)^{-1} \le 1$, $(z_j^-)^{-1} \le 1$. At the optimum one of $\beta_j^+, \beta_j^-$ is zero, so $\beta_j^+ + \beta_j^- = |\beta_j|$ and the lasso penalty term takes the monomial form $e^{\lambda |\beta_j|} = e^{\lambda (\beta_j^+ + \beta_j^-)} = (z_j^+ z_j^-)^\lambda$, which simply multiplies the GP objective.

  • Example. Bradley-Terry model for sports ranking. See ST758 HW8 http://hua-zhou.github.io/teaching/st758-2014fall/ST758-2014-HW8.pdf. The likelihood is \begin{eqnarray*} \prod_{i, j} \left( \frac{\gamma_i}{\gamma_i + \gamma_j} \right)^{y_{ij}}, \end{eqnarray*} where $\gamma_i > 0$ is the strength of team $i$ and $y_{ij}$ is the number of times team $i$ beats team $j$. Since maximizing the likelihood is equivalent to minimizing $\prod_{i,j} (1 + \gamma_i^{-1} \gamma_j)^{y_{ij}}$, the MLE is found by solving the GP \begin{eqnarray*} &\text{minimize}& \prod_{i,j} t_{ij}^{y_{ij}} \\ &\text{subject to}& 1 + \gamma_i^{-1} \gamma_j \le t_{ij} \end{eqnarray*} in variables $\gamma \in \mathbb{R}^n$ and $\mathbf{t} \in \mathbb{R}^{n^2}$.
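    In the convex form, with $\lambda_i = \ln \gamma_i$, the negative log-likelihood is $\sum_{i,j} y_{ij} \ln(1 + e^{\lambda_j - \lambda_i})$. A minimal sketch on made-up win counts; since the $\gamma_i$ are only identified up to a common scale, $\lambda_1$ is fixed at 0:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Bradley-Terry in convex (log) form: lambda_i = log(gamma_i), and
    # y[i, j] = number of times team i beat team j (illustrative random counts).
    rng = np.random.default_rng(1)
    n = 4
    y = rng.integers(0, 10, size=(n, n))
    np.fill_diagonal(y, 0)

    def negloglik(lam_free):
        lam = np.concatenate(([0.0], lam_free))     # fix lambda_1 = 0 for identifiability
        diff = lam[None, :] - lam[:, None]          # diff[i, j] = lambda_j - lambda_i
        return np.sum(y * np.logaddexp(0.0, diff))  # sum_{i,j} y_ij * log(1 + e^{lambda_j - lambda_i})

    fit = minimize(negloglik, np.zeros(n - 1), method="BFGS")
    gamma = np.exp(np.concatenate(([0.0], fit.x)))
    print(gamma / gamma.sum())                      # estimated strengths, normalized
    ```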