Due Friday, May 24 @ 11:59PM
Read in the longley.txt
with the response (number of people employed
) in the first column and six explanatory variables in the other columns (GNP implicit price deflator
, Gross National Product
, number of unemployed
, number of people in the armed forces
, noninstitutionalized population >=14 years of age
, year
). Include an intercept in your model.
One popular regularization method is the ridge regression, which estimates regression coefficients by minimizing a penalized least squares criterion \begin{eqnarray*} \frac 12 \| \mathbf{y} - \mathbf{X} \beta\|_2^2 + \frac{\lambda}{2} \|\beta\|_2^2. \end{eqnarray*} Show that, regardless the shape of $\mathbf{X}$, there is always a unique global minimum for any $\lambda>0$ and the ridge solution is given by \begin{eqnarray*} \widehat \beta(\lambda) = (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^T \mathbf{y}. \end{eqnarray*}
Show that the ridge estimator is equivalent to the solution of a regular least squares problem with added observations. Compute the ridge regression estimates $\widehat \beta(\lambda)$ at $\lambda=5,10,15,...,100$ by solving this augmented least squares problem and report the timing. You can use any method of your choice (QR, Cholesky, or sweep).
Express ridge solution $\widehat \beta(\lambda)$ in terms of the singular value decomposition (SVD) of $\mathbf{X}$.
Show that (i) the $\ell_2$ norms of ridge solution $\|\widehat \beta(\lambda)\|_2$ and corresponding fitted values $\|\widehat{\mathbf{y}} (\lambda)\|_2 = \|\mathbf{X} \widehat \beta (\lambda)\|_2$ are non-increasing in $\lambda$ and (ii) the $\ell_2$ norm of the residual vector $\|\mathbf{y} - \widehat{\mathbf{y}}(\lambda)\|_2$ is non-decreasing in $\lambda$.
Re-compute and plot the ridge solution for the Longley data in Q1(3) at $\lambda = 5, 10, 15,\ldots, 100$ using the SVD approach and report the timing. Comment on the computation efficiency of SVD approach compared to the approach you used in Q1(3).
Let's address how to choose the optimal tuning parameter $\lambda$. Let $\widehat{\beta}_k(\lambda)$ be the solution to the ridge problem \begin{eqnarray*} \text{minimize} \,\, \frac 12 \| \mathbf{y}_{-k} - \mathbf{X}_{-k} \beta\|_2^2 + \frac{\lambda}{2} \|\beta\|_2^2, \end{eqnarray*} where $\mathbf{y}_{-k}$ and $\mathbf{X}_{-k}$ are the data with the $k$-th observation taken out. The optimal $\lambda$ can to chosen to minimize the cross-validation square error \begin{eqnarray*} C(\lambda) = \frac 1n \sum_{k=1}^n [y_k - \mathbf{x}_k^T \widehat{\beta}_k(\lambda)]^2. \end{eqnarray*} However computing $n$ ridge solution paths $\widehat{\beta}_k(\lambda)$ is expensive. Show that, using SVD $\mathbf{X}=\mathbf{U} \Sigma \mathbf{V}^T$, \begin{eqnarray*} C(\lambda) = \frac 1n \sum_{k=1}^n \left[ \frac{y_k - \sum_{j=1}^r u_{kj} \tilde y_j \left( \frac{\sigma_j^2}{\sigma_j^2 + \lambda} \right)}{1 - \sum_{j=1}^r u_{kj}^2 \left( \frac{\sigma_j^2}{\sigma_j^2 + \lambda} \right)} \right]^2, \end{eqnarray*} where $\tilde{\mathbf{y}} = \mathbf{U}^T \mathbf{y}$.
Find optimal $\lambda$ for the Longley data.
Below figure displays two 3s written on a piece of paper and we want to properly align them.
Let matrices $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{n \times p}$ record $n$ points on the two shapes. In this case $n=10$ and $p=2$. Mathematically we consider the problem \begin{eqnarray*} \text{minimize}_{\beta, \mathbf{O}, \mu} \quad \|\mathbf{X} - (\beta \mathbf{Y} \mathbf{O} + \mathbf{1}_n \mu^T)\|_{\text{F}}^2, \end{eqnarray*} where $\beta > 0$ is a scaling factor, $\mathbf{O} \in \mathbb{R}^{p \times p}$ is an orthogonal matrix, and $\mu \in \mathbb{R}^{p}$ is a vector of shifts. Here $\|\mathbf{M}\|_{\text{F}}^2 = \sum_{i,j} m_{ij}^2$ is the squared Frobenius norm. Intuitively we want to rotate, stretch and shift the shape $\mathbf{Y}$ to match the shape $\mathbf{X}$ as much as possible.
Let $\bar{\mathbf{x}}$ and $\bar{\mathbf{y}}$ be the column mean vectors of the matrices and $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{Y}}$ be the versions of these matrices centered by column means. Show that the solution $(\hat{\beta}, \hat{\mathbf{O}}, \hat{\mu})$ satisfies \begin{eqnarray*} \hat{\mu} = \bar{\mathbf{x}} - \hat{\beta} \cdot \hat{\mathbf{O}}^T \bar{\mathbf{y}}. \end{eqnarray*} Therefore we can center each matrix at its column centroid and then ignore the location completely.
Derive the solution to \begin{eqnarray*} \text{minimize}_{\beta, \mathbf{O}} \quad \|\tilde{\mathbf{X}} - \beta \tilde{\mathbf{Y}} \mathbf{O}\|_{\text{F}}^2 \end{eqnarray*} using the SVD of $\tilde{\mathbf{Y}}^T \tilde{\mathbf{X}}$.
Implement your method and solve the alignment problem in the figure. Display $\mathbf{X}$ and your solution $\hat{\beta} \mathbf{Y} \hat{\mathbf{O}} + \mathbf{1}_n \hat{\mu}^T$ together.