This page lists some potential course project ideas. Of course, you are encouraged to come up with your own projects. Please talk to me about your ideas.
Julia package for sparse regression.
Sparse and regularized regression is routinely used in statistical research and applications. The performance and flexibility of the Julia language promise an implementation that scales to large or even distributed data sets. Can you beat the popular glmnet package in R?
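As a starting point, the core computation behind lasso solvers of this kind can be sketched as cyclic coordinate descent with soft-thresholding. The sketch below is plain Python for readability and rests on stated assumptions: the objective is $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ with no intercept and no standardization, and the names `soft_threshold` and `lasso_cd` are illustrative, not taken from any package.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows, each a list of p floats.
    Assumption: no intercept and no column standardization.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    # mean squared value of each column
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # an all-zero column carries no information
            # inner product of column j with the partial residual
            # (the residual with coordinate j's contribution added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

A fixed number of sweeps keeps the sketch short. A serious implementation would add a convergence check, warm starts along a path of $\lambda$ values, and active-set screening.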
Julia package for projection and proximal operators.
Projection and proximal operators play a central role in many algorithms in modern statistics and machine learning. See, e.g., the survey papers by Stephen Boyd and others on ADMM (alternating direction method of multipliers) and proximal algorithms. A collection of the most often used projection and proximal operators, with a quality implementation in Julia, would be valuable. A good example is TFOCS (Templates for First-Order Conic Solvers) for Matlab.
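To make the scope concrete, here is a minimal sketch of three operators such a collection might contain, written in plain Python for readability. The function names are illustrative. The simplex projection uses the standard sort-based algorithm, and the proximal operator of the Euclidean norm is obtained from the ball projection via the Moreau decomposition $\text{prox}_{t\|\cdot\|_2}(v) = v - \Pi_{\{\|u\|_2 \le t\}}(v)$.

```python
import math


def proj_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum_i x_i = 1}."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        # the condition holds for a prefix of j's; keep the last valid shift
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def proj_l2_ball(v, radius=1.0):
    """Euclidean projection of v onto the ball {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    scale = radius / norm
    return [scale * x for x in v]


def prox_l2_norm(v, t):
    """Proximal operator of t * ||.||_2, via the Moreau decomposition."""
    proj = proj_l2_ball(v, t)
    return [x - px for x, px in zip(v, proj)]
```

For example, `proj_simplex([2.0, 0.0])` gives `[1.0, 0.0]`, and `prox_l2_norm([3.0, 4.0], 1.0)` shrinks the vector toward the origin by one unit of length.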
Julia package for tensor computation.
In 1802, Carl F. Gauss wrote the essay Summarische Übersicht der Bestimmung der Bahnen der beiden neuen Hauptplaneten angewandten Methoden to explain the novel least squares method he invented, which led to the rediscovery of the dwarf planet Ceres. Many consider this essay the origin of modern linear algebra. Since then linear algebra has evolved into an essential subject in pure and applied math, physics, engineering, computer science, and (of course) statistics. The central objects in linear algebra are vectors and matrices. One generalization of vectors and matrices is multidimensional arrays, also known as tensors, which have attracted intensive attention in recent years. All questions one asks about matrices also apply to tensors, and tensors themselves generate numerous new questions. Sections 12.4 and 12.5 of the new edition of Matrix Computations (Golub and Van Loan, 2012) and the survey paper Tensor Decompositions and Applications (Kolda and Bader, 2009) are excellent introductions to tensors.
Quality numerical linear algebra libraries such as BLAS and LAPACK underlie most scientific computation. A similar library that encapsulates essential tensor computations is in great demand. The Matlab Tensor Toolbox developed at Sandia National Laboratories is one such effort and is heavily used in many fields. The new language Julia offers many promising features for technical computing (high-performance JIT compiler, low-level memory management, flexible typing, distributed computing, etc.). Currently there seems to be no serious development effort for tensor computation in Julia.
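As a small illustration of the kind of primitive such a library would provide, the sketch below implements mode-$n$ unfolding (matricization) of a 3-way tensor, following the column ordering used by Kolda and Bader (2009). It is written in plain Python with nested lists for readability. The function name `unfold` and the zero-based mode index are choices made here, not part of any existing package.

```python
def unfold(T, mode):
    """Mode-`mode` unfolding of a 3-way tensor stored as nested lists T[i][j][k].

    `mode` is zero-based (0, 1, or 2). Rows are indexed by the chosen mode;
    along the columns, the earlier of the two remaining modes varies fastest.
    """
    dims = (len(T), len(T[0]), len(T[0][0]))
    others = [m for m in range(3) if m != mode]  # remaining modes, increasing order
    n_rows = dims[mode]
    n_cols = dims[others[0]] * dims[others[1]]
    M = [[0] * n_cols for _ in range(n_rows)]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                col = idx[others[0]] + dims[others[0]] * idx[others[1]]
                M[idx[mode]][col] = T[i][j][k]
    return M


# A 3 x 4 x 2 tensor whose entries 1..24 are laid out column by column
# within each frontal slice, as in the Kolda and Bader unfolding example.
T = [[[1 + i + 3 * j + 12 * k for k in range(2)] for j in range(4)] for i in range(3)]
X1 = unfold(T, 0)  # 3 x 8
X2 = unfold(T, 1)  # 4 x 6
X3 = unfold(T, 2)  # 2 x 12
```

A real package would store tensors in contiguous memory and express unfolding as a permutation of dimensions followed by a reshape, so that no elementwise loop is needed.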
Ranking sports teams with covariates
In HW6 Q4, we worked on the Bradley-Terry model and a normal model for ranking teams based on win-loss data. Explore how to model each team’s ability $a_i$ by a regression model $a_i = x_i^T \beta$, where the vector $x_i$ contains features of team $i$ and $\beta$ is the vector of regression coefficients. If possible, formulate the models as convex problems and try them on some real data.
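One possible formulation, sketched here under an assumption about the parameterization: if team $i$ beats team $j$ with probability $1 / (1 + e^{-(a_i - a_j)})$, then substituting $a_i = x_i^T \beta$ turns the negative log-likelihood into a logistic-regression loss on the feature differences $x_i - x_j$, which is convex in $\beta$. The Python sketch below evaluates that loss and its gradient and fits $\beta$ by plain gradient descent. The function names, step size, and iteration count are illustrative, and the homework may have used a different parameterization.

```python
import math


def bt_nll_and_grad(beta, X, games):
    """Negative log-likelihood and gradient of a covariate Bradley-Terry model.

    X[i] is the feature vector of team i; games is a list of (winner, loser)
    index pairs. Assumes P(i beats j) = 1 / (1 + exp(-(a_i - a_j))) with
    a_i = X[i] . beta.
    """
    p = len(beta)
    nll = 0.0
    grad = [0.0] * p
    for w, l in games:
        d = [X[w][k] - X[l][k] for k in range(p)]
        margin = sum(dk * bk for dk, bk in zip(d, beta))
        # log(1 + exp(-margin)) and the modeled probability that the
        # observed winner loses, both computed in a numerically stable way
        if margin > 0:
            e = math.exp(-margin)
            nll += math.log1p(e)
            prob_upset = e / (1.0 + e)
        else:
            e = math.exp(margin)
            nll += -margin + math.log1p(e)
            prob_upset = 1.0 / (1.0 + e)
        for k in range(p):
            grad[k] -= prob_upset * d[k]
    return nll, grad


def bt_fit(X, games, step=0.5, n_iter=500):
    """Fixed-step gradient descent on the convex negative log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        _, grad = bt_nll_and_grad(beta, X, games)
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta
```

Because only differences $x_i - x_j$ enter the likelihood, an intercept column is not identifiable and should be left out of $x_i$. If the data are perfectly separable, the maximum likelihood estimate does not exist and a penalty on $\beta$ is needed.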
Optimal design for GLM or nonlinear models
In HW6 Q1, we worked on some optimal designs for the linear model. Finding the optimal design for generalized linear models (GLM) or nonlinear models is still a challenging problem. The difficulty lies in the fact that the Fisher information matrix $I(\beta) = \sum_i w_i(\beta) x_i x_i^T$ depends on the regression coefficients, which are unknown. One approach is to find the minimax optimal design, that is, to minimize $\max_\beta \det (\sum_i p_i w_i(\beta) x_i x_i^T)^{-1}$ subject to the simplex constraint on $p=(p_1, \ldots, p_m)$. Investigate whether convex optimization tools have any bearing on solving this problem. I would start by (i) reading a couple of recent papers, e.g., paper 1 and paper 2, to understand the problem and the current state of the art, (ii) reading about cvx’s capability of defining partially specified problems to see whether that’s useful, (iii) trying a general nonlinear optimization software, e.g., the NLopt.jl package in Julia, and (iv) reading the paper and pondering the MM approach for the minimax optimal design problem.
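To get a feel for the minimax objective before trying any solver, it can help to evaluate it directly on a toy problem. The Python sketch below does this for a two-parameter logistic regression, where $w_i(\beta) = \pi_i (1 - \pi_i)$ with $\pi_i = 1 / (1 + e^{-x_i^T \beta})$. Two things here are assumptions made for illustration: the choice of the logistic model, and the replacement of the maximum over all $\beta$ by a maximum over a finite, user-supplied grid. The function names are hypothetical.

```python
import math


def logistic_weight(x, beta):
    """GLM weight w(beta) = pi * (1 - pi) for logistic regression."""
    eta = sum(xk * bk for xk, bk in zip(x, beta))
    pi = 1.0 / (1.0 + math.exp(-eta))
    return pi * (1.0 - pi)


def info_det(p, xs, beta):
    """Determinant of sum_i p_i w_i(beta) x_i x_i^T for 2-dimensional x_i."""
    m00 = m01 = m11 = 0.0
    for p_i, x in zip(p, xs):
        w = p_i * logistic_weight(x, beta)
        m00 += w * x[0] * x[0]
        m01 += w * x[0] * x[1]
        m11 += w * x[1] * x[1]
    return m00 * m11 - m01 * m01


def worst_case_criterion(p, xs, beta_grid):
    """Max over beta in beta_grid of det(M(p, beta))^{-1}.

    Assumes the information matrix is nonsingular at every grid point.
    """
    return max(1.0 / info_det(p, xs, b) for b in beta_grid)


# Two design points x = (1, t) with t = -1 and t = +1, equally weighted.
xs = [(1.0, -1.0), (1.0, 1.0)]
p = [0.5, 0.5]
beta_grid = [(0.0, 0.0), (0.0, 2.0)]
worst = worst_case_criterion(p, xs, beta_grid)
```

For each fixed $\beta$, $\log \det$ of the information matrix is concave in $p$, so the worst case over a finite grid is a maximum of convex functions of $p$. Whether that structure survives a continuous range of $\beta$ in a computationally useful way is part of what the project would investigate.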
Boosting PageRank by LP
In ST758 HW4 (2014 fall), we worked on numerical algorithms for computing Google’s PageRank based on the webpage link structure. We may also ask the following question: given the link structure, how should I place links on my page to boost its PageRank score? Formulate this as a linear programming problem, try it on the stat-ncsu data set, and report the results.
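For experimenting with how a page's outgoing links affect its score, a baseline PageRank computation is useful. The Python sketch below uses power iteration on a small adjacency-list graph. The damping factor of 0.85, the uniform treatment of dangling pages, and the function name are assumptions made here and may differ from the homework's setup. It computes scores only and does not attempt the linear programming formulation.

```python
def pagerank(links, damping=0.85, n_iter=100):
    """PageRank by power iteration.

    links[i] is the list of pages that page i links to. Pages without
    outgoing links spread their mass uniformly over all pages.
    """
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(n_iter):
        new = [(1.0 - damping) / n] * n  # teleportation term
        for i, outs in enumerate(links):
            if outs:
                share = damping * rank[i] / len(outs)
                for j in outs:
                    new[j] += share
            else:
                share = damping * rank[i] / n
                for j in range(n):
                    new[j] += share
        rank = new
    return rank


# Pages 1 and 2 both link to page 0; page 0 links to page 1.
scores = pagerank([[1], [0], [0]])
```

Rerunning `pagerank` with different choices for one page's outgoing links gives a brute-force check on small graphs against which an LP-based answer can be compared.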