Sparse GEE
A demonstration of sparse regression using generalized estimating equations (GEE). Sparsity is in the general sense: variable selection, total variation regularization, polynomial trend filtering, and others. Various penalties are implemenmted: elestic net (enet), power family (bridge regression), log penalty, SCAD, and MCP.
Contents
Sparse linear regression (n>p)
Simulate a sample data set (n=500, p=100), with equicorrelation structure within clusters
clear; s = RandStream('mt19937ar','Seed',123); RandStream.setGlobalStream(s); n = 500; p = 100; X = randn(n,p); % generate a random design matrix X = bsxfun(@rdivide, X, sqrt(sum(X.^2,1))); % normalize predictors X = [ones(size(X,1),1) X]; % add intercept b = zeros(p+1,1); % true signal: b(2:6) = 3; % first 5 predictors are 3 b(7:11) = -3; % next 5 predictors are -3 % set up cluster structure clusterSize = 5; nCluster = n/clusterSize; id = kron((1:nCluster)', ones(clusterSize,1)); time = kron(ones(nCluster,1), (1:clusterSize)'); % equi-correlation structure alpha = 0.5; Vi = repmat(alpha, clusterSize, clusterSize); % equi-correlation struct. Vi(1:size(Vi,1)+1:numel(Vi)) = 1; V = kron(eye(nCluster), Vi); % simulate responses y = X*b+chol(V)'*randn(n,1);
Lasso sparse GEE at fixed tuning parameter values, using the correct correlation structure (equicorr)
penalty = 'enet'; % set penalty function to lasso penparam = 1; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat,stats] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(stats); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a smaller tuning parameter value lambda = 3; [betahat,alphahat,stats] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(stats); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
stats = struct with fields: iterations: 2 alphahat = 0.3269 stats = struct with fields: iterations: 5 alphahat = 0.3675
SCAD sparse GEE at fixed tuning parameter values, using the correct correlation structure (equicorr)
penalty = 'scad'; % set penalty function to SCAD penparam = 3; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat] ... = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); % plot penalized estimate figure; bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 3; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = 0.3269 alphahat = 0.3675
Power penalty GEE at fixed tuning parameter values, using the correct correlation structure (equicorr)
penalty = 'power'; % set penalty function to Power penparam = 0.5; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat] ... = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); % plot penalized estimate figure; bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 3; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = 0.3395 alphahat = 0.3956
Lasso GEE at fixed tuning parameter values, using a wrong correlation structure (AR1)
penalty = 'enet'; % set penalty function to lasso penparam = 1; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','AR1',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 3; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','AR1',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = 0.2708 alphahat = 0.3287
Lasso GEE at fixed tuning parameter values, using a wrong correlation structure (indep)
penalty = 'enet'; % set penalty function to lasso penparam = 1; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','indep',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 3; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','indep',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = [] alphahat = []
Lasso GEE at fixed tuning parameter values, using a wrong correlation structure (tridiag)
penalty = 'enet'; % set penalty function to lasso penparam = 1; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','tridiag',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 3; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','tridiag',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = 0.2709 alphahat = 0.3478
Lasso GEE at fixed tuning parameter values, using a wrong correlation structure (unstructured)
penalty = 'enet'; % set penalty function to lasso penparam = 1; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 6; [betahat,alphahat,stats] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','unstructured',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); display(stats); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 3; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','unstructured',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = 1.0521 0.3937 0.3305 0.1450 0.5019 0.3937 0.9893 0.3839 0.2915 0.4713 0.3305 0.3839 0.9716 0.1788 0.4819 0.1450 0.2915 0.1788 0.7457 0.1248 0.5019 0.4713 0.4819 0.1248 1.2817 stats = struct with fields: iterations: 3 alphahat = 1.1360 0.4668 0.3917 0.1866 0.5937 0.4668 1.0841 0.4666 0.3679 0.5243 0.3917 0.4666 1.0255 0.2368 0.5626 0.1866 0.3679 0.2368 0.8075 0.1959 0.5937 0.5243 0.5626 0.1959 1.3913
Sparse linear regression (n<p)
Simulate a sample data set (n=500, p=100), with equicorrelation structure within clusters
clear; s = RandStream('mt19937ar','Seed',123); RandStream.setGlobalStream(s); n = 100; p = 1000; X = randn(n,p); % generate a random design matrix X = bsxfun(@rdivide, X, sqrt(sum(X.^2,1))); % normalize predictors X = [ones(size(X,1),1) X]; % add intercept b = zeros(p+1,1); % true signal: b(2:6) = 3; % first 5 predictors are 3 b(7:11) = -3; % next 5 predictors are -3 % set up cluster structure clusterSize = 5; nCluster = n/clusterSize; id = kron((1:nCluster)', ones(clusterSize,1)); time = kron(ones(nCluster,1), (1:clusterSize)'); % equi-correlation structure alpha = 0.5; Vi = repmat(alpha, clusterSize, clusterSize); % equi-correlation struct. Vi(1:size(Vi,1)+1:numel(Vi)) = 1; V = kron(eye(nCluster), Vi); % simulate responses y = X*b+chol(V)'*randn(n,1);
Lasso sparse GEE at fixed tuning parameter values, using the correct correlation structure (equicorr)
penalty = 'enet'; % set penalty function to lasso penparam = 1; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 10; [betahat,alphahat,stats] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(stats); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a smaller tuning parameter value lambda = 5; [betahat,alphahat,stats] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(stats); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
stats = struct with fields: iterations: 2 alphahat = 0.2892 stats = struct with fields: iterations: 6 alphahat = 0.2886
Power sparse GEE at fixed tuning parameter values, using the correct correlation structure (equicorr)
penalty = 'power'; % set penalty function to SCAD penparam = 0.5; penidx = ... % leave intercept unpenalized [false; true(size(X,2)-1,1)]; % a large tuning parameter value lambda = 10; [betahat,alphahat] ... = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); % plot penalized estimate figure; bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]); % a small tuning parameter value lambda = 5; [betahat,alphahat] ... % lasso GEE = gee_sparsereg(id,time,X,y,'normal','equicorr',lambda, ... 'penidx',penidx,'penalty',penalty,'penparam',penparam); display(alphahat); figure; % plot penalized estimate bar(0:length(betahat)-1,betahat); xlabel('j'); ylabel('\beta_j'); xlim([-1,length(betahat)]); title([penalty '(' num2str(penparam) '), \lambda=' num2str(lambda,2)]);
alphahat = 0.2892 alphahat = 0.4849