Negative multinomial regression and sparse regression
A demo of negative multinomial regression and sparse regression
Contents
- Generate negative multinomial random vectors from covariates
- Fit negative multinomial regression - link over-disperson parameter
- Fit negative multinomial regression - not linking over-disperson parameter
- Fit negative multinomial sparse regression - lasso/group/nuclear penalty
- Sparse regression (not linking over-disp.) - lasso/group/nuclear penalty
Generate negative multinomial random vectors from covariates
clear; % reset random seed s = RandStream('mt19937ar','Seed',1); RandStream.setGlobalStream(s); % sample size n = 200; % # covariates p = 15; % # bins d = 5; % design matrix X = randn(n,p); % true regression coefficients B = zeros(p,d+1); nzidx = [1 3 5]; B(nzidx,:) = ones(length(nzidx),d+1); eta = X*B; alpha = exp(eta); prob(:,d+1) = 1./(sum(alpha(:,1:d),2)+1); prob(:,1:d) = bsxfun(@times, alpha(:,1:d), prob(:,d+1)); b= binornd(10,0.2, n, 1); Y = negmnrnd(prob,b); zerorows = sum(Y,2); Y=Y(zerorows~=0, :); X=X(zerorows~=0, :);
Fit negative multinomial regression - link over-disperson parameter
tic; [B_hat, stats] = negmnreg(X,Y); toc; display(B_hat); display(stats); % Wald test of predictor significance display('Wald test p-values:'); display(stats.wald_pvalue); figure; plot(stats.logL_iter); xlabel('iteration'); ylabel('log-likelihood');
Elapsed time is 0.225574 seconds.
B_hat =
1.0559 1.0459 1.0098 1.0084 0.9932 -0.1119
0.0278 -0.0052 0.0070 0.0158 0.0092 0.1827
0.8343 0.8947 0.9116 0.8642 0.8333 0.2859
0.0411 -0.0185 0.0076 -0.0617 -0.0201 0.0264
0.9650 1.0042 1.0083 1.0122 1.0710 -0.1664
-0.1669 -0.1084 -0.0812 -0.1458 -0.0916 0.0577
-0.1095 -0.0959 -0.0644 -0.1221 -0.0612 0.3709
0.1659 0.1893 0.1286 0.1494 0.0899 -0.2356
-0.0414 0.0081 -0.1278 -0.0591 -0.0835 0.2748
-0.0397 -0.1438 -0.0578 -0.0818 -0.1170 -0.0070
0.2200 0.3404 0.2959 0.2698 0.2547 -0.5784
0.1163 0.1595 0.1778 0.0901 0.1181 -0.1753
-0.1064 -0.0661 -0.1113 -0.0682 -0.0980 0.0240
0.0613 0.0525 0.0720 0.0194 0.0251 -0.0206
-0.0697 -0.0377 0.0490 0.0055 -0.0188 0.1182
stats =
struct with fields:
BIC: 3.5447e+03
dof: 90
iterations: 15
logL: -1.5451e+03
logL_iter: [1×15 double]
se: [15×6 double]
wald_stat: [1×15 double]
wald_pvalue: [1×15 double]
H: [90×90 double]
gradient: [90×1 double]
observed_information: [90×90 double]
Wald test p-values:
Columns 1 through 7
0 0.6526 0 0.6493 0 0.4724 0.0606
Columns 8 through 14
0.2285 0.0125 0.4257 0.0038 0.4790 0.8500 0.9286
Column 15
0.3161
Fit negative multinomial regression - not linking over-disperson parameter
tic; [B_hat, b_hat, stats] = negmnreg2(X,Y); toc; disp(B_hat); disp(stats.se_B); disp(b_hat); disp(stats.se_b); display(stats); % Wald test of predictor significance display('Wald test p-values:'); display(stats.wald_pvalue); figure; plot(stats.logL_iter); xlabel('iteration'); ylabel('log-likelihood');
Elapsed time is 0.137911 seconds.
0.8703 0.8610 0.8284 0.8264 0.8117
0.1526 0.1215 0.1320 0.1414 0.1345
0.9052 0.9645 0.9824 0.9348 0.9053
0.0244 -0.0311 -0.0085 -0.0746 -0.0338
0.7500 0.7874 0.7897 0.7939 0.8518
-0.1307 -0.0742 -0.0470 -0.1106 -0.0575
0.0262 0.0408 0.0669 0.0126 0.0717
0.0421 0.0648 0.0063 0.0266 -0.0318
0.0491 0.0947 -0.0345 0.0310 0.0079
-0.0425 -0.1418 -0.0618 -0.0832 -0.1186
-0.0724 0.0411 0.0022 -0.0252 -0.0394
0.0828 0.1263 0.1437 0.0583 0.0849
-0.1464 -0.1080 -0.1518 -0.1103 -0.1390
0.0242 0.0186 0.0351 -0.0156 -0.0100
-0.0008 0.0313 0.1110 0.0707 0.0458
0.0760 0.0756 0.0756 0.0758 0.0758
0.0757 0.0754 0.0755 0.0758 0.0757
0.0672 0.0667 0.0666 0.0671 0.0671
0.0768 0.0766 0.0765 0.0767 0.0766
0.0765 0.0764 0.0763 0.0766 0.0763
0.0660 0.0655 0.0660 0.0662 0.0660
0.0699 0.0694 0.0694 0.0698 0.0697
0.0617 0.0612 0.0614 0.0618 0.0614
0.0712 0.0708 0.0709 0.0712 0.0709
0.0745 0.0739 0.0738 0.0744 0.0742
0.0752 0.0753 0.0750 0.0754 0.0755
0.0679 0.0674 0.0673 0.0673 0.0674
0.0722 0.0718 0.0720 0.0722 0.0722
0.0671 0.0666 0.0670 0.0670 0.0670
0.0692 0.0690 0.0687 0.0691 0.0688
2.2915
0.1355
stats =
struct with fields:
BIC: 3.3630e+03
iterations: 8
logL: -1.4921e+03
logL_iter: [1×8 double]
se: [15×6 double]
wald_stat: [1×15 double]
wald_pvalue: [1×15 double]
H: [76×76 double]
gradient: [76×1 double]
observed_information: [76×76 double]
se_B: [15×5 double]
se_b: 0.1355
Wald test p-values:
Columns 1 through 7
0 0.4959 0 0.5382 0 0.3026 0.7689
Columns 8 through 14
0.2618 0.1964 0.2896 0.4888 0.1913 0.3636 0.8945
Column 15
0.3568
Fit negative multinomial sparse regression - lasso/group/nuclear penalty
Regression on the over dispersion parameter
penalty = {'sweep','group','nuclear'};
ngridpt = 30;
dist = 'negmn';
for i = 1:length(penalty)
pen = penalty{i};
[~, stats] = mglm_sparsereg(X,Y,inf,'penalty',pen,'dist',dist);
maxlambda = stats.maxlambda;
lambdas = exp(linspace(log(maxlambda),log(maxlambda/100),ngridpt));
BICs = zeros(1,ngridpt);
logl =zeros(1, ngridpt);
dofs = zeros(1, ngridpt);
tic;
for j=1:ngridpt
if j==1
B0 = zeros(p,d+1);
else
B0 = B_hat;
end
[B_hat, stats] = mglm_sparsereg(X,Y,lambdas(j),'penalty',pen, ...
'dist',dist,'B0',B0);
BICs(j) = stats.BIC;
logl(j) = stats.logL;
dofs(j) = stats.dof;
end
toc;
% True signal versus estimated signal
[bestbic,bestidx] = min(BICs);
[B_best,stats] = mglm_sparsereg(X,Y,lambdas(bestidx),'penalty',pen,'dist',dist);
figure;
subplot(1,3,1);
semilogx(lambdas,BICs);
ylabel('BIC');
xlabel('\lambda');
xlim([min(lambdas) max(lambdas)]);
subplot(1,3,2);
imshow(mat2gray(-B)); title('True B');
subplot(1,3,3);
imshow(mat2gray(-B_best)); title([pen ' estimate']);
end
Elapsed time is 1.808871 seconds. Elapsed time is 1.867126 seconds. Elapsed time is 2.656329 seconds.
Sparse regression (not linking over-disp.) - lasso/group/nuclear penalty
Do not run regression on the over dispersion parameter
penalty = {'sweep','group','nuclear'};
ngridpt = 30;
dist = 'negmn2';
for i = 1:length(penalty)
pen = penalty{i};
[~, stats] = mglm_sparsereg(X,Y,inf,'penalty',pen,'dist',dist);
maxlambda = stats.maxlambda;
lambdas = exp(linspace(log(maxlambda),log(maxlambda/100),ngridpt));
BICs = zeros(1,ngridpt);
logl =zeros(1, ngridpt);
dofs = zeros(1, ngridpt);
tic;
for j=1:ngridpt
if j==1
B0 = zeros(p,d);
else
B0 = B_hat;
end
[B_hat, stats] = mglm_sparsereg(X,Y,lambdas(j),'penalty',pen, ...
'dist',dist,'B0',B0);
BICs(j) = stats.BIC;
logl(j) = stats.logL;
dofs(j) = stats.dof;
end
toc;
% True signal versus estimated signal
[bestbic,bestidx] = min(BICs);
B_best = mglm_sparsereg(X,Y,lambdas(bestidx),'penalty',pen,'dist',dist);
figure;
subplot(1,3,1);
semilogx(lambdas,BICs);
ylabel('BIC');
xlabel('\lambda');
xlim([min(lambdas) max(lambdas)]);
subplot(1,3,2);
imshow(mat2gray(-B)); title('True B');
subplot(1,3,3);
imshow(mat2gray(-B_best)); title([pen ' estimate']);
end
Elapsed time is 4.949842 seconds. Elapsed time is 4.714719 seconds. Elapsed time is 5.183260 seconds.